Khmer speakers prefer voice. Now the API reads back.
សម្រាប់មនុស្ស ១៧ លាននាក់ សម្លេងគឺជាក្ដារចុច។ ឥឡូវនេះ វាមិនត្រឹមតែស្ដាប់ទេ — វាឆ្លើយតបមកវិញ។ For 17 million people, the voice is the keyboard. Now it doesn't just listen; it answers back.
Two years ago we made a bet about Cambodia: that for a country whose alphabet runs 74 characters long, voice is not a feature but the interface. People don't type Khmer on a phone; the layout is a foreign country every time you reach for it. They speak. So we built a speech-to-text API that could finally hear Khmer the way models have heard English for a decade, and we opened it up to anyone with a keyboard and a curl command.
But a conversation runs both ways. An API that only listens is half a loop. The doctor who dictates a clinic note wants it read back for confirmation. The delivery app that takes a spoken instruction wants to speak the next one out loud. The e-learning platform that transcribes a lesson wants to voice the answer key. For all of them the missing half was the same: there was no way to turn Khmer text back into natural Khmer speech. The voice was the keyboard, and the keyboard had no reply.
Today it does. Khmer text-to-speech is live, and we're opening it up.
What we shipped
Two voices ship on day one. Sovann and Puthi are human-like, natural Khmer voices — not the flat, robotic read-out that has passed for Cambodian speech synthesis until now, but voices with the cadence and warmth of a person actually reading to you. Sovann is the default; Puthi is a tap away. Both speak Khmer and English, so a mixed sentence — a product name here, a number there — comes out sounding like one speaker, not two systems stitched together.
The API is the one you already know how to call. Text-to-speech lives at POST /api/v1/tts, and it follows the pattern you know from other hosted model APIs: JSON in, a media stream out. Send a JSON body, { text, voice }, with the same ds_sk_ bearer key you already use for transcription, and you get back an audio/mpeg stream: a plain MP3 you can pipe to a file, hand to an <audio> element, or drop into a phone tree. The response carries an x-audio-duration-seconds header, so you know exactly how long the clip runs before you play a byte of it. Input is capped at 1,200 characters per request, long enough for a paragraph and short enough to keep latency honest, and the same familiar error envelope tells you when you've gone over.
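To make that concrete, here is a minimal Python sketch of one synthesis call, using only the standard library. The path, the { text, voice } body, the bearer key, the x-audio-duration-seconds header and the 1200-character cap come from the description above. Everything else is an assumption for illustration: the host in BASE_URL is a placeholder, the helper names are hypothetical, the voice is passed as the string "Sovann" without knowing the exact identifier the API expects, and the cap is checked with Python's len(), which may not count characters the same way the server does. Treat the quickstart as the authority.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # ASSUMPTION: placeholder host, not the real one
TTS_PATH = "/api/v1/tts"              # endpoint path from the post
MAX_CHARS = 1200                      # per-request input cap from the post


def build_tts_request(text, voice, api_key, base_url=BASE_URL):
    """Build (but do not send) the POST request for one synthesis call."""
    if len(text) > MAX_CHARS:
        raise ValueError(f"text is {len(text)} characters; the cap is {MAX_CHARS}")
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        base_url + TTS_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def synthesize(text, voice, api_key, out_path):
    """Send the request, save the MP3, and return the clip length in seconds.

    Returns None for the length if the duration header is absent.
    """
    req = build_tts_request(text, voice, api_key)
    with urllib.request.urlopen(req) as resp:
        raw = resp.headers.get("x-audio-duration-seconds")
        with open(out_path, "wb") as f:
            f.write(resp.read())
    return float(raw) if raw is not None else None


def split_for_tts(text, limit=MAX_CHARS):
    """Split long text into pieces of at most `limit` characters.

    Prefers to break after a Khmer khan (។) or a Latin full stop. If no
    sentence end is found it falls back to a hard cut, which can land
    inside a Khmer syllable cluster.
    """
    chunks = []
    rest = text.strip()
    while len(rest) > limit:
        window = rest[:limit]
        cut = max(window.rfind("។"), window.rfind(". "))
        cut = cut + 1 if cut > 0 else limit
        chunks.append(rest[:cut].strip())
        rest = rest[cut:].strip()
    if rest:
        chunks.append(rest)
    return chunks


# Usage (makes a network call, so it is left commented out):
# seconds = synthesize("សួស្តី", "Sovann", "ds_sk_...", "hello.mp3")
```

Building the request separately from sending it keeps the length check and the headers testable without touching the network. For text longer than one request allows, split_for_tts gives you pieces you can synthesize one by one and play in order.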
Billing stays simple because it stays unified. Speech-to-text and text-to-speech draw from a single audio-minutes pool. The minutes you spend transcribing an interview and the minutes you spend voicing a reply come out of the same meter, on the same plan, under the same key. There is no second product to price, no separate subscription to reconcile, no new dashboard to learn. If you already meter STT, you already meter TTS.
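As a rough illustration of what a single pool means, the sketch below adds transcription and synthesis usage into one number. It assumes that text-to-speech is metered by the length of the audio it produces and that no rounding is applied per request; the post does not spell out either rule, the function name is hypothetical, and the durations are made-up sample figures, so check the pricing page before relying on it.

```python
def pool_minutes_used(stt_audio_seconds, tts_audio_seconds):
    """Total audio minutes drawn from the shared pool.

    ASSUMPTION: both directions are metered by audio duration and summed
    without per-request rounding. The real billing rules may differ.
    """
    total_seconds = sum(stt_audio_seconds) + sum(tts_audio_seconds)
    return total_seconds / 60


# Sample figures: one 12-minute interview transcribed (720 s),
# plus three 20-second spoken replies synthesized.
print(pool_minutes_used([720], [20, 20, 20]))  # 13.0
```

If you log the x-audio-duration-seconds value from each synthesis response, you have the text-to-speech side of this tally without any extra calls.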
You can try it before you sign anything. The playground gives you three free synthesis runs with no account — type Khmer, pick a voice, press play, hear it. When you're ready, an account turns those three tries into a real free tier with a monthly quota, and the paid plans scale up from there on the same wallet as your transcription usage.
This matters because nobody else does it at production quality. ElevenLabs does not ship a production Khmer voice. Google's and Azure's text-to-speech list Khmer in name, but anyone who has actually fed them real Cambodian text knows how far the output falls from something you'd put in front of a customer. The honest state of the market is that there has been no production-grade Khmer text-to-speech until now. We are not trying to out-scale the giants at generic multilingual synthesis; we are trying to be the one API that treats Khmer as a first-class language instead of a checkbox.
And it unlocks the things a voice is actually for. Interactive voice response systems that answer a caller in real Khmer instead of a pre-recorded loop. Accessibility narration that reads a page aloud for someone who can't — or won't — squint at 74 characters on a small screen. E-learning that voices a lesson so a student in a village with one shared phone can listen instead of read. These aren't hypotheticals we're hoping someone builds; they're the reasons developers have been asking us for this since the day speech-to-text went live.
Try it
Try Khmer text-to-speech — type a Khmer sentence, pick Sovann or Puthi, and hear it read back. No signup, three free runs.
See pricing — one audio-minutes pool covers both speech-to-text and text-to-speech, from the free tier up.
Read the TTS quickstart — synthesize your first clip from curl, Python, or Node in about five minutes.
Subscribe to the changelog — we ship in milestone batches; text-to-speech is the newest one.
Subscribe by email — one email per release, no other traffic, unsubscribe any time.
Why we did this
We're a small team, and our goal has never changed: be the API for Khmer voice — end to end. Not the biggest speech company, not the one with the deepest model budget, but the one that a developer in Phnom Penh can reach for and trust to handle their own language properly, in both directions. Speech-to-text let software finally hear Cambodia. Text-to-speech lets it answer. Every plan tier, every endpoint, every line of documentation is calibrated against one question: does this make it easier for someone here to ship Khmer voice in their app? Reading back was the missing half of that answer. Now it's yours to build with.
សម្លេងគឺជាក្ដារចុច — ឥឡូវវាឆ្លើយតបមកវិញ។ The voice IS the keyboard — now it answers back.
— The Doslarb team