What is Speech Synthesis Markup Language (SSML)? How to use it with text-to-speech software and a list of main codes.
Jan. 19, 2026
Hello, thank you for always using Ondoku.
In this article, we will introduce SSML.
You might find it a bit difficult as some technical terms will appear, but by learning how to use SSML, you will be able to utilize Ondoku even more conveniently.
We will explain it in an easy-to-understand way, so please take a look.
What is SSML?
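SSML (Speech Synthesis Markup Language) is a markup language for controlling synthesized speech.
A "markup language" is a language that defines the structure (role) of each part of a text, as HTML does.
By writing SSML code, you can control how Ondoku reads your text, for example where it pauses, how it pronounces words, and its speed, pitch, and volume.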
Currently, only two types of SSML code are supported in all languages. Please note that other codes may not be usable depending on the language or voice type.
How to use SSML with Ondoku
Using SSML with Ondoku is very easy.
Please enter the SSML code directly into the Ondoku text box.
The SSML will then be applied automatically.
You can use SSML simply by putting <speak> at the beginning of your text and </speak> at the end.
Please make sure not to forget this code! Be sure to put it at the beginning and end of the text you want to read aloud.
Example
<speak>Please enter the text you want to read aloud here</speak>
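If you prepare your text in a script before pasting it into Ondoku, the wrapping step can be automated. The sketch below is a hypothetical helper (`wrap_in_speak` is not part of Ondoku); it assumes the input is plain text with no SSML tags yet, and escapes the XML special characters `&`, `<` and `>` so the result stays well-formed.

```python
from xml.sax.saxutils import escape


def wrap_in_speak(text: str) -> str:
    """Wrap plain text in the <speak> root element.

    Assumes `text` contains no SSML tags yet: &, < and > are escaped
    so that they are read as characters rather than parsed as markup.
    """
    return f"<speak>{escape(text)}</speak>"


print(wrap_in_speak("Please enter the text you want to read aloud here"))
print(wrap_in_speak("Tom & Jerry"))  # the & is written as &amp;
```

Because the helper escapes everything, it should only be used on plain text; tags such as `<break>` would otherwise be escaped too.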
<break time="○○ms"/>
The break time code is, as the name suggests, a code for creating pauses.
Enter the code where you want to insert a pause, and replace the ○○ part with your preferred number.
When reading aloud normally with Ondoku, pauses may not always occur with the timing you expect.
This code is useful in such situations.
For example, let's try putting the code where you want to insert a pause:
<speak>I want to leave a small <break time="○○ms"/> pause when reading this sentence.</speak>
You can hear that it takes a breath at the position where the code was inserted before continuing to read.
※ 1000ms = 1 second
By changing the number before ms or s, you can freely change the length of the "pause."
- 200ms = 0.200 seconds
- 500ms = 0.500 seconds
- 1000ms = 1.000 second
- 2000ms = 2.000 seconds
- 3s = 3 seconds, etc.
Notes on using <break>
A <break> code cannot be placed at the very beginning of the text.
Example:
<speak><break time="5s"/>I want to create a 5-second pause at the beginning</speak>
↑ This type of usage is not possible due to specifications.
Also, the maximum length of a pause is 10 seconds (10000ms or 10s).
If a longer time is set, it is capped at 10 seconds.
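The unit conversion (1000ms = 1 second) and the 10-second limit can be expressed as a small helper. This is a minimal sketch with hypothetical function names (`break_time_to_ms`, `break_tag`); Ondoku applies the cap itself, so the helper only mirrors the rule described above so you can see what length will actually be used.

```python
import re

MAX_BREAK_MS = 10_000  # per the article, longer pauses are capped at 10 seconds


def break_time_to_ms(value: str) -> int:
    """Convert a break length such as "500ms" or "3s" to milliseconds."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(ms|s)\s*", value)
    if match is None:
        raise ValueError(f"unsupported break time: {value!r}")
    number, unit = float(match.group(1)), match.group(2)
    # 1000ms = 1 second
    return round(number if unit == "ms" else number * 1000)


def break_tag(value: str) -> str:
    """Build a <break> tag, capping the length at 10 seconds."""
    ms = min(break_time_to_ms(value), MAX_BREAK_MS)
    return f'<break time="{ms}ms"/>'


for value in ["200ms", "500ms", "1000ms", "3s", "15s"]:
    print(value, "->", break_tag(value))
```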
Usage of the <lang> code
<lang xml:lang="(language code)">○○</lang>
This tag is for specifying the language for each part of the text when using Ondoku's Multilingual voices for multi-language reading aloud.
Use it when the pronunciation is wrong, or when a sentence mixes several languages that the voice cannot tell apart.
The Japanese word for "Hello" is Hello in English, Bonjour in French, Guten Tag in German, and 你好 in Chinese.
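Below is a sketch of how the example sentence could be marked up with one `<lang>` span per language. The helper `lang_span` is hypothetical, and the language codes (`en-US`, `fr-FR`, `de-DE`, `zh-CN`) are assumed BCP 47 codes; they are not taken from the original example, so check which codes your chosen Multilingual voice accepts.

```python
from xml.sax.saxutils import escape, quoteattr


def lang_span(text: str, lang_code: str) -> str:
    """Mark `text` as being written in the language `lang_code`."""
    return f"<lang xml:lang={quoteattr(lang_code)}>{escape(text)}</lang>"


# (language name, word, assumed language code)
greetings = [
    ("English", "Hello", "en-US"),
    ("French", "Bonjour", "fr-FR"),
    ("German", "Guten Tag", "de-DE"),
    ("Chinese", "你好", "zh-CN"),
]

parts = [f"{lang_span(word, code)} in {name}" for name, word, code in greetings]
sentence = (
    'The Japanese word for "Hello" is '
    + ", ".join(parts[:-1])
    + ", and "
    + parts[-1]
    + "."
)
ssml = f"<speak>{sentence}</speak>"
print(ssml)
```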
Usage of the <say-as interpret-as="expletive"> code
<say-as interpret-as="expletive">○○</say-as>
This is a slightly playful code: the text in the ○○ part is replaced by a "beep" sound, as if it had been censored.
This word is <say-as interpret-as="expletive">censored</say-as>.
<say-as interpret-as="characters">○○</say-as>
Alphabetic text inside this code is spelled out letter by letter instead of being read as a word.
The spelling of hello is <say-as interpret-as="characters">Hello</say-as>
However, this code causes an error with some voices, so please keep that in mind before using it.
Japanese: Nanami
English (USA): en-US-A
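Both codes above use the same `<say-as>` element with a different `interpret-as` value. The sketch below is a hypothetical helper that builds either form; it assumes the two values `expletive` and `characters` are the only ones you need, and rejects anything else so that a typo does not silently produce markup the voice cannot handle.

```python
from xml.sax.saxutils import escape, quoteattr

# The two interpret-as values covered in this article (assumed set).
SUPPORTED_INTERPRET_AS = {"expletive", "characters"}


def say_as(text: str, interpret_as: str) -> str:
    """Wrap `text` in a <say-as> element with the given interpret-as value."""
    if interpret_as not in SUPPORTED_INTERPRET_AS:
        raise ValueError(f"unsupported interpret-as value: {interpret_as!r}")
    return f"<say-as interpret-as={quoteattr(interpret_as)}>{escape(text)}</say-as>"


print(f"<speak>This word is {say_as('censored', 'expletive')}.</speak>")
print(f"<speak>The spelling of hello is {say_as('Hello', 'characters')}</speak>")
```

Since some voices return an error for this code, test the generated text with your chosen voice first.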
<sub alias="◇◇">○○</sub>
This is a code that allows you to provide phonetic readings.
In text-to-speech, unexpected readings can sometimes occur.
For example, if "一行" (one line) is pronounced as "ichiyuki," providing the phonetic reading "ichigyo" ensures it is pronounced correctly.
In the code, enter the Kanji in ○○ and the phonetic reading in ◇◇.
Read correctly as <sub alias="いちぎょう">一行</sub> rather than "ichiyuki"
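When the same word appears many times, adding the reading by hand gets tedious. The following is a minimal sketch with hypothetical helpers (`sub`, `apply_reading`): the written form goes in ○○ and the reading in ◇◇, exactly as described above, and `apply_reading` wraps every occurrence of one word in a sentence.

```python
from xml.sax.saxutils import escape, quoteattr


def sub(written: str, reading: str) -> str:
    """Read `written` (the ○○ part) aloud as `reading` (the ◇◇ part)."""
    return f"<sub alias={quoteattr(reading)}>{escape(written)}</sub>"


def apply_reading(text: str, written: str, reading: str) -> str:
    """Escape `text`, then wrap every occurrence of `written` in a <sub> tag."""
    return escape(text).replace(escape(written), sub(written, reading))


print(sub("一行", "いちぎょう"))
print(apply_reading("一行ずつ読む", "一行", "いちぎょう"))
```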
<emphasis>○○</emphasis>
Text surrounded by this code is emphasized when read aloud.
This code can <emphasis>emphasize</emphasis> words.
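To emphasize chosen words in a longer sentence, you could split the sentence into words and wrap only the ones you pick. This is a sketch with a hypothetical helper (`emphasize_words`); it assumes whole-word matching is what you want, and it escapes the remaining text so the result stays valid markup.

```python
import re
from xml.sax.saxutils import escape


def emphasize_words(sentence: str, words: set[str]) -> str:
    """Wrap each whole word of `sentence` that is in `words` with <emphasis>."""
    # The capturing group keeps both the words and the text between them.
    parts = re.split(r"(\w+)", sentence)
    out = []
    for part in parts:
        if part in words:
            out.append(f"<emphasis>{escape(part)}</emphasis>")
        else:
            out.append(escape(part))
    return "".join(out)


print(emphasize_words("This code can emphasize words.", {"emphasize"}))
```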
<prosody rate="◇◇" pitch="◇◇" volume="◇◇">○○</prosody>
Prosody is a general term for the sound characteristics of a language when speaking naturally, such as:
- Rising and falling of sounds
- Pause positions
- Length and stress, etc.
The three things you can adjust with this code are:
- rate (speed)
  Adjustment values: "x-slow", "slow", "medium", "fast", "x-fast", "default"
- pitch (how high or low the voice is)
  Adjustment values: "x-low", "low", "medium", "high", "x-high", "default"
- volume (loudness)
  Adjustment values: "silent", "x-soft", "soft", "medium", "loud", "x-loud", "default"
I will speak quickly.
I will speak in a high voice.
I will speak loudly.
I will speak slowly and in a low voice.
I will speak quickly, in a high voice, and at normal volume.
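The sketch below builds a `<prosody>` tag and checks each attribute against the named values listed above. The helper `prosody` is hypothetical, and the values chosen for the example sentences (for instance `rate="fast"` for "quickly") are illustrative assumptions, not the values used in the original samples. Many SSML engines also accept numeric or percentage values; this sketch deliberately restricts itself to the named values in the list.

```python
from xml.sax.saxutils import escape

# Named values copied from the list above.
PROSODY_VALUES = {
    "rate": {"x-slow", "slow", "medium", "fast", "x-fast", "default"},
    "pitch": {"x-low", "low", "medium", "high", "x-high", "default"},
    "volume": {"silent", "x-soft", "soft", "medium", "loud", "x-loud", "default"},
}


def prosody(text: str, **settings: str) -> str:
    """Wrap `text` in <prosody>, validating rate, pitch and volume values."""
    if not settings:
        raise ValueError("set at least one of rate, pitch, volume")
    attrs = []
    for name, value in settings.items():
        allowed = PROSODY_VALUES.get(name)
        if allowed is None:
            raise ValueError(f"unknown prosody attribute: {name!r}")
        if value not in allowed:
            raise ValueError(f"{value!r} is not a valid {name} value")
        attrs.append(f'{name}="{value}"')
    return f"<prosody {' '.join(attrs)}>{escape(text)}</prosody>"


# Illustrative values (assumed) for the example sentences above.
print(prosody("I will speak quickly.", rate="fast"))
print(prosody("I will speak slowly and in a low voice.", rate="slow", pitch="low"))
print(prosody("I will speak quickly, in a high voice, and at normal volume.",
              rate="fast", pitch="high", volume="medium"))
```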
Why not utilize SSML to make Ondoku even more convenient?
As shown, by utilizing SSML, you can use Ondoku even more conveniently!
Why not further expand the use of Ondoku's realistic and easy-to-hear voices for video production, presentations, and more?
■ AI voice synthesis software "Ondoku"
"Ondoku" is an online text-to-speech tool that can be used with no initial costs.
- Supports approximately 50 languages, including Japanese, English, Chinese, Korean, Spanish, French, and German
- Available from both PC and smartphone
- Suitable for business, education, entertainment, etc.
- No installation required, can be used immediately from your browser
- Supports reading from images
To use it, simply enter text or upload a file on the site. A natural-sounding audio file will be generated within seconds. You can use voice synthesis up to 5,000 characters for free, so please give it a try.
Email: ondoku3.com@gmail.com
"Ondoku" is a Text-to-Speech service that anyone can use for free without installation. If you register for free, you can get up to 5000 characters for free each month.