What is Speech Synthesis Markup Language (SSML)? How to use it in text-to-speech software, with a list of the main codes

June 21, 2025

Hello, and welcome to the Ondoku website.

Today, we would like to introduce SSML.

It may look difficult at first because it involves some technical terms.

However, once you know it, it will make a huge difference in the way you use Ondoku.

We have tried to make it easy to understand, so please read through to the end.

What is SSML?

First of all, SSML stands for Speech Synthesis Markup Language.

It is a markup language similar to HTML: you enclose text in tags to tell the speech engine how to read it.

By writing SSML code, you gain finer control over how Ondoku reads your text.

How to use SSML with Ondoku

It is very easy to use SSML in Ondoku.

Enter the SSML code directly into the Ondoku text box.

The SSML will be applied automatically.

Be sure to enclose your text in

<speak></speak>

with <speak> at the beginning and </speak> at the end of the code! Without the <speak> tag, SSML will not be applied.
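If you prepare your SSML outside of Ondoku, a quick local check can catch a missing or unclosed <speak> tag before you paste the text in. The following is only a minimal sketch in Python: the function name looks_like_ssml is our own invention for illustration, and the check only confirms that the text is well-formed XML with <speak> as the outermost tag. It does not reproduce how Ondoku itself processes the input.

```python
import xml.etree.ElementTree as ET


def looks_like_ssml(text: str) -> bool:
    """Return True if text is well-formed XML whose outermost tag is <speak>.

    Hypothetical helper for illustration; not part of Ondoku.
    """
    try:
        root = ET.fromstring(text.strip())
    except ET.ParseError:
        # Plain text, or a tag that was never closed, ends up here.
        return False
    # Ignore an XML namespace prefix such as "{http://...}speak" if present.
    return root.tag.rsplit("}", 1)[-1] == "speak"


print(looks_like_ssml("<speak>Hello</speak>"))  # True
print(looks_like_ssml("Hello"))                 # False: no <speak> tag
print(looks_like_ssml("<speak>Hello"))          # False: closing tag missing
```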

<speak></speak>

As mentioned earlier, this is the code that activates SSML.

Be sure to put <speak> at the beginning and </speak> at the end of the text you want to convert to speech.

Example

<speak>Enter the text you want to convert from text to speech here</speak>
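Because SSML is XML-based, characters such as & and < inside your own text can be mistaken for markup. If you generate SSML from a script, one way to handle this is to escape the text before wrapping it. This is a rough sketch, assuming you work in Python; wrap_in_speak is a name we made up for this example.

```python
from xml.sax.saxutils import escape


def wrap_in_speak(plain_text: str) -> str:
    """Escape &, < and > in plain text, then wrap it in <speak> tags.

    Hypothetical helper for illustration. Use it only on plain text:
    any SSML tags already in the input would be escaped as well.
    """
    return f"<speak>{escape(plain_text)}</speak>"


print(wrap_in_speak("Tom & Jerry"))
# <speak>Tom &amp; Jerry</speak>
```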

<break time="○○ms"/>

As the name implies, the break code inserts a pause.

Enter the code where you want the pause, and replace ○○ with the length you want.

The speech generated by Ondoku does not pause unless you tell it to.

You can create short pauses with brackets or punctuation marks, but you cannot control how long they last.

Pauses of more than 2 seconds are especially difficult to produce this way.

However, if you enter

<break time="1000ms"/>

at the place where you want the pause,

<speak>
I want some break time <break time="1000ms"/> when reading this sentence.
</speak>

you will hear a pause in the middle of the sentence.

* 1000 ms = 1 second

You can freely change the number before ms (milliseconds) or s (seconds) to control the length of the pause.

200ms = 0.200 seconds
500ms = 0.500 seconds
1000ms = 1.000 seconds
2000ms = 2.000 seconds
3s = 3 seconds
10s = 10 seconds etc.
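If you build your SSML with a script, the conversion above can be automated. The sketch below assumes Python; break_tag is a hypothetical helper name, and it always writes the pause in milliseconds, which is one of the two formats shown above.

```python
def break_tag(seconds: float) -> str:
    """Build a <break> code for a pause of the given length in seconds.

    Hypothetical helper for illustration; 1 second = 1000 ms.
    """
    if seconds < 0:
        raise ValueError("pause length cannot be negative")
    milliseconds = round(seconds * 1000)
    return f'<break time="{milliseconds}ms"/>'


sentence = (
    "<speak>I want some break time "
    f"{break_tag(1)} when reading this sentence.</speak>"
)
print(sentence)
# <speak>I want some break time <break time="1000ms"/> when reading this sentence.</speak>
```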

<say-as interpret-as="expletive">○○</say-as>

With this code, the text in ○○ is replaced with a beep sound, like a censor bleep.

It is a playful code. The text enclosed in it is not read aloud; you hear a beep instead.

<speak>
This word is <say-as interpret-as="expletive">restricted</say-as>
</speak>

<sub alias="◇◇">○○</sub>

This code lets you specify how a word is pronounced.

When you use text-to-speech conversion, you may sometimes hear a word read in an unexpected way.

For example, if the system reads the word "一行" (one line) as "Ichiyuki", you can make it say "Ichigyo" instead.

In the code, enter the written text (such as kanji) for ○○ and the pronunciation for ◇◇.

<speak>
Correctly pronounce as <sub alias="Ichigyo">一行</sub> instead of Ichiyuki
</speak>
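When the same word appears many times, typing the code by hand gets tedious. As a rough illustration, assuming you prepare your text with a Python script, a small helper can produce the code for you. The name sub_tag is hypothetical; quoteattr and escape come from Python's standard library and keep quotes or & characters from breaking the code.

```python
from xml.sax.saxutils import escape, quoteattr


def sub_tag(written: str, spoken: str) -> str:
    """Build a <sub> code that reads `written` aloud as `spoken`.

    Hypothetical helper for illustration.
    """
    # quoteattr adds the surrounding quotes and escapes the attribute value.
    return f"<sub alias={quoteattr(spoken)}>{escape(written)}</sub>"


print(sub_tag("一行", "Ichigyo"))
# <sub alias="Ichigyo">一行</sub>
```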

<emphasis>○○</emphasis>

The text enclosed in this code is emphasized in the speech.

<speak>
This code will <emphasis>emphasize text</emphasis>
</speak>

<prosody>○○</prosody>

Prosody is a linguistic term for the rhythm and melody of natural speech, including

Intonation (rising and falling pitch)
Position of pauses
Length of sounds and stress, etc.

With this code, you can adjust the following 3 elements of prosody:

Rate (rate): how fast the text is spoken
Values: "x-slow" "slow" "medium" "fast" "x-fast" "default"
Pitch (pitch): how high or low the voice is
Values: "x-low" "low" "medium" "high" "x-high" "default"
Volume (volume): how loud the voice is
Values: "silent" "x-soft" "soft" "medium" "loud" "x-loud" "default"

<speak>
<prosody rate="fast">Speak fast. </prosody>
<prosody pitch="high">Speak in a high pitch voice. </prosody>
<prosody volume="loud">Speak loudly. </prosody>
<prosody rate="slow" pitch="x-low">Speak slowly and in a low pitch voice. </prosody>
<prosody rate="fast" pitch="high" volume="medium">Speak fast, in a high pitch voice, and in normal volume. </prosody>
</speak>
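A typo in a value (for example "quick" instead of "fast") is easy to miss. If you generate SSML with a script, you can check the values against the lists above before building the code. The sketch below assumes Python; prosody_tag and ALLOWED are hypothetical names, and the check covers only the keyword values listed in this article.

```python
from xml.sax.saxutils import escape

# Keyword values listed in this article for each prosody attribute.
ALLOWED = {
    "rate": {"x-slow", "slow", "medium", "fast", "x-fast", "default"},
    "pitch": {"x-low", "low", "medium", "high", "x-high", "default"},
    "volume": {"silent", "x-soft", "soft", "medium", "loud", "x-loud", "default"},
}


def prosody_tag(text: str, **settings: str) -> str:
    """Wrap text in a <prosody> code after checking every attribute value.

    Hypothetical helper for illustration.
    """
    if not settings:
        raise ValueError("give at least one of rate, pitch or volume")
    attributes = []
    for name, value in settings.items():
        if name not in ALLOWED:
            raise ValueError(f"unknown prosody attribute: {name}")
        if value not in ALLOWED[name]:
            raise ValueError(f"{value!r} is not a valid {name} value")
        attributes.append(f'{name}="{value}"')
    return f"<prosody {' '.join(attributes)}>{escape(text)}</prosody>"


print(prosody_tag("Speak slowly and in a low pitch voice.", rate="slow", pitch="x-low"))
# <prosody rate="slow" pitch="x-low">Speak slowly and in a low pitch voice.</prosody>
```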

■ AI voice synthesis software "Ondoku"

"Ondoku" is an online text-to-speech tool that can be used with no initial costs.

Supports approximately 50 languages, including Japanese, English, Chinese, Korean, Spanish, French, and German
Available from both PC and smartphone
Suitable for business, education, entertainment, etc.
No installation required, can be used immediately from your browser
Supports reading from images

To use it, simply enter text or upload a file on the site. A natural-sounding audio file will be generated within seconds. You can use voice synthesis up to 5,000 characters for free, so please give it a try.

Convert text to speech now

If you sign up for free, the text-to-speech software "Ondoku" can convert up to 5,000 characters per month to AI speech at no cost. You can easily download MP3s, and commercial use is also possible. Try Ondoku now.

HP: ondoku3.com
Email: ondoku3.com@gmail.com
