What is Speech Synthesis Markup Language (SSML)? How to use it with text-to-speech software and a list of main codes.

April 1, 2026

Hello, thank you for always using Ondoku.

In this article, we will introduce SSML.

You might find it a bit difficult as some technical terms will appear, but by learning how to use SSML, you will be able to utilize Ondoku even more conveniently.

We will explain it in an easy-to-understand way, so please take a look.

What is SSML?

SSML (Speech Synthesis Markup Language) is a markup language for controlling synthesized speech.

A "markup language" is a language that defines the structure (role) for each part of a text, similar to HTML.

By writing SSML code, you can control how Ondoku reads your text in more detail.

Currently, the SSML codes supported in all languages are:

<speak>
<break>

Only these two codes are supported in every language. Please note that the other codes may not work, depending on the language or voice type.

How to use SSML with Ondoku

Using SSML with Ondoku is very easy.

Please enter the SSML code directly into the Ondoku text box.

The SSML will then be applied automatically.

You can use SSML simply by inserting <speak> at the beginning of your text and </speak> at the end.

Please make sure not to forget this code! If the code is not included, SSML will not be applied.

<speak> is the code required to enable SSML.

Be sure to put this code at the beginning and end of the text you want to read aloud.

Example

<speak>Please enter the text you want to read aloud here</speak>

The break time code, <break time="○○ms"/>, is, as the name suggests, a code for creating pauses.

Enter the code where you want to insert a pause, and replace the ○○ part with your preferred number.

When reading aloud normally with Ondoku, pauses may not always occur with the timing you expect.

This code is useful in such situations.

For example, let's try putting a code like <break time="1000ms"/> where you want to insert a pause.

<speak>I want to leave a small pause<break time="1000ms"/> when reading this sentence.</speak>

You can hear that it takes a breath at the position where the code was inserted before continuing to read.

※ 1000ms = 1 second

By changing the number before ms or s, you can freely change the length of the "pause."

200ms = 0.200 seconds
500ms = 0.500 seconds
1000ms = 1.000 second
2000ms = 2.000 seconds
3s = 3 seconds, etc.
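
As a rough sketch, here is how pauses of different lengths might be combined in one passage, assuming the standard SSML form of the break code, <break time="..."/>. The sentences are made up for illustration.

```xml
<!-- Illustrative sentences only; delete this comment line before pasting into Ondoku -->
<speak>
  This is the first sentence.<break time="500ms"/>
  This is the second sentence.<break time="1000ms"/>
  This is the third sentence.<break time="3s"/>
  This is the last sentence.
</speak>
```

Here the first pause is half a second, the second is one second, and the third is three seconds.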

Notes on using

The behavior when a <break> tag is inserted at the very beginning of the text varies depending on the voice, and it is not guaranteed to work.

Example: <speak><break time="5s"/>I want to create a 5-second pause at the beginning</speak>

↑ This type of usage is not supported.

Also, the maximum length for a pause is 10 seconds (10000ms, 10s).

If a longer time is set, it is shortened to 10 seconds.

The usage of the <break> tag is also explained in this article.

How to adjust pauses and blank time in Ondoku's reading aloud [2 types] | Text-to-speech software Ondoku

One common request from Ondoku users is to make the pauses a little longer. If you want to adjust pauses, there are two methods: 1. Punctuation 2. SSML.

<lang>〇〇</lang>

This tag is for specifying the language for each part of the text when using Ondoku's Multilingual voices for multi-language reading aloud.

Use it when a word is pronounced incorrectly, or when a sentence mixes several languages that the voice cannot tell apart.

The Japanese word for "Hello" is Hello in English,
Bonjour in French,
Guten Tag in German,
and 你好 in Chinese.
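
A minimal sketch of how the sentence above might be marked up, assuming the standard SSML form <lang xml:lang="..."> and the locale codes en-US, fr-FR, de-DE, and zh-CN. The codes that Ondoku's Multilingual voices actually accept are not listed here, so treat them as placeholders.

```xml
<!-- Assumed locale codes; delete this comment line before pasting into Ondoku -->
<speak>
  The Japanese word for "Hello" is <lang xml:lang="en-US">Hello</lang> in English,
  <lang xml:lang="fr-FR">Bonjour</lang> in French,
  <lang xml:lang="de-DE">Guten Tag</lang> in German,
  and <lang xml:lang="zh-CN">你好</lang> in Chinese.
</speak>
```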

The usage of the <lang> tag is explained in detail in this article.

How to use SSML tags with Ondoku's multilingual reading aloud? How to use the <lang> tag for multilingual voices | Text-to-speech software Ondoku


Explains how to use SSML tags with Ondoku's multilingual function. Includes templates you can copy and paste. Perfect for YouTube videos and language teaching materials!

○○

The text in the ○○ part is replaced by a censoring "beep" sound.

This is a somewhat playful code: whatever you surround with it is bleeped out instead of being read aloud.

This word is censored.
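
A minimal sketch of this example, assuming the code is the <say-as interpret-as="expletive"> form that several text-to-speech engines use for bleeping. Whether a given Ondoku voice accepts it is not confirmed here.

```xml
<!-- Assumed tag form; delete this comment line before pasting into Ondoku -->
<speak>
  This word is <say-as interpret-as="expletive">censored</say-as>.
</speak>
```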

○○

Alphabetic text surrounded by this code is spelled out letter by letter.

The spelling of hello is Hello
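
A minimal sketch of this example, assuming the code is the <say-as interpret-as="characters"> form that is common across text-to-speech engines. The exact form Ondoku expects is an assumption.

```xml
<!-- Assumed tag form; delete this comment line before pasting into Ondoku -->
<speak>
  The spelling of hello is <say-as interpret-as="characters">Hello</say-as>
</speak>
```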

However, some voices may cause an error with this SSML. Please be aware of this before using.

Japanese: Nanami

English (USA): en-US-A

_○○

This is a code that allows you to provide phonetic readings.

In text-to-speech, unexpected readings can sometimes occur.

For example, if "一行" (one line) is pronounced as "ichiyuki," providing the phonetic reading "ichigyo" ensures it is pronounced correctly.

In the code, enter the Kanji in ○○ and the phonetic reading in ◇◇.

Read correctly as _一行 rather than "ichiyuki"
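
A minimal sketch of this example, assuming the code is the standard SSML <sub alias="..."> form, with the Kanji as the content and the reading in the alias attribute. The hiragana いちぎょう stands in for "ichigyo". Both the tag form and the way the reading is written are assumptions.

```xml
<!-- Assumed tag form and reading; delete this comment line before pasting into Ondoku -->
<speak>
  Read correctly as <sub alias="いちぎょう">一行</sub> rather than "ichiyuki"
</speak>
```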

<emphasis>○○</emphasis>

The text surrounded by this code is emphasized when read aloud.

This code can emphasize words.
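
A minimal sketch of this example, assuming the standard SSML <emphasis> element. The level attribute (strong, moderate, reduced, or none in the SSML standard) is optional, and whether each Ondoku voice honors it is not confirmed here.

```xml
<!-- Assumed attribute value; delete this comment line before pasting into Ondoku -->
<speak>
  This code can <emphasis level="strong">emphasize</emphasis> words.
</speak>
```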

<prosody>○○</prosody>

Prosody is a general term for the sound characteristics of a language when speaking naturally, such as:

Rising and falling of sounds
Pause positions
Length and stress, etc.

The three things you can adjust with this code are:

rate (speed)
Adjustment codes: "x-slow", "slow", "medium", "fast", "x-fast", "default"
pitch (height)
Adjustment codes: "x-low", "low", "medium", "high", "x-high", "default"
volume (loudness)
Adjustment codes: "silent", "x-soft", "soft", "medium", "loud", "x-loud", "default"

I will speak quickly.
I will speak in a high voice.
I will speak loudly.
I will speak slowly and in a low voice.
I will speak quickly, in a high voice, and at normal volume.
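
A minimal sketch of how the five sentences above might be marked up, assuming the standard SSML <prosody> element with the rate, pitch, and volume values listed earlier. The specific values chosen for each sentence (for example "fast" for "quickly") are a best guess.

```xml
<!-- Assumed attribute values; delete this comment line before pasting into Ondoku -->
<speak>
  <prosody rate="fast">I will speak quickly.</prosody>
  <prosody pitch="high">I will speak in a high voice.</prosody>
  <prosody volume="loud">I will speak loudly.</prosody>
  <prosody rate="slow" pitch="low">I will speak slowly and in a low voice.</prosody>
  <prosody rate="fast" pitch="high" volume="medium">I will speak quickly, in a high voice, and at normal volume.</prosody>
</speak>
```

As the last two lines suggest, several attributes can be combined in a single tag.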

Why not utilize SSML to make Ondoku even more convenient?

As shown, by utilizing SSML, you can use Ondoku even more conveniently!

Why not further expand the use of Ondoku's realistic and easy-to-hear voices for video production, presentations, and more?

■ AI voice synthesis software "Ondoku"

"Ondoku" is an online text-to-speech tool that can be used with no initial costs.

Supports approximately 50 languages, including Japanese, English, Chinese, Korean, Spanish, French, and German
Available from both PC and smartphone
Suitable for business, education, entertainment, etc.
No installation required, can be used immediately from your browser
Supports reading from images

To use it, simply enter text or upload a file on the site. A natural-sounding audio file will be generated within seconds. You can use voice synthesis up to 5,000 characters for free, so please give it a try.

Text-to-speech software "Ondoku" can read out 5,000 characters every month with AI voices for free. You can easily download MP3s, and commercial use is also possible. Sign up for free and try Ondoku now.

HP: ondoku3.com
Email: ondoku3.com@gmail.com
