What is Speech Synthesis Markup Language (SSML)? How to use it with text-to-speech software and a list of main codes.

Jan. 19, 2026


Hello, thank you for always using Ondoku.

In this article, we will introduce SSML.

Some technical terms will appear, so it may seem a little difficult at first, but once you learn how to use SSML you will be able to get much more out of Ondoku.

We will explain it in an easy-to-understand way, so please take a look.

What is SSML?

SSML (Speech Synthesis Markup Language) is a markup language for controlling how synthesized speech is read aloud.

A "markup language" is a language that defines the structure (role) for each part of a text, similar to HTML.

By writing SSML code, you can control how Ondoku reads your text, including pauses, emphasis, and pronunciation.

Currently, the SSML codes supported in all languages are:

  • <speak>
  • <break time="○○ms"/>

Only these two codes work with every language. Please note that other codes may not be usable, depending on the language or voice type.

How to use SSML with Ondoku

Using SSML with Ondoku is very easy.

Please enter the SSML code directly into the Ondoku text box.

The SSML will then be applied automatically.

You can use SSML simply by inserting <speak> at the beginning and </speak> at the end of your text.

Please make sure not to forget this code! If the code is not included, SSML will not be applied.

<speak> is the code required to enable SSML.

Be sure to put <speak> at the beginning and </speak> at the end of the text you want to read aloud.

Example

<speak>Please enter the text you want to read aloud here</speak>
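If you prepare your text in a script before pasting it into the Ondoku text box, the wrapping step could look like the sketch below. This is not an Ondoku feature: the helper name wrap_speak is hypothetical, and escaping the characters &, <, and > is an assumption based on SSML being an XML-based format.

```python
from xml.sax.saxutils import escape


def wrap_speak(text: str) -> str:
    """Wrap plain text in <speak>...</speak> (hypothetical helper).

    escape() replaces &, <, and > so they are not mistaken for markup.
    """
    return f"<speak>{escape(text)}</speak>"


print(wrap_speak("Please enter the text you want to read aloud here"))
```

The printed line can then be pasted into the text box as-is.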

<break time="○○ms"/>

The break time code is, as the name suggests, a code for creating pauses.

Enter the code where you want to insert a pause, and replace the ○○ part with the pause length you want.

When reading aloud normally with Ondoku, pauses may not always occur with the timing you expect.

This code is useful in such situations.

For example, try putting a code like <break time="1000ms"/> where you want to insert a pause.


<speak>I want to leave a small pause <break time="1000ms"/> when reading this sentence.</speak>

You can hear a short pause, like taking a breath, at the position where the code was inserted before the reading continues.

※ 1000ms = 1 second

By changing the number before ms or s, you can freely change the length of the "pause."

  • 200ms = 0.2 seconds
  • 500ms = 0.5 seconds
  • 1000ms = 1 second
  • 2000ms = 2 seconds
  • 3s = 3 seconds, etc.
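The conversions in the list above can be checked with a few lines of Python. This is only an illustrative sketch: the function name to_seconds is hypothetical, and it handles only the two units mentioned here (ms and s).

```python
def to_seconds(duration: str) -> float:
    """Convert a pause length such as "500ms" or "3s" to seconds."""
    value = duration.strip().lower()
    if value.endswith("ms"):  # check "ms" first, because it also ends with "s"
        return float(value[:-2]) / 1000
    if value.endswith("s"):
        return float(value[:-1])
    raise ValueError(f"expected a value ending in 'ms' or 's': {duration!r}")


for example in ("200ms", "500ms", "1000ms", "2000ms", "3s"):
    print(example, "=", to_seconds(example), "seconds")
```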

Notes on using <break>

The behavior when a <break> tag is inserted at the very beginning of the text varies depending on the voice, and operation cannot be guaranteed.

Example: <speak><break time="5s"/>I want to create a 5-second pause at the beginning</speak>

↑ This type of usage is not supported by the specifications.

Also, the maximum length for a pause is 10 seconds (10000ms, 10s).

If a longer time is set, it will be shortened to 10 seconds.
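If you generate pause codes from a script, you can apply the same 10-second limit yourself so that the result matches what you will actually hear. The following is a minimal sketch under that assumption: break_tag and MAX_BREAK_MS are hypothetical names, not part of Ondoku.

```python
MAX_BREAK_MS = 10_000  # the 10-second maximum described above


def break_tag(milliseconds: int) -> str:
    """Build a <break> code, capping the pause at MAX_BREAK_MS (hypothetical helper)."""
    if milliseconds < 0:
        raise ValueError("pause length cannot be negative")
    capped = min(milliseconds, MAX_BREAK_MS)
    return f'<break time="{capped}ms"/>'


print(break_tag(500))    # <break time="500ms"/>
print(break_tag(15000))  # <break time="10000ms"/>
```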

The usage of the <break> tag is also explained in this article.

<lang xml:lang="language code">○○</lang>

This tag is for specifying the language for each part of the text when using Ondoku's Multilingual voices for multi-language reading aloud.

Use it when a word is pronounced incorrectly, or when a sentence mixes several languages that the voice cannot tell apart.


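<speak>The Japanese word "こんにちは" is <lang xml:lang="en-US">Hello</lang> in English,
<lang xml:lang="fr-FR">Bonjour</lang> in French,
<lang xml:lang="de-DE">Guten Tag</lang> in German,
and <lang xml:lang="zh-CN">你好</lang> in Chinese.</speak>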
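When a sentence mixes many languages, writing each language tag by hand is error-prone. The sketch below builds them from a list of word and language pairs. It is an illustration only: lang_span is a hypothetical helper, and the language codes shown (en-US, fr-FR, de-DE, zh-CN) are assumptions, so use the codes supported by the Multilingual voice you select.

```python
from xml.sax.saxutils import escape, quoteattr


def lang_span(text: str, language_code: str) -> str:
    """Wrap text in a <lang> code with the given xml:lang value (hypothetical helper)."""
    return f"<lang xml:lang={quoteattr(language_code)}>{escape(text)}</lang>"


# Word and language-code pairs; the codes are illustrative assumptions.
greetings = [("Hello", "en-US"), ("Bonjour", "fr-FR"), ("Guten Tag", "de-DE"), ("你好", "zh-CN")]
body = ", ".join(lang_span(word, code) for word, code in greetings)
print(f"<speak>{body}</speak>")
```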

The usage of the <lang> tag is explained in detail in this article.

<say-as interpret-as="expletive">○○</say-as>

This is a code with a bit of a playful spirit. The text in the ○○ part, surrounded by this code, is replaced by a censoring "beep" sound.


<speak>This <say-as interpret-as="expletive">word</say-as> is censored.</speak>

<say-as interpret-as="characters">○○</say-as>

Alphabetic text in the ○○ part is spelled out letter by letter.


<speak>The spelling of hello is <say-as interpret-as="characters">Hello</say-as></speak>

However, some voices may cause an error with this SSML. Please be aware of this before using.

Japanese: Nanami

English (USA): en-US-A

<sub alias="◇◇">○○</sub>

This is a code that allows you to provide phonetic readings.

In text-to-speech, unexpected readings can sometimes occur.

For example, if "一行" (one line) is pronounced as "ichiyuki," providing the phonetic reading "ichigyo" ensures it is pronounced correctly.

In the code, enter the Kanji in ○○ and the phonetic reading in ◇◇.


<speak>Read correctly as <sub alias="いちぎょう">一行</sub> rather than "ichiyuki"</speak>
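If the same word is misread many times in a long text, the phonetic reading can be applied to every occurrence at once. The sketch below assumes the standard SSML sub code with an alias attribute. The names READINGS and apply_readings are hypothetical, and the hiragana reading is an illustrative assumption.

```python
from xml.sax.saxutils import quoteattr

# Written form -> phonetic reading (illustrative entry, adjust to your text).
READINGS = {"一行": "いちぎょう"}


def apply_readings(text: str, readings: dict[str, str]) -> str:
    """Wrap every occurrence of each written form in a <sub alias="..."> code.

    Simple sketch: it assumes no written form appears inside another entry's reading.
    """
    for written, reading in readings.items():
        text = text.replace(written, f"<sub alias={quoteattr(reading)}>{written}</sub>")
    return text


print(f"<speak>{apply_readings('一行目を読みます。', READINGS)}</speak>")
```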

<emphasis>○○</emphasis>

You can emphasize the characters surrounded by the code during reading aloud.


<speak>This code can <emphasis>emphasize</emphasis> words.</speak>

<prosody>○○</prosody>

Prosody is a general term for the sound characteristics of a language when speaking naturally, such as:

  • Rising and falling of sounds
  • Pause positions
  • Length and stress, etc.

The three things you can adjust with this code are:

  • rate (speed)
    Values: "x-slow", "slow", "medium", "fast", "x-fast", "default"
  • pitch (height)
    Values: "x-low", "low", "medium", "high", "x-high", "default"
  • volume (loudness)
    Values: "silent", "x-soft", "soft", "medium", "loud", "x-loud", "default"


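<speak>
<prosody rate="fast">I will speak quickly.</prosody>
<prosody pitch="high">I will speak in a high voice.</prosody>
<prosody volume="loud">I will speak loudly.</prosody>
<prosody rate="slow" pitch="low">I will speak slowly and in a low voice.</prosody>
<prosody rate="fast" pitch="high" volume="medium">I will speak quickly, in a high voice, and at normal volume.</prosody>
</speak>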
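Because a mistyped value (for example "quick" instead of "fast") is easy to miss, it can help to check the values before pasting the code into Ondoku. The following is a minimal sketch, assuming the standard SSML prosody code with rate, pitch, and volume attributes. The function name prosody and the ALLOWED table are hypothetical, and the table simply copies the values listed above.

```python
from xml.sax.saxutils import escape

# Allowed values, copied from the list of adjustment values above.
ALLOWED = {
    "rate": {"x-slow", "slow", "medium", "fast", "x-fast", "default"},
    "pitch": {"x-low", "low", "medium", "high", "x-high", "default"},
    "volume": {"silent", "x-soft", "soft", "medium", "loud", "x-loud", "default"},
}


def prosody(text: str, **settings: str) -> str:
    """Wrap text in a <prosody> code after validating each setting (hypothetical helper)."""
    if not settings:
        return escape(text)  # nothing to adjust, so no code is needed
    attributes = []
    for name, value in settings.items():
        if name not in ALLOWED:
            raise ValueError(f"unknown setting: {name}")
        if value not in ALLOWED[name]:
            raise ValueError(f"unsupported {name} value: {value}")
        attributes.append(f'{name}="{value}"')
    return f"<prosody {' '.join(attributes)}>{escape(text)}</prosody>"


print(prosody("I will speak slowly and in a low voice.", rate="slow", pitch="low"))
```

A value outside the lists, such as rate="quick", raises a ValueError instead of silently producing a code that may not work.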

Why not utilize SSML to make Ondoku even more convenient?

As shown, by utilizing SSML, you can use Ondoku even more conveniently!

Why not further expand the use of Ondoku's realistic and easy-to-hear voices for video production, presentations, and more?

■ AI voice synthesis software "Ondoku"

"Ondoku" is an online text-to-speech tool that can be used with no initial costs.

  • Supports approximately 50 languages, including Japanese, English, Chinese, Korean, Spanish, French, and German
  • Available from both PC and smartphone
  • Suitable for business, education, entertainment, etc.
  • No installation required, can be used immediately from your browser
  • Supports reading from images

To use it, simply enter text or upload a file on the site. A natural-sounding audio file will be generated within seconds. You can use voice synthesis up to 5,000 characters for free, so please give it a try.

With a free account, the text-to-speech software "Ondoku" converts up to 5,000 characters of text to AI speech every month at no cost. MP3 download is easy, and commercial use is also possible. Try Ondoku now.
HP: ondoku3.com
Email: ondoku3.com@gmail.com