What is Speech Synthesis Markup Language (SSML)? How to use it in text-to-speech software, with a list of the main codes.

Feb. 7, 2023


Hello, and welcome to the Ondoku website.

Today, we would like to introduce SSML.

It may look difficult at first because it involves some technical terms.

However, once you know it, it makes a huge difference in how you can use Ondoku.

We have tried to make it easy to understand, so please read through to the end.

What is SSML?

First of all, SSML stands for Speech Synthesis Markup Language. 

It is a tag-based markup language, similar to HTML.

By writing SSML code, you gain finer control over how Ondoku reads your text.

How to use SSML with Ondoku

It is very easy to use SSML in Ondoku.

Enter the SSML code directly into the Ondoku text box.

The SSML will be applied automatically.

Be sure to wrap your code in

<speak></speak>

with <speak> at the beginning and </speak> at the end! Without the <speak> code, SSML will not be applied.
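If you prepare your SSML in a script before pasting it into Ondoku, you can check this rule automatically. The following is a minimal sketch in Python. The function name `is_speak_document` is hypothetical and not part of Ondoku, and the check assumes that Ondoku expects well-formed XML, as SSML engines generally do.

```python
import xml.etree.ElementTree as ET


def is_speak_document(ssml: str) -> bool:
    """Return True if the text is well-formed XML whose outermost tag is <speak>.

    Hypothetical helper for checking text before pasting it into Ondoku.
    """
    try:
        root = ET.fromstring(ssml.strip())
    except ET.ParseError:
        # Plain text, or a tag that was never closed, ends up here.
        return False
    return root.tag == "speak"


print(is_speak_document("<speak>Hello</speak>"))  # True
print(is_speak_document("Hello"))                 # False: no <speak> code
print(is_speak_document("<speak>Hello"))          # False: </speak> is missing
```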

<speak></speak>

As mentioned earlier, this is the code to activate SSML.

Be sure to include this code at the beginning and end of the text you want to convert to speech.

Example

<speak>Enter the text you want to convert from text to speech here</speak>
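If you generate the text with a script, the wrapping step can be automated. The sketch below is only an illustration: `wrap_speak` is a hypothetical helper, not an Ondoku function. It also escapes the characters &, < and > in the plain text, on the assumption that Ondoku reads SSML as XML and would otherwise mistake them for markup.

```python
from xml.sax.saxutils import escape


def wrap_speak(text: str) -> str:
    """Wrap plain text in <speak> ... </speak> so it can be pasted as SSML.

    Hypothetical helper. The text is escaped first, so it must be plain
    text without any SSML codes of its own.
    """
    return f"<speak>{escape(text)}</speak>"


print(wrap_speak("Tom & Jerry"))  # <speak>Tom &amp; Jerry</speak>
```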

<break time="○○ms"/>

As the name implies, the break code inserts a pause.

Enter the code where you want the pause, and replace ○○ with the length you want.

Speech generated by Ondoku will not contain a deliberate pause unless you specify one.

Brackets and punctuation produce short pauses, but you cannot control their length precisely.

Pauses of more than 2 seconds are especially difficult to produce this way.

However, if you enter

<break time="1000ms"/>

at the place where you want the pause, for example

<speak>
I want some break time <break time="1000ms"/> when reading this sentence.
</speak>

you will hear a one-second pause in the middle of the sentence.

* 1000 ms = 1 second

You can freely change the number before ms (milliseconds) or s (seconds) to control the length of the pause.

200ms = 0.200 seconds
500ms = 0.500 seconds
1000ms = 1.000 seconds
2000ms = 2.000 seconds
3s = 3 seconds
10s = 10 seconds etc.
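The conversion between seconds and milliseconds can also be done by a small script. The following Python sketch uses a hypothetical helper named `break_tag` (not part of Ondoku) that takes a length in seconds and returns the matching break code in milliseconds.

```python
def break_tag(seconds: float) -> str:
    """Return a <break> code for a pause of the given length in seconds.

    Hypothetical helper. The length is rounded to whole milliseconds.
    """
    if seconds < 0:
        raise ValueError("pause length must not be negative")
    milliseconds = round(seconds * 1000)
    return f'<break time="{milliseconds}ms"/>'


sentence = (
    "<speak>I want some break time "
    + break_tag(1)
    + " when reading this sentence.</speak>"
)
print(sentence)
```

Here `break_tag(1)` produces the same `<break time="1000ms"/>` code as in the example above, and `break_tag(0.5)` would produce a half-second pause.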

<say-as interpret-as="expletive">○○</say-as>

With this code, the text in place of ○○ is bleeped out, as with a censor beep.

It is a somewhat playful code: the enclosed text is not read aloud but replaced with a beeping sound.

<speak>
This word is <say-as interpret-as="expletive">restricted</say-as>
</speak>

<sub alias="◇◇">○○</sub>

This code lets you specify how a word is pronounced.

Sometimes when you use text-to-speech conversion, you may hear a word spoken in an unexpected way.

For example, the system may read the word "一行" (one line) as "Ichiyuki"; with this code, you can make it say "Ichigyo" instead.

In the code, replace ○○ with the kanji and ◇◇ with the pronunciation.

<speak>
Correctly pronounce as <sub alias="Ichigyo">一行</sub> instead of Ichiyuki
</speak>
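If the same word appears many times, writing the code by hand becomes tedious. As a rough sketch, a script could build it for you. The helper name `sub_tag` below is hypothetical and not part of Ondoku; it uses Python's standard `xml.sax.saxutils` module to quote the pronunciation safely.

```python
from xml.sax.saxutils import escape, quoteattr


def sub_tag(written: str, spoken: str) -> str:
    """Return a <sub> code that reads `written` aloud as `spoken`.

    Hypothetical helper. quoteattr() adds the quotation marks around
    the alias and escapes any special characters inside it.
    """
    return f"<sub alias={quoteattr(spoken)}>{escape(written)}</sub>"


text = "Correctly pronounce as 一行 instead of Ichiyuki"
ssml = "<speak>" + text.replace("一行", sub_tag("一行", "Ichigyo")) + "</speak>"
print(ssml)
```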

<emphasis>○○</emphasis>

The text enclosed in this code is spoken with emphasis.

<speak>
This code will <emphasis>emphasize text</emphasis>
</speak>

<prosody>○○</prosody>

Prosody is a linguistic term for the rhythm and melody of natural speech, including

  • Intonation (rising and falling pitch)
  • Position of pauses
  • Length of sounds, stress, etc.

With this code, you can adjust the following 3 elements of prosody:

  • Rate (rate: speaking speed)
    Values: "x-slow" "slow" "medium" "fast" "x-fast" "default"
  • Pitch (pitch: how high or low the voice is)
    Values: "x-low" "low" "medium" "high" "x-high" "default"
  • Volume (volume: loudness)
    Values: "silent" "x-soft" "soft" "medium" "loud" "x-loud" "default"

<speak>
<prosody rate="fast">Speak fast. </prosody>
<prosody pitch="high">Speak in a high pitch voice. </prosody>
<prosody volume="loud">Speak loudly. </prosody>
<prosody rate="slow" pitch="x-low">Speak slowly and in a low pitch voice. </prosody>
<prosody rate="fast" pitch="high" volume="medium">Speak fast, in a high pitch voice, and in normal volume. </prosody>
</speak>
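Because each setting only accepts the values listed above, a typo such as rate="quick" is easy to make. The Python sketch below shows one way to catch such mistakes before pasting the code into Ondoku. The names `ALLOWED_VALUES` and `prosody_tag` are hypothetical, and the allowed values are simply copied from the list in this article.

```python
# Values copied from the list in this article.
ALLOWED_VALUES = {
    "rate": {"x-slow", "slow", "medium", "fast", "x-fast", "default"},
    "pitch": {"x-low", "low", "medium", "high", "x-high", "default"},
    "volume": {"silent", "x-soft", "soft", "medium", "loud", "x-loud", "default"},
}


def prosody_tag(text: str, **settings: str) -> str:
    """Wrap text in a <prosody> code after checking every setting.

    Hypothetical helper. The text is inserted as-is, so it may itself
    contain other SSML codes.
    """
    if not settings:
        raise ValueError("give at least one of rate, pitch or volume")
    attributes = []
    for name, value in settings.items():
        if name not in ALLOWED_VALUES:
            raise ValueError(f"unknown prosody setting: {name}")
        if value not in ALLOWED_VALUES[name]:
            raise ValueError(f"unsupported {name} value: {value}")
        attributes.append(f'{name}="{value}"')
    attribute_text = " ".join(attributes)
    return f"<prosody {attribute_text}>{text}</prosody>"


print(prosody_tag("Speak slowly and in a low pitch voice. ", rate="slow", pitch="x-low"))
```

A call with a value that is not in the list, for example `prosody_tag("Hi", rate="quick")`, raises a `ValueError` instead of silently producing code that may not work.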

The text-to-speech software "Ondoku" can read out up to 5,000 characters per month with AI voices for free once you sign up (sign-up is also free). You can easily download MP3 files, and commercial use is also possible. Try Ondoku now.
HP: ondoku3.com
Email: ondoku3.com@gmail.com