What is Irodori-TTS? Features, Precautions, and Setup Guide Explained

July 28, 2026

What kind of AI text-to-speech software is Irodori-TTS?

Many of you may be wondering about the new AI text-to-speech software "Irodori-TTS."

In this article, we will explain the features, capabilities, precautions, and usage of Irodori-TTS in an easy-to-understand manner.

Furthermore, for those who feel that "the setup seems difficult," we also introduce a speech synthesis method that can be used immediately without installation.

What you will learn in this article

What kind of software is Irodori-TTS?
Capabilities and precautions of Irodori-TTS
How to use Irodori-TTS (from setup to voice adjustment)
Recommended methods when environment construction is difficult

What is Irodori-TTS? Explaining the Japanese AI Speech Synthesis Software

First, we will briefly explain what kind of AI speech synthesis software Irodori-TTS is and its features.

Irodori-TTS is an AI Speech Synthesis Model that Runs Locally

Irodori-TTS is AI speech synthesis software specialized for Japanese.

The developer is Aratako, and it is released for free as open source (MIT license).

The biggest feature is that it can perform "local operation," where speech synthesis is completed entirely on your own PC.

Since all voice generation processing takes place on your local PC, text and generated voice data are never sent to external servers.

After the initial setup, you can generate speech without an internet connection, and there are no limits on the number of generations.

However, the setup requires programming tools such as Python and Git.

Additionally, a high-performance PC equipped with a GPU (graphics card) is recommended for high-speed operation.

What You Can and Cannot Do with Irodori-TTS

Next, we will explain what Irodori-TTS can and cannot do.

What Irodori-TTS Can Do

Since Irodori-TTS runs in a local environment, you can generate speech an unlimited number of times.

Even in an environment without an internet connection, you can freely create speech once the initial setup is complete.

There are multiple ways to instruct what kind of voice to create; using the caption function, you can create your preferred voice quality just through text instructions.

It is also possible to reproduce an existing voice through voice cloning or add emotional expressions using emojis.

Since it uses the MIT license, commercial use of the generated audio is also possible.

Precautions for Irodori-TTS

On the other hand, there are some precautions you should know before using Irodori-TTS.

Each generation is limited to about 30 seconds of audio

A single generation can produce only about 30 seconds of audio, which caps the amount of text that can be read at once.

If you want to read a long text, you need to split the text and generate it multiple times.
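Splitting a long text by hand is tedious, so here is a minimal sketch of one way to do it in Python. It breaks the text at sentence-ending punctuation and packs sentences into chunks. The function name `split_for_tts` and the `max_chars` budget are our own assumptions, not part of Irodori-TTS; adjust `max_chars` by ear until each chunk stays under roughly 30 seconds of audio.

```python
import re

# Zero-width split point right after common sentence-ending marks.
_SENTENCE_END = re.compile(r"(?<=[。！？!?\n])")


def split_for_tts(text: str, max_chars: int = 100) -> list[str]:
    """Split text into chunks of at most max_chars, breaking at sentence ends.

    max_chars is a hypothetical budget, not a value defined by Irodori-TTS.
    A single sentence longer than max_chars is hard-split as a last resort.
    """
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any sentence that exceeds the budget on its own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if len(current) + len(sentence) <= max_chars:
            current += sentence
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be pasted into the text input field and generated separately.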

It is difficult to achieve exactly the voice or speaking style you want

While Irodori-TTS offers high freedom, it does not provide default voices (base voices).

Therefore, unless you specify captions or reference audio, the gender and age of the voice change randomly with each generation.

To read with the same voice, you need to load a reference audio file.

Additionally, there is no function to manually adjust intonation or inflection.

Supported language is Japanese only

The supported language is only Japanese; it does not support foreign languages such as English.

Also, note that misreadings of Kanji can occur.

High-spec PC with GPU recommended

Depending on your PC specs, voice generation may take time.

In our test on a laptop without a GPU, it took about one minute to generate even a short sentence.

On entry-class CPUs like Celeron or N100, practical use would be difficult.
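If you are not sure whether your PC has a usable GPU, the following rough sketch may help. It asks the `nvidia-smi` tool, which is installed with NVIDIA drivers, for the names of the GPUs it can see. The function name `nvidia_gpu_names` is hypothetical, the check only covers NVIDIA cards, and whether Irodori-TTS actually uses the GPU depends on its own runtime, which this sketch does not verify.

```python
import shutil
import subprocess


def nvidia_gpu_names() -> list[str]:
    """Return GPU names reported by nvidia-smi, or [] if none can be found."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No NVIDIA driver tools on PATH; treat this as "no NVIDIA GPU found".
        return []
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    names = nvidia_gpu_names()
    print(names if names else "No NVIDIA GPU detected")
```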

How to Use Irodori-TTS (Setup Workflow)

Here, we will briefly explain how to use Irodori-TTS.

The overall setup flow is as follows:

Install necessary software
Create a working folder
Clone Irodori-TTS from GitHub
Install necessary packages
Launch Irodori-TTS
Load the AI model
Read the text aloud

1. Install Software Necessary for Irodori-TTS

Preparation is required to set up Irodori-TTS.

Irodori-TTS needs the following three pieces of software:

Python 3.10 or higher: Programming language
Git: Version control system (Required to download Irodori-TTS)
uv: Package manager for Python

Git and uv are installed from the terminal. First, right-click the Start menu and click "Terminal" (you do not need to launch it as an administrator).

The terminal (PowerShell) screen will then open.

In this screen, enter and execute the following commands:

winget install --id Git.Git -e
winget install --id=astral-sh.uv -e

You have now installed what is necessary to set up Irodori-TTS.

*Python is managed by uv, so it will be installed automatically during setup.

Once installation finishes, close and reopen the terminal (PowerShell) so that the updated PATH takes effect and the git and uv commands can be found.
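To confirm that the newly installed commands are visible after reopening the terminal, a small check like the following can be used once any Python is available. It is only a sketch: the helper name `missing_tools` is our own, and it simply looks for each command on the PATH.

```python
import shutil


def missing_tools(tools: tuple[str, ...] = ("git", "uv")) -> list[str]:
    """Return the names of the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("git and uv are both available")
```

If a tool is reported as missing right after installation, reopening the terminal is usually the first thing to try.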

2. Create a Working Folder

Next, create a working folder.

Irodori-TTS will be installed here.

In this example, we created a folder named "irodori-tts" directly under the C drive.

After creating the folder, move to that folder in the terminal.

cd C:\irodori-tts

3. Clone Irodori-TTS from GitHub

Enter the following command in the terminal to clone the Irodori-TTS repository from GitHub.

git clone https://github.com/Aratako/Irodori-TTS.git

Cloning the repository usually finishes within a few seconds.

Enter the following command to move to the folder of the cloned repository.

cd Irodori-TTS

4. Install Necessary Packages

Enter and execute the following command to install the packages required to run Irodori-TTS.

uv sync

It will take some time as it downloads and installs a large number of packages.

Python itself is also installed during this step.

While downloading and installing, wait without closing the terminal screen.

Since it downloads files close to 3GB in size, it is recommended to set this up in a location with a good internet connection.
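For readers who prefer to script steps 2 to 4 instead of typing them, the sequence can be sketched as below. The commands are the same ones shown in this article; the helper names `setup_steps` and `run_setup` are hypothetical, and the script assumes git and uv are already on the PATH.

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/Aratako/Irodori-TTS.git"


def setup_steps(workdir: Path) -> list[tuple[list[str], Path]]:
    """Return (command, working directory) pairs for steps 3 and 4."""
    repo_dir = workdir / "Irodori-TTS"
    return [
        (["git", "clone", REPO_URL], workdir),  # step 3: clone the repository
        (["uv", "sync"], repo_dir),             # step 4: install the packages
    ]


def run_setup(workdir: Path) -> None:
    """Create the working folder (step 2), then run each command in order."""
    workdir.mkdir(parents=True, exist_ok=True)
    for command, cwd in setup_steps(workdir):
        # check=True stops at the first command that fails.
        subprocess.run(command, cwd=cwd, check=True)


if __name__ == "__main__":
    run_setup(Path(r"C:\irodori-tts"))
```

Keeping the command list separate from the code that runs it makes it easy to print the steps first and check them before anything is downloaded.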

5. Launch Irodori-TTS

Once the package download and installation are finished, setup is complete.

Launch Irodori-TTS.

Enter and execute the following command and wait a moment for it to start.

uv run python gradio_app.py --server-name 0.0.0.0 --server-port 7860

When the following is displayed in the terminal, the launch is complete.

Running on local URL: http://0.0.0.0:7860

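Open a web browser and access http://localhost:7860. (The address 0.0.0.0 in the terminal output means the server is listening on all network interfaces; in the browser, use localhost instead.)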
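On a slow PC, the WebUI can take a while to start accepting connections. As a rough sketch, the following waits until something is listening on the port before you open the browser. The function name `wait_for_port` and the default timeout are our own assumptions; the check only confirms that the port is open, not that the page has fully loaded.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 7860,
                  timeout: float = 60.0) -> bool:
    """Return True once a TCP connection succeeds, False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Nothing is listening yet; retry shortly.
            time.sleep(0.5)
    return False


if __name__ == "__main__":
    if wait_for_port():
        print("WebUI is reachable at http://localhost:7860")
    else:
        print("Nothing is listening on port 7860 yet")
```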

The Irodori-TTS screen (WebUI) will open like this.

6. Load the AI Model

Click "Load Model" to load the AI model used for reading text.

When using it for the first time, clicking this button will start the download of the AI model.

When a completion message is displayed in the Model Status (the area circled in red in the next image), the AI model loading is complete.

7. Reading Text with Irodori-TTS

With Irodori-TTS, you can give instructions on how to read, including emotional expressions, but first, let's try reading without any instructions as an example.

Scroll down to find the text input field and enter the sentence you want to be read.

This time, we will try reading "こんにちは、これはイロドリTTSで作成された音声です。" (Hello, this is a voice created with Irodori-TTS.)

(Writing "Irodori-TTS" in Latin letters did not result in a correct reading, so we wrote it in Katakana as "イロドリTTS".)

Click the "Generate" button to start the voice generation.

Irodori-TTS generates audio using your local PC's CPU or GPU (graphics card).

Therefore, the time required for generation varies greatly depending on the PC's performance.

In our test, run on a laptop without a GPU, even this short sentence took about one minute to generate.

Reference: the test was run in an environment with CPU: Ryzen 5 4650U, Memory: DDR4 32GB, Windows 11 Pro 24H2.

When generation is finished, the audio waveform is displayed like this, and you can play the audio.

Example of reading "こんにちは、これはイロドリTTSで作成された音声です。"

If the preview is fine, click the download button (down arrow icon) to save the audio file.

The audio file will be saved in WAV format.
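Once you have several saved WAV files, for example one per chunk of a long text, Python's standard `wave` module can check their length and join them. This is a minimal sketch with hypothetical helper names (`wav_duration_seconds`, `concat_wavs`). It assumes the files are plain PCM WAV with identical settings; the `wave` module may reject other encodings, and we have not verified which encoding the WebUI writes.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the playing time of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def concat_wavs(paths: list[str], out_path: str) -> None:
    """Join WAV files that share channel count, sample width, and sample rate."""
    if not paths:
        raise ValueError("paths must not be empty")
    with wave.open(paths[0], "rb") as first:
        params = first.getparams()
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        for path in paths:
            with wave.open(path, "rb") as part:
                # Compare channels, sample width, and frame rate only.
                if part.getparams()[:3] != params[:3]:
                    raise ValueError(f"{path} has a different audio format")
                out.writeframes(part.readframes(part.getnframes()))
```

The frame count in the output header is corrected automatically when the file is closed, so the joined file reports the combined duration.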

With this, you have successfully synthesized speech using Irodori-TTS.

How to Adjust Audio in Irodori-TTS

In Irodori-TTS, you can adjust expressions such as gender and emotion through various methods.

Specify Emotional Expression with Emojis

Clicking "Emoji Palette" under the text input field allows you to select emojis.

Each emoji is assigned an emotional expression.

😊 Joyfully, happily
😭 Sobbing, crying
😰 Hurriedly, upset
⏩ Fast-paced
📖 Narration, monologue

By simply putting an emoji in the text input field, you can have it read with the specified emotional expression.

Example of reading "😊 こんにちは、これはイロドリTTSで作成された音声です。"

Example of reading "📖 こんにちは、これはイロドリTTSで作成された音声です。"

However, simply specifying an emoji does not allow you to concretely specify gender or age.
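If you prepare many lines of text in advance, the emoji prefix can be added programmatically. The sketch below uses the emojis listed above; the style keys such as "joy" and the helper name `with_style` are our own labels for illustration, not names defined by Irodori-TTS.

```python
# Emojis from the Emoji Palette list above, keyed by hypothetical style names.
STYLE_EMOJI = {
    "joy": "😊",
    "crying": "😭",
    "upset": "😰",
    "fast": "⏩",
    "narration": "📖",
}


def with_style(text: str, style: str) -> str:
    """Prefix text with the emoji for the given style, separated by a space."""
    try:
        emoji = STYLE_EMOJI[style]
    except KeyError:
        raise ValueError(f"unknown style: {style!r}") from None
    return f"{emoji} {text}"


if __name__ == "__main__":
    print(with_style("こんにちは、これはイロドリTTSで作成された音声です。", "narration"))
```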

Loading Reference Audio to Speak with the Same Voice

In Irodori-TTS, you can load a reference audio file and have it read by referring to that voice.

You load the reference audio from the section that says "Drop Audio Here - or - Click to Upload".

In addition to being able to read with the same voice, you can generate audio with a clearer sound quality compared to when nothing is specified.

Adjusting Reading Style Directly with the Caption Function

In Irodori-TTS, you can directly specify in text what kind of voice you want it to be read in.

To use the caption function, you need to launch the "VoiceDesign version," and the command to launch Irodori-TTS in the terminal changes.

uv run python gradio_app_voicedesign.py --server-name 0.0.0.0 --server-port 7861

Executing this command launches the operation screen for the VoiceDesign version.

Since the VoiceDesign version uses a different AI model than the standard version, you need to click "Load Model" and download the model separately from the standard version when using it for the first time.

Since the AI model is about 2GB in size, it is recommended to download it in a location with a good internet connection.

The VoiceDesign version operation screen has a text box for "Caption / Style Prompt (optional)".

Here, you enter a sentence describing what kind of voice you want it to be read in.

Please read in a calm female voice, with a close sense of distance and a soft, natural tone.
Please speak cheerfully and clearly in an energetic male voice.
Please read dispassionately like a newscaster in a low male voice.

In this way, you can specify what kind of voice should be used.

For example, when reading with "Please read in a calm female voice, with a close sense of distance and a soft, natural tone," it resulted in this audio.

Example with "Please read in a calm female voice, with a close sense of distance and a soft, natural tone" specified

This also resulted in easy-to-hear audio with clear sound quality.

However, there are precautions regarding the caption function.

The caption function takes longer to generate audio compared to other reading methods.

On the laptop used for this test, generating this short sentence took about 5 minutes.

When using the caption function, a high-spec PC equipped with a GPU is recommended.

What happens if you read English text?

Irodori-TTS is a text-to-speech software that only supports Japanese.

So, what happens if you try to read English text?

Let's try entering a simple example sentence.

Example of reading "Hello, this is a voice recording created using Irodori-TTS."

As a result, "Hello" was pronounced in Katakana style as "ハロー," and the "recording" part came out unintelligible, so the sentence could not be read correctly.

If you want to read English text, it is recommended to use an AI text-to-speech service that supports foreign languages.
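Because Latin-letter words are the ones most likely to be misread, it can help to flag them before generating. The following sketch lists runs of half-width and full-width Latin letters in a text. The helper name `latin_words` is hypothetical, and a flagged word is only a candidate to check by ear or rewrite in Katakana, not a guaranteed misreading.

```python
import re

# Runs of half-width or full-width Latin letters.
_LATIN_WORD = re.compile(r"[A-Za-zＡ-Ｚａ-ｚ]+")


def latin_words(text: str) -> list[str]:
    """Return the Latin-letter words found in text, in order of appearance."""
    return _LATIN_WORD.findall(text)


if __name__ == "__main__":
    sample = "こんにちは、これはIrodori-TTSで作成された音声です。"
    print(latin_words(sample))
```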

Recommended Speech Synthesis Method When "Setup is Difficult"

After reading this far, some of you may feel that setting up Irodori-TTS seems a bit difficult.

If you are not used to terminal operations or building a Python environment, just following the steps can take a lot of time.

Also, if you do not have a PC with a GPU, each speech synthesis takes too much time, making it difficult to use for purposes such as video narration.

In such cases, we recommend using an AI voice that requires no installation or setup.

"Ondoku": AI Voice Usable Without Installation

When you want to easily synthesize speech with the latest AI, we recommend the AI speech synthesis service "Ondoku".

"Ondoku" is an AI speech synthesis service where you can create audio simply by opening a browser and pasting text.

You can create audio for free right now on your PC, smartphone, or tablet.

Since the voice generation is performed in the cloud (server-side), there is no problem even if your PC is not equipped with a GPU.

Since multiple voices such as male, female, and children's voices are prepared from the start, you can read aloud immediately just by choosing one, without having to prepare reference audio or captions.

Long texts can also be read as they are.

What's more, Ondoku also supports English!

It supports multiple languages such as French, Spanish, Korean, and Chinese, so it can be used for reading languages other than Japanese.

Furthermore, with the next-generation AI voice (OndokuBeta), you can experience even more natural reading.

If you are looking for a way to read text as audio, why not try Ondoku, which is free and easy to use?


Comparing the Differences Between Ondoku and Irodori-TTS

Finally, we compare the main differences between Ondoku and Irodori-TTS.


Item	Ondoku	Irodori-TTS
Operation Method	Cloud (Operate via browser)	Local (Processed on your own PC)
Setup	Not required	Environment construction for Python, Git, etc. required
Supported Languages	Over 35 languages	Japanese only
How to Choose Voice	Just select from multiple voices	Specify via voice cloning, captions, or emojis
Per-generation Limit	Supports long texts	Up to approx. 30 seconds
Commercial Use	Possible (Credit notation required for free use)	Possible (MIT License)
Supported Devices	PC, Smartphone, Tablet	PC (GPU recommended)
Price	Free plans available (Character count expanded in paid plans)	Free (Due to local operation)

In short, choose according to your needs: Ondoku for ease of use and immediate availability, or Irodori-TTS if you have a high-performance PC and want to customize the voice in detail.

For those who want audio immediately, those who need reading in multiple languages, or those who want to use it on a smartphone or tablet, Ondoku is recommended.

It is also suitable for those who want to read long sentences as they are, those who do not want to spend time on setup, and those whose PCs are not equipped with a GPU.

Since you can generate high-quality audio immediately just by opening a browser, why not try Ondoku for free first?


Summary of Features, Setup, and Usage of Irodori-TTS

In this article, we explained Irodori-TTS, an AI speech synthesis software specialized for Japanese that runs locally.

Irodori-TTS is an attractive tool for those who want to be particular about voice expression, such as voice quality design via voice cloning or captions, and emotional control via emojis.

However, the setup method and usage are for advanced users, and setup requires building an environment with Python and Git.

Also, on PCs without a GPU, voice generation takes time.

For those who want to "use speech synthesis easily right now," "Ondoku", which can be used just with a browser, is recommended.

Why not try creating high-quality audio yourself with free, easy-to-use AI speech synthesis?

