What is Irodori-TTS? Features, Precautions, and Setup Guide Explained
May 31, 2026

What kind of AI text-to-speech software is Irodori-TTS?
Many of you may be wondering about the new AI text-to-speech software "Irodori-TTS."
In this article, we will explain the features, capabilities, precautions, and usage of Irodori-TTS in an easy-to-understand manner.
Furthermore, for those who feel that "the setup seems difficult," we also introduce a speech synthesis method that can be used immediately without installation.
What you will learn in this article
- What kind of software is Irodori-TTS?
- Capabilities and precautions of Irodori-TTS
- How to use Irodori-TTS (from setup to voice adjustment)
- Recommended alternatives when setting up the environment is difficult
What is Irodori-TTS? Explaining the Japanese AI Speech Synthesis Software

First, we will briefly explain what kind of AI speech synthesis software Irodori-TTS is and its features.
Irodori-TTS is an AI Speech Synthesis Model that Runs Locally
Irodori-TTS is an AI speech synthesis software specialized for Japanese.
The developer is Aratako, and it is released for free as open source (MIT license).
The biggest feature is that it can perform "local operation," where speech synthesis is completed entirely on your own PC.
Since all voice generation processing takes place on your local PC, text and generated voice data are never sent to external servers.
After the initial setup, you can generate speech without an internet connection, and there are no limits on the number of generations.
However, the setup requires programming tools such as Python and Git.
Additionally, a high-performance PC equipped with a GPU (graphics card) is recommended for high-speed operation.
What You Can and Cannot Do with Irodori-TTS

Next, we will explain what Irodori-TTS can and cannot do.
What Irodori-TTS Can Do
Since Irodori-TTS runs in a local environment, you can generate speech an unlimited number of times.
Even in an environment without an internet connection, you can freely create speech once the initial setup is complete.
There are multiple ways to instruct what kind of voice to create; using the caption function, you can create your preferred voice quality just through text instructions.
It is also possible to reproduce an existing voice through voice cloning or add emotional expressions using emojis.
Since it uses the MIT license, commercial use of the generated audio is also possible.
Precautions for Irodori-TTS
On the other hand, there are some precautions you should know before using Irodori-TTS.
Each generated audio clip is limited to about 30 seconds
A single generation can produce only about 30 seconds of audio, which caps the amount of text that can be read at once.
If you want to read a long text, you need to split it into shorter pieces and generate each piece separately.
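One way to prepare a long text for piece-by-piece generation is to cut it at sentence boundaries and pack the sentences into short chunks. The following is a minimal sketch, not part of Irodori-TTS itself. The function name `split_for_tts` and the `max_chars` budget are our own assumptions: the limit is about 30 seconds of audio, not a character count, so the budget needs to be tuned by ear.

```python
import re

# A "sentence" here is a run of text followed by Japanese or ASCII
# sentence-ending punctuation (。！？ ! ?).
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]*")


def split_for_tts(text: str, max_chars: int = 100) -> list[str]:
    """Split text into chunks of whole sentences, each at most max_chars long.

    max_chars is a hypothetical budget, not a value documented by Irodori-TTS.
    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    sentences = [s.strip() for s in _SENTENCE.findall(text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if current and len(current) + len(sentence) > max_chars:
            # Adding this sentence would exceed the budget: start a new chunk.
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "今日は晴れです。明日は雨でしょう。傘を忘れずに！"
    for i, chunk in enumerate(split_for_tts(sample, max_chars=16), start=1):
        print(i, chunk)
```

Each chunk can then be pasted into the text input field and generated on its own.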
It is difficult to achieve exactly the voice or speaking style you want
While Irodori-TTS offers a high degree of freedom, it does not provide default voices (base voices).
Therefore, unless you specify captions or reference audio, the gender and age will change randomly with each generation.
To read with the same voice, you need to load a reference audio file.
Additionally, there is no function to manually adjust intonation or inflection.
Supported language is Japanese only
The supported language is only Japanese; it does not support foreign languages such as English.
Also, note that misreadings of Kanji can occur.
High-spec PC with GPU recommended
Depending on your PC specs, voice generation may take time.
In our test on a laptop without a GPU (Ryzen 5 4650U), it took about one minute to generate even a short sentence.
On entry-class CPUs like Celeron or N100, practical use would be difficult.
How to Use Irodori-TTS (Setup Workflow)
Here, we will briefly explain how to use Irodori-TTS.
The overall setup flow is as follows:
- Install necessary software
- Create a working folder
- Clone Irodori-TTS from GitHub
- Install necessary packages
- Launch Irodori-TTS
- Load the AI model
- Read the text aloud
1. Install Software Necessary for Irodori-TTS
Preparation is required to set up Irodori-TTS.
First, install these three types of software:
- Python 3.10 or higher: Programming language
- Git: Version control system (Required to download Irodori-TTS)
- uv: Package manager for Python
To install Git and uv, first right-click the Start menu and click "Terminal" (you do not need to launch it as an administrator).

The terminal (PowerShell) screen will then open.

In this screen, enter and execute the following commands:
winget install --id Git.Git -e
winget install --id=astral-sh.uv -e

You have now installed what is necessary to set up Irodori-TTS.
*Python is managed by uv, so it will be installed automatically during setup.
Once the installation finishes, close the terminal (PowerShell) and open it again so that the updated PATH takes effect and the git and uv commands can be found.
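After reopening the terminal, the quickest check is to run `git --version` and `uv --version` and confirm that both print a version number. If you prefer a scripted check, the sketch below uses Python's standard `shutil.which` to look both commands up on the PATH. The helper names `find_tools` and `missing_tools` are ours, and running it assumes a Python interpreter is already available (for example through uv).

```python
import shutil

# Commands that the setup steps in this article rely on.
REQUIRED_TOOLS = ("git", "uv")


def find_tools(names=REQUIRED_TOOLS, which=shutil.which):
    """Map each command name to its full path, or None if it is not on the PATH."""
    return {name: which(name) for name in names}


def missing_tools(names=REQUIRED_TOOLS, which=shutil.which):
    """Return the command names that could not be found on the PATH."""
    return [name for name, path in find_tools(names, which).items() if path is None]


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

If a command is reported as not found right after installation, reopening the terminal once more is usually the first thing to try.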
2. Create a Working Folder
Next, create a working folder.
Irodori-TTS will be installed here.
In this example, we created a folder named "irodori-tts" directly under the C drive.

After creating the folder, move to that folder in the terminal.
cd C:\irodori-tts

3. Clone Irodori-TTS from GitHub
Enter the following command in the terminal to clone the Irodori-TTS repository from GitHub.
git clone https://github.com/Aratako/Irodori-TTS.git

Cloning the repository will finish in a few seconds.
Enter the following command to move to the folder of the cloned repository.
cd Irodori-TTS

4. Install Necessary Packages
Enter and execute the following command to install the packages required to run Irodori-TTS.
uv sync

It will take some time as it downloads and installs a large number of packages.

Python itself is also installed during this step.
While downloading and installing, wait without closing the terminal screen.
Since it downloads files close to 3GB in size, it is recommended to set this up in a location with a good internet connection.
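Because the download is close to 3GB, it can be worth confirming that the drive has enough free space before starting. Here is a small sketch using Python's standard `shutil.disk_usage`. The 3 GB figure comes from this article, while the extra `margin_gb` (headroom for the AI model downloaded later) and the helper names are our own assumptions.

```python
import shutil

GIB = 1024 ** 3  # bytes per gibibyte


def free_gb(path=".", disk_usage=shutil.disk_usage):
    """Return the free space, in GiB, on the drive that contains path."""
    return disk_usage(path).free / GIB


def has_room(path=".", required_gb=3.0, margin_gb=2.0, disk_usage=shutil.disk_usage):
    """True if the drive has at least required_gb + margin_gb free.

    required_gb follows the "close to 3GB" package download mentioned above;
    margin_gb is a hypothetical safety margin, not an official requirement.
    """
    return free_gb(path, disk_usage) >= required_gb + margin_gb


if __name__ == "__main__":
    print(f"Free space: {free_gb():.1f} GiB, enough for setup: {has_room()}")
```

Run it from inside the working folder so that the check applies to the drive where Irodori-TTS will be installed.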
5. Launch Irodori-TTS
Once the package download and installation are finished, setup is complete.
Launch Irodori-TTS.
Enter and execute the following command and wait a moment for it to start.
uv run python gradio_app.py --server-name 0.0.0.0 --server-port 7860
When the following is displayed in the terminal, the launch is complete.

Running on local URL: http://0.0.0.0:7860
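Open a web browser and access http://localhost:7860. (The 0.0.0.0 shown in the terminal is the address the server listens on, not one to open in the browser.)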
The Irodori-TTS screen (WebUI) will open like this.
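If the page does not open right away, the server may still be starting. The sketch below polls the local URL with Python's standard `urllib` until it answers. The URL matches the port used in the launch command above; the function names, retry count, and delay are our own illustrative choices.

```python
import time
import urllib.error
import urllib.request

# Ignore any system proxy settings: the WebUI runs on this machine.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_webui_up(url="http://localhost:7860", timeout=2.0):
    """Return True if something answers HTTP requests at url."""
    try:
        with _opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied, even though the status was an error code.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or otherwise unreachable.
        return False


def wait_for_webui(url="http://localhost:7860", attempts=30, delay=2.0, timeout=2.0):
    """Poll url up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if is_webui_up(url, timeout=timeout):
            return True
        time.sleep(delay)
    return False


if __name__ == "__main__":
    print("WebUI reachable:", wait_for_webui())
```

For the VoiceDesign version, which this article launches on port 7861, pass `http://localhost:7861` instead.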

6. Load the AI Model
Click "Load Model" to load the AI model used for reading text.

When using it for the first time, clicking this button will start the download of the AI model.
When a completion message is displayed in the Model Status (the area circled in red in the next image), the AI model loading is complete.

7. Reading Text with Irodori-TTS
With Irodori-TTS, you can give instructions on how to read, including emotional expressions, but first, let's try reading without any instructions as an example.
Scroll down to find the text input field and enter the sentence you want to be read.

This time, we will try reading "こんにちは、これはイロドリTTSで作成された音声です。" (Hello, this is a voice created with Irodori-TTS.)
(Writing "Irodori-TTS" in Latin letters did not result in a correct reading, so we wrote it in Katakana as "イロドリTTS".)
Click the "Generate" button to start the voice generation.

Irodori-TTS generates audio using your local PC's CPU or GPU (graphics card).
Therefore, the time required for generation varies greatly depending on the PC's performance.
In this instance, since it was generated on a laptop without a GPU, it took about one minute for even a short sentence.
Reference: Test generation was performed in an environment with CPU: Ryzen 5 4650U, Memory: DDR4 32GB, Windows 11 Pro 24H2.
When generation is finished, the audio waveform is displayed like this, and you can play the audio.

Example of reading "こんにちは、これはイロドリTTSで作成された音声です。"
If the preview is fine, click the download button (down arrow icon) to save the audio file.
The audio file will be saved in WAV format.
With this, you have successfully synthesized speech using Irodori-TTS.
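Since each generation is limited to about 30 seconds, a longer narration ends up as several WAV files. The sketch below joins such files in order with Python's standard `wave` module and reports the total length. It assumes the downloaded files are uncompressed PCM WAV with identical channel count, sample width, and sample rate, which we have not verified for every Irodori-TTS setting; `concat_wavs` and `write_silence` are hypothetical helper names.

```python
import os
import tempfile
import wave


def concat_wavs(paths, out_path):
    """Join PCM WAV files in order; return the total duration in seconds."""
    if not paths:
        raise ValueError("no input files given")
    with wave.open(paths[0], "rb") as first:
        fmt = (first.getnchannels(), first.getsampwidth(), first.getframerate())
    total_frames = 0
    with wave.open(out_path, "wb") as out:
        out.setnchannels(fmt[0])
        out.setsampwidth(fmt[1])
        out.setframerate(fmt[2])
        for path in paths:
            with wave.open(path, "rb") as src:
                src_fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if src_fmt != fmt:
                    # Mixing formats would produce garbled audio, so stop here.
                    raise ValueError(f"{path} has a different audio format: {src_fmt}")
                out.writeframes(src.readframes(src.getnframes()))
                total_frames += src.getnframes()
    return total_frames / fmt[2]


def write_silence(path, seconds, framerate=8000):
    """Write a mono 16-bit silent WAV file (used only for the demo below)."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(framerate)
        w.writeframes(b"\x00\x00" * int(seconds * framerate))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        parts = [os.path.join(tmp, f"part{i}.wav") for i in range(2)]
        for part in parts:
            write_silence(part, 0.5)
        seconds = concat_wavs(parts, os.path.join(tmp, "joined.wav"))
        print(f"Joined length: {seconds} seconds")
```

In practice, replace the demo files with the paths of the WAV files you downloaded, listed in reading order.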
How to Adjust Audio in Irodori-TTS
In Irodori-TTS, you can adjust expressions such as gender and emotion through various methods.
Specify Emotional Expression with Emojis
Clicking "Emoji Palette" under the text input field allows you to select emojis.

Each emoji is assigned an emotional expression.
- 😊 Joyfully, happily
- 😭 Sobbing, crying
- 😰 Hurriedly, upset
- ⏩ Fast-paced
- 📖 Narration, monologue
By simply putting an emoji in the text input field, you can have it read with the specified emotional expression.
Example of reading "😊 こんにちは、これはイロドリTTSで作成された音声です。"
Example of reading "📖 こんにちは、これはイロドリTTSで作成された音声です。"
However, simply specifying an emoji does not allow you to concretely specify gender or age.
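When preparing many lines at once, the emoji can be added programmatically before pasting the text into the input field. This is only a sketch of that idea: the emojis and their meanings are the ones listed above, while the style keys (`"joy"`, `"narration"`, and so on) and the function name `with_style` are our own labels. The "emoji, space, text" layout follows the examples in this article.

```python
# Emojis and meanings as listed above; the dictionary keys are our own labels.
EMOJI_STYLES = {
    "joy": "😊",        # joyfully, happily
    "crying": "😭",     # sobbing, crying
    "flustered": "😰",  # hurriedly, upset
    "fast": "⏩",       # fast-paced
    "narration": "📖",  # narration, monologue
}


def with_style(text: str, style: str) -> str:
    """Prefix text with the emoji for the given style label."""
    try:
        emoji = EMOJI_STYLES[style]
    except KeyError:
        raise ValueError(f"unknown style: {style!r}") from None
    return f"{emoji} {text}"


if __name__ == "__main__":
    print(with_style("こんにちは、これはイロドリTTSで作成された音声です。", "narration"))
```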
Loading Reference Audio to Speak with the Same Voice
In Irodori-TTS, you can load a reference audio file and have it read by referring to that voice.
You load the reference audio from the section that says "Drop Audio Here - or - Click to Upload".

In addition to being able to read with the same voice, you can generate audio with a clearer sound quality compared to when nothing is specified.
Adjusting Reading Style Directly with the Caption Function
In Irodori-TTS, you can directly specify in text what kind of voice you want it to be read in.
To use the caption function, you need to launch the "VoiceDesign version," and the command to launch Irodori-TTS in the terminal changes.
uv run python gradio_app_voicedesign.py --server-name 0.0.0.0 --server-port 7861

Executing this command launches the operation screen for the VoiceDesign version.
Since the VoiceDesign version uses a different AI model than the standard version, you need to click "Load Model" and download the model separately from the standard version when using it for the first time.
Since the AI model is about 2GB in size, it is recommended to download it in a location with a good internet connection.
The VoiceDesign version operation screen has a text box for "Caption / Style Prompt (optional)".

Here, you enter a sentence describing what kind of voice you want it to be read in.
- Please read in a calm female voice, with a close sense of distance and a soft, natural tone.
- Please speak cheerfully and clearly in an energetic male voice.
- Please read dispassionately like a newscaster in a low male voice.
In this way, you can specify what kind of voice should be used.
For example, when reading with "Please read in a calm female voice, with a close sense of distance and a soft, natural tone," it resulted in this audio.
Example with "Please read in a calm female voice, with a close sense of distance and a soft, natural tone" specified
This also resulted in easy-to-hear audio with clear sound quality.
However, there are precautions regarding the caption function.
The caption function takes longer to generate audio compared to other reading methods.
On the same laptop without a GPU, it took about 5 minutes to generate this short sentence.
When using the caption function, a high-spec PC equipped with a GPU is recommended.
What happens if you read English text?
Irodori-TTS is a text-to-speech software that only supports Japanese.
So, what happens if you try to read English text?
Let's try entering a simple example sentence.
Example of reading "Hello, this is a voice recording created using Irodori-TTS."
In this way, "Hello" became a Katakana pronunciation "ハロー" and the "recording" part became an unintelligible pronunciation, so it could not be read correctly.
If you want to read English text, it is recommended to use an AI text-to-speech service that supports foreign languages.
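For Japanese text that contains only a few Latin-letter words, such as product names, one workaround is to find those words before generation and replace them with a Katakana spelling, as we did with "イロドリTTS" earlier. The sketch below illustrates that with Python's standard `re` module. The function names and the reading table are hypothetical, and the Katakana spellings have to be supplied by you.

```python
import re

# Runs of ASCII letters, optionally joined by hyphens or apostrophes.
# Full-width letters (Ａ-Ｚ) are not matched by this pattern.
_LATIN_WORD = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*")


def find_latin_words(text: str) -> list[str]:
    """Return the Latin-letter words in text, in order of appearance."""
    return _LATIN_WORD.findall(text)


def replace_readings(text: str, readings: dict[str, str]) -> str:
    """Replace each key of readings with its hand-written Japanese spelling.

    Longer keys are replaced first so that a short key cannot break up
    a longer one that contains it.
    """
    for word in sorted(readings, key=len, reverse=True):
        text = text.replace(word, readings[word])
    return text


if __name__ == "__main__":
    line = "こんにちは、これはIrodori-TTSで作成された音声です。"
    print(find_latin_words(line))
    print(replace_readings(line, {"Irodori-TTS": "イロドリTTS"}))
```

Words that `find_latin_words` still reports after replacement are candidates for misreading and are worth checking by ear.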
Recommended Speech Synthesis Method When "Setup is Difficult"
After reading this far, some of you may feel that setting up Irodori-TTS seems a bit difficult.
If you are not used to terminal operations or building a Python environment, just following the steps can take a lot of time.
Also, if you do not have a PC with a GPU, each speech synthesis takes too much time, making it difficult to use for purposes such as video narration.
In such cases, we recommend using an AI voice that requires no installation or setup.
"Ondoku": AI Voice Usable Without Installation

When you want to easily synthesize speech with the latest AI, we recommend the AI speech synthesis service "Ondoku".
"Ondoku" is an AI speech synthesis service where you can create audio simply by opening a browser and pasting text.
You can create audio for free right now on your PC, smartphone, or tablet.
Since the voice generation is performed in the cloud (server-side), there is no problem even if your PC is not equipped with a GPU.
Since multiple voices such as male, female, and children's voices are prepared from the start, you can read aloud immediately just by choosing one, without having to prepare reference audio or captions.
Long texts can also be read as they are.
What's more, Ondoku also supports English!
It supports multiple languages such as French, Spanish, Korean, and Chinese, so it can be used for reading languages other than Japanese.
Furthermore, with the next-generation AI voice (OndokuBeta), you can experience even more natural reading.
If you are looking for a way to read text as audio, why not try Ondoku, which is free and easy to use?
Comparing the Differences Between Ondoku and Irodori-TTS
Finally, we compare the main differences between Ondoku and Irodori-TTS.
| Item | Ondoku | Irodori-TTS |
|---|---|---|
| Operation Method | Cloud (Operate via browser) | Local (Processed on your own PC) |
| Setup | Not required | Requires setting up Python, Git, etc. |
| Supported Languages | Over 35 languages | Japanese only |
| How to Choose Voice | Just select from multiple voices | Specify via voice cloning, captions, or emojis |
| Per-generation Limit | Supports long texts | Up to approx. 30 seconds |
| Commercial Use | Possible (Credit notation required for free use) | Possible (MIT License) |
| Supported Devices | PC, Smartphone, Tablet | PC (GPU recommended) |
| Price | Free plans available (Character count expanded in paid plans) | Free (Due to local operation) |
In short, choose according to your needs: Ondoku for ease of use and immediate availability, and Irodori-TTS if you have a high-performance PC and want to finely customize the audio.
For those who want audio immediately, those who need reading in multiple languages, or those who want to use it on a smartphone or tablet, Ondoku is recommended.
It is also suitable for those who want to read long sentences as they are, those who do not want to spend time on setup, and those whose PCs are not equipped with a GPU.
Since you can generate high-quality audio immediately just by opening a browser, why not try Ondoku for free first?
Summary of Features, Setup, and Usage of Irodori-TTS
In this article, we explained Irodori-TTS, an AI speech synthesis software specialized for Japanese that runs locally.
Irodori-TTS is an attractive tool for those who want to be particular about voice expression, such as voice quality design via voice cloning or captions, and emotional control via emojis.
However, setup and usage are geared toward advanced users, and setup requires building an environment with Python and Git.
Also, on PCs without a GPU, voice generation takes time.
For those who want to "use speech synthesis easily right now," "Ondoku", which can be used just with a browser, is recommended.
With free AI speech synthesis that is easy to use, why not try creating high-quality audio yourself?