Reducing Researcher Burden and Realizing Efficient Symposiums with Ondoku! Case Study: National Museum of Ethnology
Jan. 26, 2026
- National Museum of Ethnology
- Industry: Museum
- Location: Suita City, Osaka Prefecture
- Interview: Mr. Kobayashi, Researcher, X-DiPLAS Project
Objectives and Challenges
Ondoku is used for narration in video works on cultural anthropology and ethnology. Previously, researchers based around the world recorded their own voices, but capturing good-quality audio online proved difficult and cost significant time and effort.
Solution
Eliminated voice recording by researchers and produced narration with Ondoku instead.
Effects
Researchers' time is no longer tied up in repeated recordings, significantly reducing the workload. Video production has also sped up, making it possible to meet tight schedules.
In this article, as a case study for the introduction of "Ondoku," we introduce how it is utilized at the National Museum of Ethnology (Minpaku) in Suita City, Osaka Prefecture.
Introduction of the Organization and Department
Mr. Kobayashi (hereafter, Kobayashi): The National Museum of Ethnology is a research institute for cultural anthropology and ethnology that also functions as a museum. It is home to researchers whose fields of study span the globe, and it shares their research results widely with society. The museum is also affiliated with The Graduate University for Advanced Studies for cultural research, where students aiming for doctoral degrees deepen their learning daily. Known fondly as "Minpaku," the museum celebrated its 50th anniversary in 2024.
I have been active as a researcher for X-DiPLAS (X-Digital Platform for Anthropological Studies) since fiscal year 2022. X-DiPLAS is a project aimed at building an environment where the public can freely view databases of photos taken by cultural anthropologists and archaeologists around the world. Currently, rather than simply storing photos, we are focusing on "digital stories": narratives that reveal what kind of story a single photograph holds.
HP: National Museum of Ethnology
Please tell us the background behind introducing a text-to-speech tool.

Kobayashi: We introduced the text-to-speech tool primarily to reduce the burden on researchers.
I work on building databases of photos taken by cultural anthropologists and archaeologists worldwide and making them available to the public. However, simply collecting photos is not enough to say that history has been correctly passed on. There is always a story behind a photo, so it is necessary to produce video works that include the researcher's "voice" explaining the circumstances at the time the photo was taken.
Initially, we contacted researchers around the world and recorded their voices online for the video works. However, problems such as voice lag and external noise caused repeated re-recordings, which was a major issue. While looking for something that could reduce the burden on researchers, I came across the text-to-speech tool that creates a "voice" just by entering text.
Could you tell us how you came to adopt Ondoku?
Kobayashi: I used to be somewhat resistant to AI-based systems like text-to-speech tools; voices that seemed detached from reality made me uncomfortable. However, Ondoku's voice sounded natural, and I felt it was ideal as a replacement for researcher-recorded narration.
The excellent cost performance was another factor in our decision. It strikes a good balance between ease of use and price, and it has performed well even on our limited budget.
Compared to before the introduction of Ondoku, has the challenge been resolved or improved?

Kobayashi: In addition to reducing the burden on researchers, we are now able to create video works smoothly.
Recording work that tied up researchers' time was one of the downsides of our job, so the fact that our interactions with researchers have become smoother thanks to Ondoku is a major step forward. Furthermore, because high-quality audio is produced simply by typing text, video production has become faster.
We regularly hold symposia to promote our projects. For the presentations, we were able to proceed smoothly with creating the necessary materials using Ondoku. I particularly feel the breadth of Ondoku's utility when preparation periods for symposia are tight. In fact, during periods when production is busy, I use Ondoku almost every day.
Besides its use in research, please tell us if there are any other instances where Ondoku has been helpful.

Kobayashi: By introducing Ondoku, the quality of the text written for narration has improved.
When writing, it is easy to slip in redundant expressions and let sentences run too long. In fact, if you feed unnecessarily long sentences into Ondoku, it does not read them well: because it speaks much like a human, the timing of its breaths can sound unnatural. By listening to the audio, revising the wording, and trying again, I find that the text itself becomes higher quality. Ondoku's appeal is not just that it reads text aloud; it also helps in creating clear, concise writing.
If you have any further requests for improvement regarding Ondoku, please let us know.

Kobayashi: From my perspective as a researcher who works overseas, I would be happy to see Swahili added. Swahili is widely used across Africa, so having it in the lineup would broaden the scope of our work.
Of course, system improvements such as intonation control would make the tool even smoother to use. However, adding too many features could sacrifice its simple operability. Considering the balance, including price, I feel the current interface strikes just the right line.
How do you intend to use Ondoku in the future?

Kobayashi: I hope to increase the opportunities to utilize Ondoku within educational activities as well.
I sometimes assign students at the museum's affiliated graduate university to create short video works. When presenting the assignment, I recommended adding narration with Ondoku as a way to raise the quality of their work. Since even students can use Ondoku easily, I feel there are many classroom situations where it would be useful.
Furthermore, I want to leverage Ondoku's strength in foreign-language support in future activities. We currently plan to create new video works using narration scripts that overseas researchers have translated into English.
If we outsourced English voice-overs, no budget would be large enough; with Ondoku, the cost reduction should be dramatic. I want to continue conveying researchers' "voices" through Ondoku in the future.
Through the use of Ondoku, you have reduced not only the burden of voice recording on researchers but also the time cost of producing video works! Thank you very much for sharing this wonderful case study.


■ AI voice synthesis software "Ondoku"
"Ondoku" is an online text-to-speech tool that can be used with no initial costs.
- Supports approximately 50 languages, including Japanese, English, Chinese, Korean, Spanish, French, and German
- Available from both PC and smartphone
- Suitable for business, education, entertainment, etc.
- No installation required, can be used immediately from your browser
- Supports reading from images
To use it, simply enter text or upload a file on the site. A natural-sounding audio file is generated within seconds. You can synthesize up to 5,000 characters for free each month, so please give it a try.
Email: ondoku3.com@gmail.com
"Ondoku" is a Text-to-Speech service that anyone can use for free without installation. If you register for free, you can get up to 5000 characters for free each month. Register now for free
