MBROLA

MBROLA is speech synthesis software developed as a worldwide collaborative project. The MBROLA project web page provides diphone databases for many[1] spoken languages.

MBROLA
Original author(s): Thierry Dutoit
Developer(s): Vincent Pagel
Initial release: 1995
Stable release: 3.3 / 17 December 2019
Repository: github.com/numediart/MBROLA
Written in: C
Operating system: Linux, Windows, FreeBSD
Type: Speech synthesizer
License: GNU Affero General Public License
Website: github.com/numediart/MBROLA

The MBROLA software is not by itself a complete speech synthesis system for those languages: the text must first be transformed into phoneme and prosodic information in MBROLA's format, which requires separate software (e.g. eSpeakNG).
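To make the idea of "phoneme and prosodic information" concrete, here is a minimal Python sketch that writes such a description as text. It assumes the commonly documented .pho layout, which this article does not spell out: one phoneme per line, with the symbol, a duration in milliseconds, then optional pairs of position (percent of the duration) and pitch (Hz). The helper name `to_pho` and the phoneme symbols are illustrative only; the symbols that are actually valid depend on the voice database in use.

```python
def to_pho(segments):
    """Format (phoneme, duration_ms, pitch_points) tuples as .pho-style text.

    pitch_points is a list of (position_percent, pitch_hz) pairs and may be
    empty. The line layout is an assumption, see the paragraph above.
    """
    lines = []
    for phoneme, duration_ms, pitch_points in segments:
        fields = [phoneme, str(duration_ms)]
        for position_percent, pitch_hz in pitch_points:
            fields.append(str(position_percent))
            fields.append(str(pitch_hz))
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


# Hypothetical input: silence, two phonemes, silence.
# "_" is assumed to denote silence; "b" and "o~" are placeholder symbols.
EXAMPLE = [
    ("_", 100, []),
    ("b", 70, []),
    ("o~", 150, [(50, 120)]),  # pitch target of 120 Hz at 50% of the phoneme
    ("_", 100, []),
]

if __name__ == "__main__":
    print(to_pho(EXAMPLE), end="")
```

A front end such as eSpeakNG would produce the phoneme sequence, durations and pitch targets; a sketch like this covers only the final formatting step.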

History

The MBROLA project started in 1995 at the TCTS Lab of the Faculté polytechnique de Mons (Belgium) as a scientific project to obtain a set of speech synthesizers for as many languages as possible. The first release of the MBROLA software came in 1996 and was provided as freeware for non-commercial, non-military applications. Licenses for the voice databases differ, but most also restrict use to non-commercial, non-military purposes.

Because it was free only for non-commercial applications, MBROLA served private and home users as an alternative to eSpeakNG, the de facto speech synthesis engine on Linux workstations, but was mostly not used in commercial solutions (e.g. speaking clocks or boarding announcements at ports and terminals). After the initial development of the voice databases, updates and support for the MBROLA software ceased, and the closed-source binaries gradually fell behind recent hardware and operating systems.[2] To address this, the MBROLA development team decided to release MBROLA as open-source software, and on October 24, 2018 the source code was published on GitHub under the GNU Affero General Public License. On January 23, 2019 a tool called MBROLATOR, which creates an MBROLA database from WAV files, was released under the same license.

Used technology

The MBROLA software uses the MBROLA (Multi-Band Resynthesis OverLap Add)[3] algorithm for speech generation. Although it is diphone-based, the quality of MBROLA's synthesis is considered to be higher than that of most diphone synthesisers because it preprocesses the diphones, imposing constant pitch and harmonic phases; this improves their concatenation while only slightly degrading their segmental quality.

MBROLA is a time-domain algorithm similar to PSOLA, which implies a very low computational load at synthesis time. Unlike PSOLA, however, MBROLA does not require a preliminary marking of pitch periods. This feature has made it possible to develop the MBROLA project around the MBROLA algorithm: many speech research labs, companies, and individuals around the world have provided diphone databases for many languages and voices. There are some notable omissions, however, such as Chinese.
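The link between constant pitch and the absence of pitch marks can be illustrated with a generic windowed overlap-add in Python. This is not MBROLA's implementation, only a sketch of the time-domain overlap-add idea under one assumption: the stored signal has a known, constant pitch period, so frames can be cut at a fixed spacing instead of at hand-marked pitch positions. The function names and the `out_hop` parameter are hypothetical.

```python
import math


def hann(length):
    """Periodic Hann window; two copies offset by length/2 sum to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / length) for n in range(length)]


def overlap_add(signal, period, out_hop):
    """Cut two-period frames every `period` samples and re-add them every
    `out_hop` samples.

    With out_hop == period the interior of the signal is reproduced
    unchanged. A different out_hop changes the spacing of the frames and
    hence the perceived pitch, in the spirit of PSOLA-style methods.
    """
    frame_len = 2 * period
    window = hann(frame_len)
    n_frames = (len(signal) - frame_len) // period + 1
    out = [0.0] * ((n_frames - 1) * out_hop + frame_len)
    for k in range(n_frames):
        start_in = k * period    # fixed spacing: no pitch marks needed
        start_out = k * out_hop
        for n in range(frame_len):
            out[start_out + n] += window[n] * signal[start_in + n]
    return out


# Test signal: a sine wave with a constant period of 80 samples.
PERIOD = 80
SIGNAL = [math.sin(2 * math.pi * i / PERIOD) for i in range(10 * PERIOD)]

if __name__ == "__main__":
    same = overlap_add(SIGNAL, PERIOD, PERIOD)
    shorter = overlap_add(SIGNAL, PERIOD, 70)
    print(len(SIGNAL), len(same), len(shorter))
```

Each output sample costs only a multiplication and an addition per overlapping frame, which is consistent with the low synthesis-time load attributed to time-domain methods above.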

References

  1. List of MBROLA voices
  2. Mbrola-64 crashes immediately with a SEGFAULT
  3. Dutoit, T; Leich, H (Dec 1993). "MBR-PSOLA: Text-To-Speech synthesis based on an MBE re-synthesis of the segments database". Speech Communication. 13 (3–4): 435–440. doi:10.1016/0167-6393(93)90042-J.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.