Author: Rohit Girhotra
Speech synthesis, the automatic generation of human speech waveforms without directly using a human voice, has been under development for decades. Speech synthesizers, often called text-to-speech (TTS) systems, can be implemented in either software or hardware. The first commercial speech synthesis systems were mostly hardware-based, and their development was time-consuming and expensive. As computers have become more powerful, most synthesizers today have become software-based. Software-based systems are easy to configure and update, and much less expensive than their hardware counterparts.
You can find a wide array of software tools for speech synthesis, ranging from commercial products to freely downloadable software, under a variety of licenses. Commercially available TTS systems include:
- Apple PlainTalk
- Acapela Speech Technologies
- Rhetorical rVoice
- Loquendo TTS
- ScanSoft RealSpeak
- Sakrament Text-to-Speech Engine
- Nuance Vocalizer
- AT&T Natural Voices
Recently, the speech research community has been turning toward open source software, as exemplified by toolkits such as the CSLU Toolkit, the ISIP Automatic Speech Recognition toolkit, and the Edinburgh Speech Tools.
There are many advantages to using open source software for research work. Frequently a researcher is faced with a tool that almost does the task at hand, but needs some tweaking. Having access to the source code allows the researcher, at least in theory, to make the needed modifications. But mere openness is not a guarantee of flexibility. In order for a tool to be flexible, it must have well-defined programming interfaces — otherwise, extensions and modifications will be hard to develop and maintain — and it must be interoperable with other tools.
The Festival Speech Synthesis System is one such tool. Festival grew out of the need for a unifying, flexible, and extensible tool for research and education at the Centre for Speech Technology Research (CSTR) at the University of Edinburgh.
Festival is a free, portable, extensible, language-independent, run-time speech synthesis engine for various platforms that has been under development since the mid-1990s. Primary authors of the C++ system include Alan W Black, Paul Taylor, and Richard Caley. Festival is closely tied to the Festvox project, which aims to make the building of new synthetic voices more systematic and better documented, making it possible for anyone to build a new voice.
Festival offers developers a basic framework for building speech synthesis systems and includes various demo modules. It provides text-to-speech through a number of APIs: from the shell, through a Scheme command interpreter, as a C++ library, from Java, and even via an Emacs interface. Though Festival is multilingual (currently English, Welsh, and Spanish), its support for English is the most advanced. The system is built on the Edinburgh Speech Tools library and uses a Scheme-based (SIOD) command interpreter for control.
The Festival Speech Synthesis System was designed to target three classes of speech synthesis users:
- Speech synthesis researchers, who may use Festival for developing and testing new speech synthesis methods;
- Speech application developers, who are developing language systems and wish to include synthesis output, such as different voices, specific phrasing, and dialog types; and
- End-users, who want a system that takes text and generates speech with little configuration.
Taking stock of the Festival Speech Synthesis System
Want to try Festival version 1.95 on your Linux system? First, make sure that a complete, working C++ compiler (g++ from GCC) is installed on your system. Most of the problems people have had installing Festival have been due to an incomplete or broken compiler installation. Also make sure that your sound card is configured and working correctly.
To install Festival you will need to download the following source packages from the Festival download page:
- festival-1.95-beta.tar.gz: the core Festival package
- speech_tools-1.2.95-beta.tar.gz: the Edinburgh Speech Tools library
- festlex_OALD.tar.gz: an English pronunciation lexicon based on the Oxford Advanced Learner's Dictionary
- festlex_POSLEX.tar.gz: the part-of-speech lexicon used for tagging
- festvox_rablpc16k.tar.gz: the speech database for a British English male voice (rab_diphone)
You will find several other packages available for download at the Web site, but you won’t need them unless you wish to add support for more voices to the basic TTS system.
Having downloaded all the above packages, log in as root, change to the directory where you downloaded them, and unpack each package with the command:
tar -xvzf package_name
After unpacking, your current directory will contain the subdirectories speech_tools/ and festival/; the lexicon and voice packages unpack into the festival/ tree.
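The unpacking step can also be scripted. Below is a minimal Python sketch using the standard tarfile module; it assumes the five archives listed above sit in the current directory, and the helper name unpack_all is hypothetical rather than anything shipped with Festival.

```python
import tarfile
from pathlib import Path

# The archives listed above; adjust the names if your versions differ.
PACKAGES = [
    "festival-1.95-beta.tar.gz",
    "speech_tools-1.2.95-beta.tar.gz",
    "festlex_OALD.tar.gz",
    "festlex_POSLEX.tar.gz",
    "festvox_rablpc16k.tar.gz",
]


def unpack_all(src_dir: Path, dest_dir: Path, packages=PACKAGES) -> list:
    """Extract each archive (the equivalent of `tar -xvzf`) into dest_dir
    and return the sorted names of the top-level directories there."""
    missing = [p for p in packages if not (src_dir / p).is_file()]
    if missing:
        raise FileNotFoundError("missing packages: " + ", ".join(missing))
    for name in packages:
        # "r:gz" opens a gzip-compressed tarball for reading.
        with tarfile.open(src_dir / name, "r:gz") as tar:
            tar.extractall(dest_dir)
    return sorted(p.name for p in dest_dir.iterdir() if p.is_dir())


if __name__ == "__main__":
    here = Path.cwd()
    try:
        # The result should include 'festival' and 'speech_tools'.
        print(unpack_all(here, here))
    except FileNotFoundError as err:
        print(err)
```

Checking for every archive before extracting anything avoids leaving a half-unpacked tree behind when one download is missing.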
Next, compile the source files, starting with the Edinburgh Speech Tools, since Festival is built against them. Change to the speech_tools directory and issue the commands:
./configure
make
Then change to the festival directory and issue the same commands.
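Order matters here: Festival is compiled against the already-built Edinburgh Speech Tools, which is why speech_tools comes first. The following Python sketch runs the same two commands in each tree in that order. It assumes both source directories sit side by side under one root; build_festival and its dry_run flag are hypothetical names for illustration.

```python
import subprocess
from pathlib import Path

# Festival links against the Edinburgh Speech Tools,
# so speech_tools must be built before festival.
BUILD_ORDER = ["speech_tools", "festival"]
STEPS = [["./configure"], ["make"]]


def build_festival(root: Path, dry_run: bool = False) -> list:
    """Run ./configure and make in each source tree, in order.

    Returns the (directory, command) pairs in the order they were run,
    or would be run when dry_run is True. check=True makes the first
    failing step raise CalledProcessError and stop the build.
    """
    plan = []
    for subdir in BUILD_ORDER:
        for cmd in STEPS:
            plan.append((subdir, " ".join(cmd)))
            if not dry_run:
                subprocess.run(cmd, cwd=root / subdir, check=True)
    return plan


if __name__ == "__main__":
    # Print the plan without running anything.
    for directory, command in build_festival(Path.cwd(), dry_run=True):
        print(f"cd {directory} && {command}")
```

Stopping at the first failure matters because a broken speech_tools build will only surface later as confusing errors in the festival build.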
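That’s it! Festival is now built on your Linux box. Note that make does not copy anything into system directories: the festival program ends up in festival/bin, so add the full path of that directory to your PATH before trying the commands below.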
Using Festival
There are various ways to use Festival. To start the interactive Festival console, type the following at the shell prompt:
festival
You should find yourself at a prompt like the one below:
festival>
Your speech synthesis system is now ready to accept input. To get your system to talk to you, try out the following command:
festival> (SayText "type the text you want to hear over here")
The parentheses are required here, and the text to be spoken must be enclosed in double quotes.
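If the text itself contains double quotes or backslashes, they must be escaped inside the Scheme string. The Python sketch below builds a well-formed SayText expression from arbitrary text. It assumes Festival's Scheme reader accepts the usual backslash escapes (\" and \\) inside strings; the helper names scheme_string and say_text_expr are hypothetical.

```python
def scheme_string(text: str) -> str:
    """Quote text as a Scheme string literal.

    Backslashes are escaped first, then double quotes; reversing the
    order would double the backslashes just added for the quotes.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def say_text_expr(text: str) -> str:
    """Return a (SayText "...") expression to type or pipe into Festival."""
    return f"(SayText {scheme_string(text)})"


if __name__ == "__main__":
    print(say_text_expr('She said "hello" to me.'))
    # prints: (SayText "She said \"hello\" to me.")
```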
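If you have a text file that you want to hear, use the tts command, giving the file name as a quoted string:
festival> (tts "filename" nil)
Replace filename with the path to your file (relative to the directory you started Festival in), and make sure that the file is a plain ASCII text file. The second argument selects a text mode; nil is fine for ordinary text. You can use the Tab key here for automatic file name completion.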
You can also have Festival read a plain ASCII text file aloud directly from the shell prompt, without entering the console:
festival --tts filename
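The same command-line mode is easy to drive from another program. The Python sketch below shells out to festival --tts, either with a file name or with the text fed on standard input (in --tts mode Festival reads standard input when no file is given). It assumes the festival binary is on your PATH; festival_cmd and speak are hypothetical helper names.

```python
import shutil
import subprocess
from typing import List, Optional


def festival_cmd(path: Optional[str] = None) -> List[str]:
    """Build the argument list for `festival --tts [file]`.
    Without a file, Festival reads the text to speak from stdin."""
    cmd = ["festival", "--tts"]
    if path is not None:
        cmd.append(path)
    return cmd


def speak(text: Optional[str] = None, path: Optional[str] = None) -> int:
    """Speak a string (text) or a plain text file (path); return the exit code."""
    if (text is None) == (path is None):
        raise ValueError("give exactly one of text or path")
    if shutil.which("festival") is None:
        raise RuntimeError("festival not found on PATH")
    # input=None leaves stdin alone when a file name is passed instead.
    result = subprocess.run(festival_cmd(path), input=text, text=True)
    return result.returncode


if __name__ == "__main__":
    if shutil.which("festival"):
        speak(text="Hello from Festival.")
    else:
        print("festival is not installed; the command would be:", festival_cmd())
```

Passing text on stdin avoids writing a temporary file and sidesteps shell quoting entirely.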
For more information on using Festival, check out the man page, or type the following at the festival prompt to see a list of useful commands:
festival> help
More documentation is also available in texinfo and HTML format on the project’s site.
Flite and other open-source TTS alternatives
An alternative TTS engine is Flite (Festival-lite), a small, fast run-time binary speech synthesis engine. Flite was designed for embedded systems like PDAs as well as large server installations, which must serve synthesis to many ports. It was written in ANSI C, and is designed to be portable to almost any platform.
Other freely available open-source TTS systems are: