A Festival of speech synthesis for Linux

Author: Rohit Girhotra

As information technology becomes more pervasive, the issue of communication between information-processing machines and people becomes increasingly important. Up to now such communication has been almost entirely by means of video screens. Speech, which is by far the most widely used and natural means of communication between people, is an obvious possible substitute. However, this deceptively simple means of exchanging information is, in fact, extremely complicated. The Festival Speech Synthesis System aims to make things a little easier for interface developers.

Speech synthesis — automatic generation of human speech waveforms without directly using a human voice — has been under development for decades. Speech synthesizers, often called text-to-speech (TTS) synthesizer systems, can be implemented in either software or hardware. The first commercial speech synthesis systems were mostly hardware-based, and their development process was time-consuming and expensive. Since computers have become more powerful, most synthesizers today are software-based. Software-based systems are easy to configure and update, and much less expensive than their hardware counterparts.

You can find a wide array of software tools for speech synthesis, ranging from commercial TTS products to software for download over the Internet, with varying kinds of licensing.

Recently, the speech research community has been turning toward open source software, as exemplified by toolkits such as CSLU toolkit, the ISIP Automatic Speech Recognition toolkit, and the Edinburgh speech tools, all of which can help your computer find its voice.

There are many advantages to using open source software for research work. Frequently a researcher is faced with a tool that almost does the task at hand, but needs some tweaking. Having access to the source code allows the researcher, at least in theory, to make the needed modifications. But mere openness is not a guarantee of flexibility. In order for a tool to be flexible, it must have well-defined programming interfaces — otherwise, extensions and modifications will be hard to develop and maintain — and it must be interoperable with other tools.

Festival Speech Synthesis System is one such tool. Festival grew out of the need for a unifying, flexible, and extensible tool for research and educational purposes at The Centre for Speech Technology Research (CSTR) at University of Edinburgh.

Festival is a free, portable, extensible, language-independent, run-time speech synthesis engine for various platforms that has been under development since 1999. Primary authors of the C++ system include Alan W Black, Paul Taylor, and Richard Caley. Festival is a part of the Festvox project that aims to make the building of new synthetic voices more systematic and better documented, making it possible for anyone to build a new voice.

Festival offers developers a basic framework for building speech synthesis systems, and includes various demo modules. It offers text-to-speech through a number of APIs: from shell level, through a Scheme command interpreter, as a C++ library, from Java, and even via an Emacs interface. Though Festival is multilingual (currently English, Welsh, and Spanish), support for English is the most advanced. The system uses the Edinburgh Speech Tools for its underlying architecture and has a Scheme-based (SIOD) command interpreter for control.

The Festival Speech Synthesis System was designed to target three classes of speech synthesis users:

  • Speech synthesis researchers, who may use Festival for developing and testing new speech synthesis methods;
  • Speech application developers, who are developing language systems and wish to include synthesis output, such as different voices, specific phrasing, and dialog types; and
  • End-users, with systems that take text and generate speech, requiring little configuration from users.

Taking stock of the Festival Speech Synthesis System

Want to try Festival version 1.95 on your Linux system? First, ensure that you have the latest working version of the C++ (gcc) compiler installed on your system. Most of the problems people have had in installing Festival have been due to an incomplete or bad compiler installation. Also make sure that your sound card is configured and working correctly.
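Before downloading anything, it can help to confirm that the build tools are actually on your PATH. The following is a minimal sketch, not part of Festival itself: check_tools is a hypothetical helper name, and the list of tools you pass to it is your own assumption about what the build needs.

```shell
# Report which of the named commands can be found on the PATH.
# check_tools is a hypothetical helper, not something Festival provides.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Non-zero exit status when at least one tool is absent.
    return "$missing"
}
```

For example, check_tools gcc g++ make tar prints one line per tool and returns a non-zero status if any of them is absent. Finding the commands does not prove that the compiler installation is complete, so treat this only as a first sanity check.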

To install Festival you will need to download the following source packages from the Festival download page:

  • festival-1.95-beta.tar.gz — The core festival package
  • speech_tools-1.2.95-beta.tar.gz — The Edinburgh Speech tools library
  • festlex_OALD.tar.gz — The lexicon distribution
  • festlex_POSLEX.tar.gz — The lexicon distribution
  • festvox_rablpc16k.tar.gz — The speech database

You will find several other packages available for download at the Web site, but you won’t need them unless you wish to add support for more voices to the basic TTS system.

Having downloaded all the above packages, log in as root, change to the directory where you downloaded the packages, and issue the command tar -xvzf package_name for each package to unpack it. After the unpacking, your current directory will contain the subdirectories speech_tools/ and festival/.
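If you would rather not type the tar command once per package, the unpacking step can be wrapped in a small loop. This is only a sketch under the assumption that all the downloaded archives sit in one directory and end in .tar.gz; unpack_all is a hypothetical helper name.

```shell
# Unpack every .tar.gz archive found in a directory (default: the current one).
# unpack_all is a hypothetical helper, not part of the Festival distribution.
unpack_all() {
    dir=${1:-.}
    for archive in "$dir"/*.tar.gz; do
        # Skip the unexpanded pattern when no archives are present.
        [ -e "$archive" ] || continue
        echo "unpacking $archive"
        # Extract into the same directory; stop at the first failure.
        tar -xzf "$archive" -C "$dir" || return 1
    done
}
```

Calling unpack_all with no argument works on the current directory. Note that it unpacks every matching archive it finds, so keep unrelated tarballs out of the download directory.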

Next, you need to compile the source files. Change to the speech_tools directory and issue the commands:

./configure
make

Then change to the festival directory and issue the same commands.
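The two build steps can also be chained so that the Festival build only starts if the speech tools built successfully. This is a sketch that assumes you run it from the directory containing speech_tools/ and festival/; build_dir and build_festival are hypothetical names.

```shell
# Run the configure-and-make sequence inside one source directory.
# The subshell keeps the caller's working directory unchanged.
build_dir() {
    ( cd "$1" && ./configure && make )
}

# Build the speech tools first, then Festival, stopping on the first failure.
build_festival() {
    build_dir speech_tools && build_dir festival
}
```

Run build_festival from the unpacking directory. If either configure or make fails, the chain stops there and the function returns a non-zero status, which makes compiler problems easier to spot.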

That’s it! The Festival Speech Synthesis System is now installed on your Linux box.

Using Festival

There are various ways you can use Festival. To get into the Interactive Festival Console, type festival at the shell prompt. You should find yourself at a prompt like the one below:

festival>

Your speech synthesis system is now ready to accept input. To get your system to talk to you, try out the following command:

festival> (SayText "type the text you want to hear over here")

The parentheses are required here, and the text to be spoken must be enclosed in double quotes.

If you have a text file with something in it that you want to hear, use the command:

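festival> (tts "filename" nil)

Replace filename with the relative path to your file, keeping the double quotes, and make sure that the file is a plain ASCII text file. The second argument is the text mode; nil tells Festival to treat the file as plain text. You can use the Tab key here for automatic file name completion.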

If you have a plain ASCII text file that you wish to hear, you can call Festival from the command prompt:

festival --tts filename
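The command-line form is easy to wrap in small shell helpers. The sketch below makes one assumption beyond the text above: that festival --tts reads text from standard input when no file name is given. The names say_text and say_file are hypothetical.

```shell
# Speak the arguments by piping them to Festival on standard input.
# Assumes "festival --tts" with no file argument reads from stdin.
say_text() {
    printf '%s\n' "$*" | festival --tts
}

# Speak a file, but only after checking that it exists and is readable.
say_file() {
    if [ ! -r "$1" ]; then
        echo "cannot read: $1" >&2
        return 1
    fi
    festival --tts "$1"
}
```

With these defined, say_text Hello from Festival speaks a short phrase, and say_file notes.txt reads a file aloud or reports an error if the file cannot be read.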

For more information on using Festival, check out the man pages or type help at the festival prompt to see a list of useful commands. More documentation is also available in texinfo and HTML format on the project’s site.

Flite and other open-source TTS alternatives

An alternative TTS engine is Flite (Festival-lite), a small, fast run-time binary speech synthesis engine. Flite was designed for embedded systems like PDAs as well as large server installations, which must serve synthesis to many ports. It was written in ANSI C, and is designed to be portable to almost any platform.

Other freely available open-source TTS systems are:

  • MBROLA, a freely available diphone concatenation system
  • Gnuspeech, an extensible TTS package based on real-time, articulatory, speech-synthesis-by-rules
  • FreeTTS, written entirely in Java and based upon Flite
  • Epos, a rule-driven TTS system primarily designed to serve as a research tool