Linux.com

Feature

Speak to me, Linux

By Josephine Ciuca on January 19, 2005 (8:00:00 AM)


Voice control is the next step in human interaction with computers. Voice recognition and its flip side, speech synthesis, can help you streamline your day-to-day work and organize your Linux desktop more effectively.

To begin conversing with your Linux desktop, download the Sphinx-2 speech recognition engine and the Festival text-to-speech application. Although the CMU Sphinx Group provides several versions of Sphinx (Sphinx-2, -3, and -4), I use only Sphinx-2, because it is the fastest. Even though it is not as accurate as Sphinx-3 or Sphinx-4, it runs in real time, and therefore works well with live applications.

Installing Sphinx-2 and Festival should be trivial: most distributions already ship binaries, and even compiling from source should not be difficult. Debian users might find Festival a little tricky to set up if their machine has an onboard sound card with an AC97 codec. (The symptom is speech that sounds twice as fast as it should, no matter what speed you set; unfortunately, I couldn't find any solution except changing the sound card.)

Happily, the typical desktop user will not have to learn Festival's command-line interface, because applications such as the KDE Text-to-Speech System (KTTS) and Perlbox Voice fill this gap. KDE 3.4 will talk to you via Festival, Festival Lite (flite), or FreeTTS (another free speech synthesizer, written in Java), in a multitude of languages and accents. If you want to use KTTS with your present KDE desktop, sources as well as binaries for Debian, SUSE, and Mandrake are available at KTTS's home page.
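For readers who would rather script Festival directly than go through KTTS, the sketch below shows roughly what a front end does: it pipes text into Festival's `--tts` mode, which speaks whatever it reads on standard input. This is a minimal Python sketch, not how KTTS itself talks to Festival; it assumes the `festival` binary is on your PATH and silently does nothing if it is not.

```python
from __future__ import annotations

import shutil
import subprocess


def festival_tts_command() -> list[str]:
    """Return the argv for Festival's text-to-speech mode.

    With --tts, Festival speaks the text it reads from standard input.
    """
    return ["festival", "--tts"]


def speak(text: str) -> bool:
    """Pipe `text` to Festival; return True if Festival was run.

    Returns False without doing anything when the festival binary is not
    installed, so the sketch is safe to run on any machine.
    """
    if shutil.which("festival") is None:
        return False
    subprocess.run(festival_tts_command(), input=text, text=True, check=True)
    return True


if __name__ == "__main__":
    speak("Hello from your Linux desktop.")
```

Keeping the command in its own function makes it easy to swap in a different engine, such as flite, without touching the rest of the code.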

The KTTS Interface

KTTS works by sending the text to be spoken via DCOP to the KTTS daemon. KTTS can read aloud KNotify pop-ups, Web pages, or any other text. You can open Konqueror, navigate to a Web site, select Tools, and choose Speak Text. If you haven't highlighted anything, KTTS will read the whole page.

In the KDE Text-To-Speech Manager (kttsmgr), you can manage your languages, your speech engine, and what your computer reads to you. Your computer can act as your private secretary and read your email messages while you work in other applications. You can use multiple languages, which is useful if, for example, you are a native German speaker who needs to read an English Web page. If you need voices in languages other than English, take a look at the MBROLA Project. MBROLA aims to collect speech synthesizer voices for as many languages as possible and provides them free of charge for non-commercial use. MBROLA voices can be used with Festival.

By adding Sphinx-2 and Perlbox Voice to Festival, you can make your computer listen to what you say and act accordingly. Perlbox Voice provides a transparent interface to several open source speech systems, but it is mainly an easy-to-use front end to Sphinx-2, written in Perl with a Tk interface; you'll need both Perl and Tk installed before you install Perlbox. Perlbox ships with a Perl installation script that copies the files to the right locations. When that completes, start the application with the perlbox-voice command.

Perlbox Interface

To get your Linux desktop to perform an action, type the phrase that should trigger it in one Perlbox field, and in another the command the computer should run when it hears that phrase. For example, in the "When you say" box you write "Web", and in the "Computer does" box, Konqueror. Then start Perlbox's listener from the Control tab and say "Web"; Konqueror will start.
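The "When you say" / "Computer does" pairs amount to a lookup table from recognized phrases to commands. The sketch below illustrates that idea in Python. It is a hypothetical model, not Perlbox's actual code: the `COMMANDS` table, `lookup`, and `dispatch` are illustrative names, and the entries reuse the article's "Web" to Konqueror example and an "xmms" binding mentioned in the comments.

```python
from __future__ import annotations

import shlex
import subprocess

# Hypothetical command table: phrase heard -> command to run.
COMMANDS: dict[str, str] = {
    "web": "konqueror",
    "music": "xmms",
}


def lookup(heard: str, commands: dict[str, str] = COMMANDS) -> str | None:
    """Return the command bound to a recognized phrase, or None if unbound.

    Matching ignores case and surrounding whitespace, since recognizers
    do not always report words with consistent capitalization.
    """
    return commands.get(heard.strip().lower())


def dispatch(heard: str, run: bool = False) -> str | None:
    """Find the command for `heard` and, if `run` is True, launch it."""
    command = lookup(heard)
    if command is not None and run:
        # Start the program without waiting for it to exit.
        subprocess.Popen(shlex.split(command))
    return command
```

With `run=False` the function only reports which command would be launched, which makes it easy to check a table of bindings before trusting it with live microphone input.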

Perlbox comes with a KDE plug-in that lets you switch between virtual desktops, open the K Menu, refresh the desktop, and more, all by voice. You can extend the plug-in with more commands by editing a single file. If you use another desktop environment, Perlbox comes with good documentation on how to build your own plug-ins.

In noisy environments, you can require a magic word to activate Perlbox, which keeps the application from acting on random words it mishears.
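The magic-word idea is essentially a small gate in front of the command dispatcher: recognized words are discarded until the wake-up word is heard. Below is a minimal Python sketch of that gating logic. It is an illustration, not Perlbox's implementation; the class name, the default magic word "computer", and the rule that the gate closes again after one command are all assumptions made for this example.

```python
from __future__ import annotations


class MagicWordGate:
    """Ignore recognized words until a wake-up ("magic") word is heard.

    Assumption for this sketch: after the magic word, exactly one phrase is
    passed through as a command, then the gate closes again. Stray words
    from background noise are therefore dropped.
    """

    def __init__(self, magic_word: str = "computer") -> None:
        self.magic_word = magic_word.strip().lower()
        self.armed = False

    def feed(self, heard: str) -> str | None:
        """Process one recognized phrase; return it only if it is a command."""
        word = heard.strip().lower()
        if not self.armed:
            # Only the magic word can arm the gate; everything else is ignored.
            self.armed = word == self.magic_word
            return None
        # Gate is armed: pass this phrase through, then disarm.
        self.armed = False
        return word
```

The returned phrase could then be handed to a command lookup; anything heard while the gate is closed never reaches it.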

The ease of using these speech engines and speech recognition systems could make Linux the preferred OS for the visually impaired. Open licenses allow rapid improvement and development. And it's just a lot of fun to talk to your closest co-worker: your computer.


Comments

on Speak to me, Linux

Note: Comments are owned by the poster. We are not responsible for their content.

Linux

Posted by: Anonymous Coward on January 19, 2005 04:45 PM
Can any Linux run on top of Windows XP?


Re:Linux

Posted by: Anonymous Coward on January 19, 2005 07:23 PM
Yup. Go visit http://www.colinux.org if you want to run any Linux on top of Windows XP.


programmer's toy?

Posted by: Anonymous Coward on January 21, 2005 01:07 AM
This looks like a programmer's toy from my perspective, unfortunately. I am very interested in speech-to-text under Linux, because I need to write long research papers on humanities subjects, and being able to dictate them (as opposed to typing) would be a real relief. I think that goes for a good majority of production users out there. Until something along the lines of Dragon Naturally Speaking for Linux is developed, those of us who use our computers for productivity tasks like writing papers, letters, or books will continue to view Linux as stunted and lagging in serving our needs. Articles like this one underline the divide between those who want to play with computers and those who need to use them to accomplish office and study tasks.


Re:programmer's toy?

Posted by: Anonymous Coward on July 25, 2005 04:21 AM
In fact, there is ViaVoice for Linux. It's a version from about 2000-2002 that was part of the Mandrake Linux PowerPack Deluxe 7.2 to 8.0. Because of incompatibilities with newer glibc versions, you need an older distribution to run it. I have it working nicely on SuSE 8.1. After 2002, IBM discontinued ViaVoice for Linux.
Apart from that, Dragon Naturally Speaking (at least version 4) reportedly runs under Wine (http://www.winehq.com/site?issue=264#Dragon%20Naturally%20Speaking%20Working).


Debian and sound cards with AC97 codec

Posted by: Anonymous Coward on January 23, 2005 09:00 PM
I run Debian sarge with such a sound card and everything works well. Definitely no need to change a sound card.
vlcak


Re:Tts... and voice command

Posted by: Anonymous Coward on January 24, 2005 04:29 PM
No, that's not really true. I work with computers every day and find these tools useful. Perlbox Voice is not intended to be a fill-every-voice-need kind of tool, but more like an assistant. I am a more productive programmer/user if I can keep my hands on the keyboard and not the mouse. I make much better use of desktop features when I can just say the command and forget the keystrokes. If I have several applications (word processors, text editors, browser) open on multiple desktops, I can move between them with commands like "desktop next" and "desktop two". I don't have to remember funky keystrokes or use my mouse. If I want to open a new app, I don't have to go looking through menus or for icons; I just say the name that I have assigned to that app ('music->xmms', 'movie->totem', 'web->firefox', etc.). No, I think that Perlbox would make anyone more productive.


Re:Tts... and voice command

Posted by: Administrator on January 24, 2005 10:52 PM
Explain to me where I said it would make anyone less productive? You may have misinterpreted what I said. In fact, I am praising the development of these programs and hoping to see them furthered. I know about as much about programming as Mickey Mouse does about vintage cigars: very little, if anything at all. I am working on fixing that, though. I never meant to say that Perlbox Voice was intended to make your machine completely hands-free, or to cover every command available. In fact, I do not believe I ever referenced the application at all. My post was strictly based on the ideas that were stirred by the topic in question. So, in the end, I agree with your usage of the application and think what you are doing is pretty sweet. But I also think that in the years to come we can do better, and eventually eliminate the keyboard altogether.


Re:Tts... and voice command

Posted by: Anonymous Coward on January 25, 2005 02:25 AM
I am so sorry, I meant to reply to the message below this one, where it is referred to as a 'programmer's toy'. I will reply to your post here:

I am a programmer for an R&D firm out of Silicon Valley that has been around since the 1940s. I often work on AI projects, and my CS degree emphasized AI. You are 100% correct: we are moving towards real autonomy, but the problems we face are very real (and huge). We can make systems that mimic intelligence, but the problem becomes evident when we consider the first roadblock: how do you have a conversation with your computer when it can't even understand your spoken words? How can your computer seem really intelligent when it cannot even recognize you by appearance? Image recognition (feature extraction) and voice recognition have a long way to go.

We are making intelligent agents, but when their input systems are dumb, they seem stupid. You have highlighted a problem here, one that I hope will become less of an issue as time goes on.


Re:Tts... and voice command

Posted by: Administrator on January 25, 2005 05:20 AM
We have already taken small steps toward true AI; we have software that will take a type of known Web page and avoid it based on a single reason. There are many things we have already accomplished, but making a machine possess an active learning processor is the goal, and we must all reach for it. It is not playing God, nor is it unethical. It would be a comfort, one more thing to do many of the things we as individuals don't want to do: mow the lawn, take out the trash, change the oil that is two months overdue. And the way technology is going, everything is only getting cheaper. In my opinion, if my projections hold true, we have another 7 to 12 years before we start seeing the first true AI prototypes: simple math, small tasks, etc., on demand, and willing and ready to lend a hand.


Tts... and voice command

Posted by: Administrator on January 20, 2005 11:55 PM
One must stop and think about progression... You will not hear chaos theories from me about computers acting of their own free will. But with progress in TTS and voice control (VC), our systems will hopefully be able to understand and react without our having to program and label everything out. I, for one, think it would be nice to say "run IRC, server Undernet, join channel warezsouth" and have my computer react and respond on the fly, without so much as a second wasted in interpretation, and without my having to tie those words to text and vice versa for my computer to recognize them. TTS and VC are wonderful... but true AI is a good goal that we should all aspire to help achieve any way we can. It is not laziness; it is true perfection, flawless in its idea.
But for those of you who scream abomination and end-of-the-world theories and the rest of that drivel: with owning and operating a computer, you have a responsibility. It is both a personal AND an ethical one. If you are even posting or reading on this site, then you undoubtedly can make Windows roll over and die and leave the user gawking at the screen going... what happened? So you Unix, Linux, and BSD users are smarter than that. While the kiddies are twiddling with the latest screen saver they downloaded from some pop-up in Windows, you are busy doing something that actually matters. And I respect all of you for your input in the world community. But calling AI wrong? Or even so much as disturbing? Ethical practice and use. If anyone knows of anyone working on an AI project, please feel free to reply and let me know...
