I’ve griped for years about Roku’s retro one-dimensional menu. Finally, in conjunction with the release of the Roku 3 model, the company is giving the Linux-based media streaming player a 2D facelift, making it quicker and easier to access favorite channels and find new ones. In addition to the new two-dimensional menu system, the Roku […]
Inside Palaver: Linux Speech Recognition that Taps Google’s Voice Technology
Despite efforts to advance Linux speech recognition, there is still no reliable, fully baked open source competitor to Dragon Systems’ proprietary NaturallySpeaking. Lately, however, instead of trying to mimic Dragon’s technology, which is available to Linux users only by running it under Wine, some developers are taking their cues from simpler, yet in many ways more widely useful, mobile natural-language engines. Last week, a De Anza College student named James McClain released a public beta of an open source GNU/Linux speech recognition program called Palaver that uses Google’s voice APIs on the back end.
Palaver is billed as being easy to use, good at interpreting different pronunciations, and customizable, letting developers add commands and functions via an app dictionary. Available initially for Ubuntu, Palaver is designed primarily for controlling computer functions, but it can also be used for transcription.
Fuzzy search lets multiple spoken phrases trigger a single task. For example, users can say “run,” “launch,” or “open” to start a program. Palaver can also respond to a few basic open-ended questions, such as speaking “NBA scores” to bring up results, but much more is planned along these lines.
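The idea of several phrasings resolving to one action can be sketched in a few lines of Python using the standard library’s difflib. This is an illustrative sketch, not Palaver’s actual dictionary or matching code; the command table and names are hypothetical.

```python
import difflib

# Hypothetical command table: several spoken phrases map to one action.
# These entries are illustrative, not Palaver's real app dictionary.
COMMANDS = {
    "run program": "launch_app",
    "launch program": "launch_app",
    "open program": "launch_app",
    "close window": "close_window",
}

def match_command(spoken, cutoff=0.6):
    """Return the action for the known phrase closest to the spoken text, or None."""
    best = difflib.get_close_matches(spoken.lower(), COMMANDS, n=1, cutoff=cutoff)
    return COMMANDS[best[0]] if best else None
```

With this kind of matching, “launch the program” lands on the same action as “run program,” while unrecognized speech falls through to None rather than triggering something unintended.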
“Most people using speech recognition today are using Siri or Google Now on their phone,” said McClain in an email interview with Linux.com. “In the past, they were almost certainly using Dragon, and Linux developers tried to imitate that. Palaver is much more similar to Siri or Android Voice Actions, which is what most people are looking for.”
The GPLv3-licensed Palaver supports swapping out Google’s technology for other back-end engines, said McClain. “Voice Actions is very accurate and fast, but many people understandably don’t want to give their information to Google,” he says. “Luckily, Palaver could hook up to something else. The code that calls Voice Actions is very simple and separated, so if someone wants to use an engine like PocketSphinx, nothing else has to change.”
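The separation McClain describes — engine-specific code kept small and isolated so nothing else changes when the engine does — can be sketched as a narrow interface. The class and method names below are illustrative, not Palaver’s actual code.

```python
from abc import ABC, abstractmethod

# One narrow interface hides the engine; swapping Google's service for a
# local engine like PocketSphinx means writing one new subclass.
class SpeechBackend(ABC):
    @abstractmethod
    def transcribe(self, audio: bytes) -> str:
        """Turn raw audio into text."""

class GoogleBackend(SpeechBackend):
    def transcribe(self, audio: bytes) -> str:
        # Would send the audio to Google's speech service here.
        raise NotImplementedError

class PocketSphinxBackend(SpeechBackend):
    def transcribe(self, audio: bytes) -> str:
        # Would run a local PocketSphinx decoder here.
        raise NotImplementedError

def handle_utterance(backend: SpeechBackend, audio: bytes) -> str:
    # The rest of the program only ever sees this one call.
    return backend.transcribe(audio)
```

The payoff is exactly what McClain points out: users who don’t want to send their voice to Google plug in a local backend, and “nothing else has to change.”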
McClain’s biggest challenge in developing Palaver lies in starting and stopping voice recording. “This actually kept me from writing the application for a long time,” he said. “Then someone said ‘Just make them press a hotkey to start and end speech; you can work on automatically stopping later.’ And so I did.”
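The hotkey workaround amounts to a small press-to-talk state machine: the first press starts capturing, the second stops and hands the utterance off for recognition. A minimal sketch, with the audio capture itself stubbed out and all names hypothetical:

```python
# Minimal press-to-talk toggle in the spirit of the hotkey workaround.
class PushToTalk:
    def __init__(self, on_utterance):
        self.recording = False
        self.buffer = []
        self.on_utterance = on_utterance  # called with the captured chunks

    def hotkey_pressed(self):
        if not self.recording:            # first press: start capturing
            self.recording = True
            self.buffer = []
        else:                             # second press: stop and dispatch
            self.recording = False
            self.on_utterance(self.buffer)

    def feed(self, chunk):
        """Called by the audio loop; chunks are kept only while recording."""
        if self.recording:
            self.buffer.append(chunk)
```

Automatic endpointing — detecting when the speaker has stopped — replaces only the second press, which is why McClain could defer it to a later release.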
More Linux Distro Support Coming
Despite some complaints about the hotkey requirement, the overall response to the private beta released late last year was quite positive. The beta testers even revealed a solution to the start/stop problem: an open source Google app called Vox-launcher. The ability to start recording speech without a hotkey is now slated for an upcoming release.
A release due next week will “rewrite some core parts” and “allow Palaver to be installed more easily from the Software Center,” says McClain. The eventual goal is to make Palaver easy to install on any supported Linux distribution. The community has pitched in with offers to translate dictionaries and to help package Palaver for particular distros; an Arch Linux version is already complete.
Future releases will include an improved package manager and a repository for adding and removing functions. And McClain is looking for help in developing a configuration and installation GUI. In the meantime, a YouTube tutorial helps ease the setup process.
In addition to eliminating the hotkey requirement, other planned features include improved debugging, support for more languages, and a feature that lets users create macros and bind them to speech commands. McClain hopes to greatly improve support for open-ended questions by connecting to natural-language knowledge systems. “Palaver can talk to Wolfram Alpha and MIT START directly via a web request or an API, so I plan to have them answer ‘What, How, Who’ questions,” says McClain.
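The “web request” route McClain mentions is straightforward for Wolfram|Alpha, whose query endpoint takes the question and an API key as URL parameters. A hedged sketch of building such a request; `YOUR_APP_ID` is a placeholder for a real key, and a production caller would also parse the XML or JSON response:

```python
import urllib.parse

# Build a Wolfram|Alpha query URL. YOUR_APP_ID is a placeholder; a real
# application would substitute its own API key and fetch/parse the result.
def wolfram_query_url(question, app_id="YOUR_APP_ID"):
    params = urllib.parse.urlencode({"appid": app_id, "input": question})
    return "http://api.wolframalpha.com/v2/query?" + params
```

A recognized “Who” question would simply be URL-encoded and handed off, with the answer read back out of the service’s response.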
McClain is also interested in crowdsourcing dictionary development by letting people suggest new commands and actions. “People would vote on what commands and actions they want, and developers would implement them,” he explains. “With enough people helping, combined with fuzzy recognition, it might be possible to say what you want done in natural language, without having to remember commands.”
VoxForge: a New Foundation for Linux Speech
Speech recognition is still a work in progress, and it continually fails to meet expectations. “Since no humans speak exactly the same, speech recognition is really hard,” says McClain. Meanwhile, Linux has trailed here, due largely to the usual market-share reasons.
Beyond Dragon’s NaturallySpeaking, proprietary solutions are pretty much limited to a few Linux-compatible programs such as SRI International’s DynaSpeak and Vocapia’s VoxScribe. As for the paucity of ready-to-roll, fully featured open source efforts, McClain notes that most speech databases for training recognition engines have been proprietary. “Luckily we now have VoxForge,” he adds.
The VoxForge project aims to collect and compile transcribed speech to develop a standard set of acoustic models that can be shared by open source speech recognition engines (SREs). McClain notes, however, that “it will take a while for VoxForge to match the databases that Dragon or Google have.”
Meanwhile, the models the open SREs use now are, in the words of the VoxForge website, “not at the level of quality of commercial speech recognition engines.” Initially, VoxForge is supporting four open SREs: CMU Sphinx, Julius, HTK, and ISIP. Developed at Carnegie Mellon University, Sphinx appears to have drawn the most support, especially for CMU’s embedded-oriented PocketSphinx. The Japanese-focused Julius, meanwhile, is expanding into English-language applications. The Hidden Markov Model Toolkit (HTK) and the Internet-Accessible Speech Recognition Technology Project (ISIP) are both academic, research-oriented projects.
The lack of robust databases may explain why many of the open source Linux speech programs listed on Wikipedia and the more up-to-date Arch Linux wiki seem to have lost momentum. Some newer efforts include the PocketSphinx-based GnomeVoiceControl and Simon, which was based on Julius and HTK but recently switched to Sphinx in a 0.4 version that also added some experimental VoxForge models.
Canonical’s HUD project for Ubuntu and the emerging, mobile-oriented Ubuntu Touch, which McClain says he will eventually support, use PocketSphinx and Julius. Last month HUD developer Ted Gould posted a blog entry saying Julius offers better performance and results, but has an irksome “4-clause BSD license, putting it in multiverse and making it so that we can’t link to it in the Ubuntu archive version of HUD.” Gould seems to be open to another solution.
Eventually, VoxForge should rise to the occasion, and in the meantime, innovative efforts like Palaver are reimagining the user experience. Fortunately, speech recognition has “improved a great amount recently,” says McClain. “Maybe we are finally hitting the needed processing power and technologies to develop fast, accurate, untrained, speech recognition.”
How To Install Ubuntu Voice Recognition is part of the Linux Foundation’s 100 Linux Tutorials Campaign. For more Linux how-to videos or to upload your own go to http://video.linux.com/categories/100-linux-tutorials-campaign.
Automotive Infotainment Gains TIZEN Rich-Media Support
PathPartner Technology has joined the GENIVI Alliance with an eye toward marketing its embedded multimedia software and design services to developers of next-generation automotive in-vehicle infotainment (IVI) devices. The GENIVI Alliance is a non-profit consortium with the goal of “bringing open source software into the car, starting with the most complicated car software system, […]
What is Open Source Cloud?
Editor’s Note: This is a guest post from Joe Brockmeier, community evangelist for CloudStack at Citrix.
For all the talk about cloud, it might come as a surprise to many in the industry that “cloud” is not a well-understood term. It’s often perceived as “just a buzzword” or something without a lot of substance. While the term can be abused, it’s actually an important concept and it’s certainly not just a passing fad.
In talking to people following the Apache CloudStack graduation, and meeting with the local Linux User Group (LUG), it dawned on me that cloud still bears some explanation. Let’s take a look at the standard definition, some types of clouds, and why it matters.
NIST Definition (And Then Some…)
The National Institute of Standards and Technology (NIST) has a pretty good definition of cloud computing, which breaks down into five basic characteristics:
- On-demand self-service – users can provision their own services without requiring any interaction with another person. You can plunk down a credit card with Amazon, Dropbox, Contegix, or any number of cloud providers and start using the service almost immediately.
- Broad network access – the cloud’s functions are available over the network.
- Resource pooling – compute, storage, network, etc., are all pooled so that multiple users may make use of the service. Users don’t need to know the details about the resources they’re being assigned.
- Rapid elasticity – resources can scale (up or down) rapidly in response to demand, and usually appear unlimited (or close to it) to end users.
- Measured service – users can see how much of a resource they’re using, and providers can tell exactly how much users have consumed and bill (or, for private clouds, charge back) accordingly.
I also add one other item to the definition, which is hinted at under broad network access but never stated explicitly:
- API – if the service doesn’t expose an API, it’s not really “cloudy.” You should be able to access the service programmatically. This is especially true if you’re talking about an open cloud service.
Types of Cloud
There are several types of cloud services that are common (you’ll find other Thing-as-a-Service types, but these are the dominant three):
- Software-as-a-Service (SaaS) – things like Dropbox, Google Docs, Salesforce.com, or ownCloud are SaaS: software that provides network- or Web-based applications or services. Pretty much everything is abstracted away from the user here: they don’t need to know what OS the application runs on or how many resources are allocated to it, and they never have to upgrade the software or worry about underlying dependencies.
- Platform-as-a-Service (PaaS) – a PaaS is a service or stack that takes care of the infrastructure, middleware, and orchestration so that developers can focus on creating an application. Basically, it abstracts away the infrastructure layer so developers can build in their favorite language or framework without getting bogged down in deployment details like the underlying operating system.
Examples of a PaaS: Google AppEngine, Engine Yard, or if you want an open source version, OpenShift.
- Infrastructure-as-a-Service (IaaS) – finally, that brings us to the IaaS layer. Users can provision compute, storage, and network resources, but the underlying details are still abstracted away. So, for example, you can spin up an “instance” using CloudStack or Amazon EC2 with the equivalent of 2 Xeon CPUs at 2.0GHz, 4GB of RAM, 100GB of storage, and a public IP address.
But you don’t have to worry about which server that resides on, what the underlying hypervisor is, how to provision the IP address on the switches, etc.
Examples of IaaS include Apache CloudStack, Eucalyptus, OpenStack on the open cloud side, or Amazon Web Services EC2, and Google Compute on the non-open side.
How It Works
If you’re using an IaaS, you really don’t need to know how it works – that’s the beauty of it. But if you’re thinking about deploying one, it helps to know how they work and what you’re talking about.
It may be easiest to think of IaaS cloud as a sort of meta-OS. If you think about Linux, it manages all the resources of your server, desktop, laptop, or mobile device so that you can run applications on top of the hardware. It’s in charge of the network, storage, processor, etc.
An IaaS is like that, but at scale. It’s telling the individual hypervisors, network components, storage devices or servers what to do so that they don’t have to be managed manually.
It sounds like it should be amazingly complex – and an IaaS can be non-trivial to set up – but it’s not as complex as you might think.
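At its core, the orchestration described above is bookkeeping: track what each host has free, and place requests where they fit. A toy first-fit sketch — not any real cloud’s placement algorithm, and with purely illustrative host data — makes the idea concrete:

```python
# Toy first-fit placement: claim resources on the first host with capacity.
# Real orchestrators weigh far more factors (affinity, tags, failure domains).
def place(hosts, cpu, ram_gb):
    """hosts: list of dicts with free 'cpu' and 'ram_gb'.
    Returns the chosen host's name, or None if nothing fits."""
    for host in hosts:
        if host["cpu"] >= cpu and host["ram_gb"] >= ram_gb:
            host["cpu"] -= cpu        # claim the resources
            host["ram_gb"] -= ram_gb
            return host["name"]
    return None                       # no capacity: request is rejected
```

Everything else an IaaS does — talking to hypervisors, wiring up networks, attaching storage — hangs off decisions like this one.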
If you take, for example, Apache CloudStack – you have an application that runs on one or more master servers and communicates with the hypervisors, storage, and network devices. It provides an interface via an API or Web-based UI that admins and users can interact with to manage resources. Instead of having to shell into a server and provision it directly, a user or admin can request specific resources and CloudStack will take care of the rest.
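Requesting resources through CloudStack’s query API looks roughly like the sketch below: parameters are sorted, the lowercased query string is signed with HMAC-SHA1 using your secret key, and the signature rides along on the request. The endpoint and keys here are placeholders, and the CloudStack documentation is the authority on the exact signing rules.

```python
import base64
import hashlib
import hmac
import urllib.parse

# Sketch of building a signed CloudStack API URL. Endpoint, apiKey, and
# secretKey below are placeholders; consult the CloudStack API docs for
# the authoritative signing procedure.
def signed_url(endpoint, params, api_key, secret_key):
    params = dict(params, apiKey=api_key, response="json")
    query = "&".join(
        f"{k}={urllib.parse.quote(str(v), safe='')}"
        for k, v in sorted(params.items())
    )
    digest = hmac.new(secret_key.encode(), query.lower().encode(),
                      hashlib.sha1).digest()
    signature = urllib.parse.quote(base64.b64encode(digest), safe="")
    return f"{endpoint}?{query}&signature={signature}"
```

A user asking for an instance would issue a command such as `deployVirtualMachine` this way, and CloudStack takes care of the rest.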
Why It Matters
This is extremely powerful when operating at scale, and in any environment where it’s necessary to manage resources programmatically (think of a test/dev environment, for example), to let users provision their own resources on demand, to isolate users’ resources from one another, and to avoid handing out admin privileges too widely.
This is the scale that Linux and open systems have made possible, and it’s necessary for running many of today’s organizations and applications. Organizations often need to manage hundreds, thousands, or even tens of thousands of servers, with applications spread out over many individual VMs or machines.
Developers need to be able to write applications that can spread across tens, hundreds, or thousands of servers as demand requires, rather than trying to “scale up” applications on ever bigger and beefier hardware.
Having an open cloud matters because we need to be able to continue the work that GNU and Linux folks have been doing for more than twenty years, at scale. It matters because we need the cloud to be bigger than Amazon or proprietary companies – and because users and organizations should have as much control over their computing destiny at scale as they have had on individual servers.
So, though many folks are probably tired of hearing about “cloud this” and “cloud that,” it’s really not going away anytime soon. And if you’re interested in software freedom, this is the next generation.
Leap Motion Support Comes To Linux
The Leap Motion device with its motion sensing technology is now supported by Linux…
Red Hat and Rackspace Face Down a Patent Troll
Red Hat and Rackspace Hosting have announced that they have won the dismissal of a patent suit by Uniloc USA. Uniloc was asserting patent #5,892,697, which relates to the handling of floating-point numbers. “In dismissing the case, Chief Judge Leonard Davis found that Uniloc’s claim was unpatentable under Supreme Court case law that prohibits the patenting of mathematical algorithms. This is the first reported instance in which the Eastern District of Texas has granted an early motion to dismiss finding a patent invalid because it claimed unpatentable subject matter.”
Univa Grid Engine Steps Up to Intel Xeon Phi with Version 8.1.4
Today Univa released the latest version of Univa Grid Engine. With cross-platform support, Release 8.1.4 of Univa Grid Engine includes a number of customer-driven enhancements:
- An improved load-collection tool for Intel Xeon Phi coprocessors
- Extended memory-usage metrics for multi-threaded applications
- Scheduler performance enhancements that keep the maximum number of jobs running in the cluster while improving system responsiveness
- Memory affinity settings for interactive Univa Grid Engine jobs
How to Run Android Apps on Your Windows or Mac Machine
Have you got some favorite smartphone apps? Not convinced by Microsoft’s new Windows app selection? Itching to see some Android action on your MacBook Pro? Don’t worry: just install an Android emulator on your Windows or Mac machine and run all of the Android apps that you’ve grown to love. A version is even available for Windows 8 Surface tablets. Choose from up to 750,000 Android apps, including games, SMS text messaging, and media apps. The free product that makes this possible is BlueStacks App Player, which claims more than 5 million downloads.
Bull Opens Centre for Excellence in Parallel Programming
Bull has inaugurated its Centre for Excellence in Parallel Programming, described as the leading European centre of technical and industrial excellence in this field. By working with technology leaders, this centre aims to support engineers and scientists in research centres and industry to overcome the critical technological barrier of “HPC application parallelisation.”
Bull says the Centre will be equipped with the highest levels of expertise, to help research labs and companies optimise their applications so they can be compatible with not only processors available today, but also those in development for the next generation. It will supply a broad portfolio of services, including analysis and consultancy, as well as software parallelisation and optimisation.