Why no Open Source Dictation applications?

Ken MacLean writes: “Why are there no Open Source Dictation applications on Linux?

Background

Speech Recognition Engines require two types of files to recognize speech:

1. Acoustic Model: created by taking a Speech Corpus (a very large collection of audio recordings of speech together with their transcriptions) and ‘compiling’ it into statistical representations of the sounds that make up each word; and

2. Language Model or Grammar file: A Language Model is a large file containing the probabilities of particular sequences of words. A Grammar is a much smaller file containing sets of predefined word combinations. Language Models are used in dictation applications, whereas Grammars are used in Desktop Command and Control or Telephony IVR (Interactive Voice Response) applications. When we talk about IVR, I know what you are thinking – it’s for those annoying automated attendants… but think of the possibilities for Open Source personal or enterprise speech-enabled applications.
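To make the difference between a Language Model and a Grammar concrete, here is a minimal Python sketch. It assumes a toy four-sentence training corpus and a hypothetical “&lt;command&gt; the &lt;object&gt;” grammar; all names are illustrative and not taken from any speech engine. The language model is a plain bigram model estimated by counting word pairs, while the grammar is an explicit list of allowed word combinations.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real dictation language model is estimated
# from millions of sentences.
CORPUS = [
    "open the file",
    "open the window",
    "close the file",
    "save the file",
]


def train_bigram_model(sentences):
    """Estimate P(word | previous word) by maximum likelihood (no smoothing)."""
    pair_counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark sentence start and end.
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(words, words[1:]):
            pair_counts[prev][word] += 1
    model = {}
    for prev, counts in pair_counts.items():
        total = sum(counts.values())
        model[prev] = {word: n / total for word, n in counts.items()}
    return model


def sentence_probability(model, sentence):
    """Multiply the bigram probabilities; an unseen word pair gives 0."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= model.get(prev, {}).get(word, 0.0)
    return prob


# A grammar: a small, explicit set of allowed word combinations.
COMMANDS = {"open", "close", "save"}
OBJECTS = {"file", "window"}


def grammar_accepts(sentence):
    """Accept only utterances of the form '<command> the <object>'."""
    words = sentence.split()
    return (len(words) == 3 and words[0] in COMMANDS
            and words[1] == "the" and words[2] in OBJECTS)


if __name__ == "__main__":
    model = train_bigram_model(CORPUS)
    print(sentence_probability(model, "open the file"))  # 0.375
    print(grammar_accepts("close the window"))            # True
    print(grammar_accepts("open the door"))               # False
```

The language model returns a graded score, which lets a dictation engine rank many competing hypotheses for free-form speech; the grammar only answers yes or no for a fixed set of phrases, which is why it is small and suits command and control. Real language models also apply smoothing so that unseen word pairs do not get a probability of exactly zero.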

Problems with Current Approaches:

1. Acoustic Models used by Open Source Speech Recognition Engines are not interchangeable;
2. Free Open Source Speech Corpora for use in the creation of Acoustic Models have restrictive licenses – they are only for research or educational purposes;
3. Open Source Acoustic Models need to be improved – we are not there yet; we need more ‘source’, i.e. more transcribed speech audio;
4. No Open Source Dictation Software – at least no software distributed with Acoustic Models that are good enough for Dictation applications.

VoxForge Approach:

VoxForge hopes to address these problems by creating a free, GPL-based Speech Corpus repository of transcribed speech audio files. We are essentially creating a user-submitted repository of the ‘source’ speech audio for the Acoustic Models used by Speech Recognition Engines. The Speech Audio files will then be ‘compiled’ into Acoustic Models for use with Open Source Speech Recognition engines such as CAVS, HTK, Julius and Sphinx.
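As a rough illustration of what ‘compiling’ transcribed audio into an Acoustic Model means, the sketch below assumes each audio frame has already been converted into a small feature vector and labelled with the phone it belongs to, then estimates one diagonal Gaussian (a mean and variance per dimension) for each phone. This is a deliberate simplification: the acoustic models used by engines such as HTK, Julius and Sphinx are hidden Markov models, typically with Gaussian mixture densities over MFCC-style features, and the frame-to-phone alignment is learned during training rather than supplied. The data and function names here are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical training data: each frame is a 2-D feature vector (real
# systems use roughly 39-dimensional MFCC vectors) already labelled with
# its phone.
LABELLED_FRAMES = [
    ("ah", [1.0, 2.0]), ("ah", [1.2, 1.8]), ("ah", [0.8, 2.2]),
    ("s", [5.0, 0.5]), ("s", [5.4, 0.7]), ("s", [4.6, 0.3]),
]


def compile_acoustic_model(frames, var_floor=1e-3):
    """Estimate a diagonal Gaussian (means, variances) for each phone."""
    grouped = defaultdict(list)
    for phone, vec in frames:
        grouped[phone].append(vec)
    model = {}
    for phone, vecs in grouped.items():
        n = len(vecs)
        dims = len(vecs[0])
        means = [sum(v[d] for v in vecs) / n for d in range(dims)]
        # Floor the variance so a phone with identical frames stays usable.
        variances = [
            max(var_floor, sum((v[d] - means[d]) ** 2 for v in vecs) / n)
            for d in range(dims)
        ]
        model[phone] = (means, variances)
    return model


def log_likelihood(vec, means, variances):
    """Log density of vec under a diagonal Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        for x, mu, var in zip(vec, means, variances)
    )


def classify_frame(model, vec):
    """Return the phone whose Gaussian gives the frame the highest likelihood."""
    return max(model, key=lambda phone: log_likelihood(vec, *model[phone]))


if __name__ == "__main__":
    model = compile_acoustic_model(LABELLED_FRAMES)
    print(classify_frame(model, [1.1, 2.1]))  # ah
    print(classify_frame(model, [5.1, 0.4]))  # s
```

The more (and more varied) transcribed audio goes into the estimation step, the better these statistics describe how real speakers produce each sound, which is the reason for collecting a large ‘source’ corpus in the first place.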

The current focus of VoxForge is on collecting transcribed audio to create Acoustic Models for speech-based Command and Control applications on a Personal Computer and for Voice-over-IP IVR applications. When there is enough speech audio, VoxForge will work to create Acoustic and Language Models for Dictation Applications.

Why GPL?

Unrestricted, BSD-style licenses for the Speech Corpora used by Open Source Speech Recognition Engines will not be effective – the user base is not yet large enough. A GPL-style license will ensure that user contributions always benefit the Open Source Community, since it requires any distribution of a derivative Acoustic Model to include access to the source audio used to create it.”
