The Large-Vocabulary Speech Recognition engine provides full speech transcription on small embedded devices as well as servers. The engine supports a configurable lexicon on the order of 300,000 or more unique words, and its language model can be customized to your domain of interest. By default, it ships with a generic language model that covers most idiosyncrasies of the language. Medical, legal, financial, and other domain-specific language models are also available. In addition, new and unique domains may be defined and trained in a matter of hours.

When used in conjunction with our Speaker Recognition engine, it provides full diarization: the Speaker Recognition engine segments the audio by speaker and labels each speaker's identity, and the Speech Recognition engine transcribes what each individual says. The engines work together and provide timestamps and other details, such as score and confidence, for each result. They also provide multiple possible results sorted according to their relevance scores. These results may be returned in XML, JSON, or clean human-readable text and HTML. We provide a C++ API as well as web, Android, iOS, and command-line interfaces.
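
As a rough illustration of how a client might consume such a diarized JSON result, the Python sketch below picks the top-scoring alternative for each speaker segment and prints a readable transcript. The payload layout and every field name in it (segments, speaker, start, end, alternatives, text, score, confidence) are assumptions made up for this example; they are not the engine's documented schema, so consult the actual API documentation before relying on them.

```python
import json

# Hypothetical payload. The field names below are illustrative assumptions,
# not the engine's documented output format.
SAMPLE = """
{
  "segments": [
    {
      "speaker": "speaker_1",
      "start": 0.0,
      "end": 2.4,
      "alternatives": [
        {"text": "good morning everyone", "score": 0.91, "confidence": 0.88},
        {"text": "good morning every one", "score": 0.72, "confidence": 0.61}
      ]
    },
    {
      "speaker": "speaker_2",
      "start": 2.6,
      "end": 4.1,
      "alternatives": [
        {"text": "thanks for joining", "score": 0.95, "confidence": 0.93}
      ]
    }
  ]
}
"""


def best_transcript(payload):
    """Return one (speaker, start, end, text, confidence) tuple per segment,
    using the alternative with the highest relevance score."""
    data = json.loads(payload)
    rows = []
    for segment in data["segments"]:
        # Do not rely on the alternatives being pre-sorted; pick the maximum.
        best = max(segment["alternatives"], key=lambda alt: alt["score"])
        rows.append(
            (
                segment["speaker"],
                segment["start"],
                segment["end"],
                best["text"],
                best["confidence"],
            )
        )
    return rows


def format_transcript(rows):
    """Render the rows as plain text, one line per speaker segment."""
    return "\n".join(
        f"[{start:.2f}-{end:.2f}] {speaker}: {text} (confidence {confidence:.2f})"
        for speaker, start, end, text, confidence in rows
    )


if __name__ == "__main__":
    print(format_transcript(best_transcript(SAMPLE)))
```

Choosing the maximum by score, instead of taking the first alternative, keeps the sketch correct even if a given output format does not guarantee ordering.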

Supported Languages

English: All dialects of English
Spanish: Major dialects of Spanish
Chinese (Mandarin): Major dialects of Mandarin Chinese
Arabic: Major dialects of Arabic
German: Major dialects of German
Multi-Lingual Support for 100+ Languages: Includes Code-Switching