-
AudioVisual Recognition
(Speaker, Speech, and Face Recognition plus Object Detection and Recognition, combined in a single interface)
-
Speaker Recognition
(Language- and Text-Independent; also known as Speaker Biometrics, Voice Biometrics, or SIV)
Recipient: Frost & Sullivan Award 2011
-
Large-Vocabulary Speech Recognition
Available for English, Spanish, Mandarin, Arabic, and German
Also Available in Bilingual Spanish-English, Mandarin-English, Arabic-English, and German-English
(Full transcription with customizable domains; 240,000+ word vocabulary)
RecoMadeEasy®
Server-Based Large-Vocabulary Speech Recognition Platform
RecoMadeEasy® Large-Vocabulary Speech Recognition is a standalone natural-language speech recognition engine that offers comprehensive conversational voice interaction through several interfaces, including WebSockets, a C++ API, and a web interface. The engine has a small memory footprint and is designed to run natively on devices that need unconstrained natural-language interfaces with high recognition accuracy, even when network service is interrupted or when full, uninterrupted, and secure access to a cloud server cannot be guaranteed.
The RecoMadeEasy® Speech Recognition engine was developed in our research labs in New York. When presented with an audio or audio-video stream, the engine returns, through the API, JSON or XML results containing the full transcript of the stream along with a configurable number of contending (alternative) results. Each result includes both a likelihood score and a confidence score. The engine also returns the timestamps of each audio turn (sentence), and it can return timestamps for individual words in the transcript, so that the results can be aligned with, and manipulated alongside, the original audio stream.
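The exact JSON schema the engine returns is not described here, so the following Python sketch assumes a hypothetical layout: a top-level `alternatives` array whose entries carry `transcript`, `likelihood`, `confidence`, and a `words` list with `start`/`end` times in seconds. It illustrates how a client might rank the contending results and use word timestamps to find the words spoken within a given span of the original audio. All field and function names are assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the stream (assumed unit)
    end: float


@dataclass
class Hypothesis:
    transcript: str
    likelihood: float
    confidence: float
    words: list[Word] = field(default_factory=list)


def parse_result(payload: str) -> list[Hypothesis]:
    """Parse a result in the ASSUMED schema into hypotheses, best first."""
    data = json.loads(payload)
    hyps = []
    for alt in data["alternatives"]:
        words = [Word(w["word"], float(w["start"]), float(w["end"]))
                 for w in alt.get("words", [])]
        hyps.append(Hypothesis(alt["transcript"], float(alt["likelihood"]),
                               float(alt["confidence"]), words))
    # Rank the contending results by confidence score, highest first.
    hyps.sort(key=lambda h: h.confidence, reverse=True)
    return hyps


def words_between(hyp: Hypothesis, t0: float, t1: float) -> list[str]:
    """Words whose time span overlaps the window [t0, t1), in seconds."""
    return [w.text for w in hyp.words if w.start < t1 and w.end > t0]


# Hypothetical two-alternative result, for illustration only.
SAMPLE = json.dumps({"alternatives": [
    {"transcript": "recognise speech", "likelihood": -812.4, "confidence": 0.41,
     "words": [{"word": "recognise", "start": 0.10, "end": 0.62},
               {"word": "speech", "start": 0.70, "end": 1.05}]},
    {"transcript": "recognize speech", "likelihood": -790.1, "confidence": 0.87,
     "words": [{"word": "recognize", "start": 0.10, "end": 0.62},
               {"word": "speech", "start": 0.70, "end": 1.05}]},
]})

if __name__ == "__main__":
    best = parse_result(SAMPLE)[0]
    print(best.transcript)                 # recognize speech
    print(words_between(best, 0.65, 1.2))  # ['speech']
```

Ranking by confidence rather than likelihood is a choice made for this sketch; the order in which the engine itself returns its alternatives is not specified here.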
The engine is built to let users speak naturally and be understood, even in far-field, noisy environments. RecoMadeEasy® is available as an SDK with an included API that contains all the components needed for full integration, enabling engineers to get started quickly with minimal development work or cost.
The RecoMadeEasy® AudioVisual Speech Recognition engine is also available as both a server-side product and a standalone product.
This engine provides one of the most accurate transcriptions for English, handling many different dialects and accents in a single large-vocabulary transcription engine. It also provides real-time processing with a small memory footprint.
Speech recognition uses a streaming interface in which the recognizer runs as a TCP/IP listener on any device. Any lightweight, generic client that supports the WebSocket protocol can stream audio or video to a listener and receive the transcript in real time, with optional alternative results and their likelihood scores. The stream may use any format supported by GStreamer 1.0, including MP3, Ogg Vorbis, Free Lossless Audio Codec (FLAC), MP4, Pulse Code Modulation (PCM), and other codecs such as those supported by the standard Waveform Audio File Format (WAVE).
Supported Languages
The RecoMadeEasy® Speech Recognition engine is currently available for the following languages:
English
- All dialects of English
- Supports 8kHz and 16kHz Audio
Spanish
- Major dialects of Spanish
- Supports 8kHz and 16kHz Audio
Chinese (Mandarin)
- Major dialects of Mandarin Chinese
- Supports 8kHz and 16kHz Audio
Arabic
- Major dialects of Arabic
- Supports 8kHz and 16kHz Audio
German
- Major dialects of German
- Supports 8kHz and 16kHz Audio
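Every language above is listed with 8 kHz and 16 kHz support. As a minimal sketch, a Python client could check a WAVE file's sample rate with the standard `wave` module before streaming it. Rejecting other rates, rather than resampling them, is an assumption of this example, since how the engine handles other rates is not stated; the function names are hypothetical.

```python
import io
import wave

# Sample rates listed for every supported language: 8 kHz and 16 kHz.
SUPPORTED_RATES = {8000, 16000}


def wav_sample_rate(data: bytes) -> int:
    """Return the WAVE file's sample rate, or raise if it is not supported."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
    if rate not in SUPPORTED_RATES:
        raise ValueError(f"unsupported sample rate: {rate} Hz")
    return rate


def make_silent_wav(rate: int, seconds: float = 0.1) -> bytes:
    """Build an in-memory 16-bit mono WAVE file of silence (test helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
    return buf.getvalue()


if __name__ == "__main__":
    print(wav_sample_rate(make_silent_wav(16000)))  # 16000
    try:
        wav_sample_rate(make_silent_wav(44100))
    except ValueError as err:
        print(err)  # unsupported sample rate: 44100 Hz
```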
Supported Operating Systems
The RecoMadeEasy® Speech Recognition engine is available for the following operating systems. The C++ SDK, command-line interface, and web services can be used on any of them:
Linux (both 32-bit and 64-bit versions are supported)
- CentOS 7.5 Linux (Latest)
- Previous CentOS Linux versions: 7.4, 7.3, 7.2, 7.1, 7.0, 6.6, 6.4, 6.3, 6.2, 5.7, 5.6, 5.4
- Fedora 31 Linux (Latest)
- Previous Fedora Linux versions: 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, Core 5, Core 4, Core 3, Core 2, Core
- Ubuntu 19.04, 18.04, and 16.04 Linux (Latest)
- N.B.: May be made available for other Unix-like systems upon request
-
Face Recognition
(face detection and recognition)
-
Object Recognition
(object detection and recognition)
-
Interactive Voice Response (IVR)
(Graph-based logic, easily configured)
-
Automatic Language Proficiency Rating (ALPR)
(Multi-lingual automated language proficiency rating)
-
Signature Recognition
Status: Advanced Development Stage
-
Keystroke Recognition
Status: Research Stage