RecoMadeEasy Embedded AudioVisual Recognition Engine by Recognition Technologies, Inc.
  • AudioVisual Recognition
    (Combination of Speaker, Speech, Face Recognition, and Object Detection and Recognition with a single interface)
    Server Based


    RecoMadeEasy® Embedded AudioVisual Recognition is an embedded natural language voice and video recognition engine that offers comprehensive conversational voice interaction, voice biometrics, and facial recognition. The engine has a small memory footprint and is designed to run natively on devices that need unconstrained natural language interfaces with high recognition accuracy, even during service interruptions or when full, uninterrupted, and secure access to a cloud server is not guaranteed.

    The RecoMadeEasy® AudioVisual Recognition engine comprises three distinct technologies: Speaker, Speech, and Facial Recognition, all developed in our research labs in New York. When presented with an audio, video, or audio-video stream, the engine returns the following through the API, in either XML or JSON:

    1. Speaker Segmentation of Incoming Audio, Video, or Both (including timestamps of the locations where the speakers change, and tagging of each audio, video, or combined segment with the ID of the person speaking in that segment)
    2. Audio and/or Visual Identification of speaker(s)
    3. Audio and/or Visual Verification of speaker(s)
    4. Full Transcription of the audio stream

    The engine may also be used as a standalone engine through a very simple C++ SDK and API. This is most useful for integrating the engine into existing products and IVR systems.
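
    As a rough illustration of how a client might consume the kinds of results listed above, the following Python sketch parses a JSON response into speaker turns and totals the talk time per speaker. The response layout and every field name in it (segments, start, end, speaker_id, verified, transcript) are assumptions made for this example; they are not the engine's documented schema, which should be taken from the SDK documentation.

```python
import json

# Hypothetical response: the field names below are illustrative assumptions,
# not the engine's documented output schema.
SAMPLE_RESPONSE = """
{
  "segments": [
    {"start": 0.0, "end": 4.2, "speaker_id": "alice", "verified": true,
     "transcript": "Good morning."},
    {"start": 4.2, "end": 9.5, "speaker_id": "bob", "verified": false,
     "transcript": "Hello, how are you?"}
  ]
}
"""


def speaker_turns(response_text):
    """Return (start, end, speaker_id, transcript) tuples sorted by start time."""
    data = json.loads(response_text)
    turns = [
        (seg["start"], seg["end"], seg["speaker_id"], seg["transcript"])
        for seg in data["segments"]
    ]
    return sorted(turns)


def talk_time(turns):
    """Sum the duration of speech, in seconds, for each speaker ID."""
    totals = {}
    for start, end, speaker, _ in turns:
        totals[speaker] = totals.get(speaker, 0.0) + (end - start)
    return totals


if __name__ == "__main__":
    turns = speaker_turns(SAMPLE_RESPONSE)
    for start, end, speaker, text in turns:
        print(f"[{start:6.1f} - {end:6.1f}] {speaker}: {text}")
    print(talk_time(turns))
```

    Keeping the parsing in one small function means that only that function would need to change if the actual response uses different field names or XML instead of JSON.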

    The engine is built to allow users to speak naturally and be understood – even in a far-field, noisy environment. RecoMadeEasy® is available as an SDK with an included API that contains all necessary components for full integration, so engineers can get started easily, without additional development work or cost.

    The RecoMadeEasy® AudioVisual Recognition engine is also available as a server-side product and as a standalone product.

    Speaker Recognition

      Language- and Text-Independence: The speaker recognition system is completely text- and language-independent. This means that a user may enroll his or her voice into the system in one language and be identified or verified in a completely different language. This allows the engine to handle authentication and identification across any number of languages.

    Large-Vocabulary Speech Recognition

      The speech recognition side of the engine provides one of the most accurate transcriptions for English, handling many different dialects and accents in a single large-vocabulary transcription engine. It is also capable of real-time processing in a small memory footprint.

      The speech recognition uses a streaming interface in which the recognizer (in the form of listeners) and the client both run on the embedded device. Any lightweight generic client capable of using a WebSocket interface may stream audio/video to a listener and receive the transcript in real time, with optional alternative results and likelihood scores. The stream may use any codec supported by GStreamer-1.0, including MP3, Ogg Vorbis, Free Lossless Audio Codec (FLAC), MP4, Pulse Code Modulation (PCM), or other codecs such as those supported by the standard Waveform Audio File Format (WAVE).
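
      The streaming exchange described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the engine's actual protocol: the chunk size, the assumed input format (16 kHz, 16-bit mono PCM), the end_of_stream marker, and the final, transcript, alternatives, and score fields are all hypothetical. The client function accepts any object with asynchronous send() and recv() methods, and FakeListener is an in-memory stand-in so that the sketch runs without a listener or a network connection.

```python
import asyncio
import json

CHUNK_BYTES = 3200  # 100 ms of 16 kHz, 16-bit mono PCM (an assumed input format)


def chunk_audio(audio, size=CHUNK_BYTES):
    """Yield successive fixed-size slices of an audio buffer."""
    for offset in range(0, len(audio), size):
        yield audio[offset:offset + size]


async def stream_audio(sock, audio):
    """Send audio chunks, signal end of stream, and collect the results.

    `sock` is any object with async send() and recv() methods.
    """
    for chunk in chunk_audio(audio):
        await sock.send(chunk)
    # Hypothetical end-of-stream marker; the real protocol may differ.
    await sock.send(json.dumps({"event": "end_of_stream"}))

    results = []
    while True:
        message = json.loads(await sock.recv())
        results.append(message)
        if message.get("final"):  # hypothetical flag on the last result
            return results


class FakeListener:
    """In-memory stand-in for a listener, used only to exercise the client."""

    def __init__(self):
        self.bytes_received = 0
        self.replies = [
            {"final": False, "transcript": "hello"},
            {"final": True, "transcript": "hello world",
             "alternatives": [{"transcript": "hollow world", "score": -12.5}]},
        ]

    async def send(self, data):
        # Count only binary audio frames, not the text control message.
        if isinstance(data, bytes):
            self.bytes_received += len(data)

    async def recv(self):
        return json.dumps(self.replies.pop(0))


if __name__ == "__main__":
    one_second_of_silence = bytes(32000)
    results = asyncio.run(stream_audio(FakeListener(), one_second_of_silence))
    print(results[-1]["transcript"])
```

      For simplicity the sketch sends all of the audio before reading any results. A client that needs partial transcripts while the user is still speaking would typically run the sending and receiving loops concurrently.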

    Face Recognition

      The facial recognition side of the engine provides face detection, face identification (open-set and closed-set), and facial verification from still images and video streams. It supports standard image and video formats such as PNG, JPEG, GIF, MP2, MP4, and MOV.
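
      The difference between closed-set identification, open-set identification, and verification can be illustrated with a short Python sketch. It shows only the general decision logic commonly used in biometrics and is not a description of how the engine computes or thresholds its scores. The similarity scores, the threshold values, and the function names are made up for the example.

```python
def identify_closed_set(scores):
    """Return the enrolled ID with the highest score; someone is always chosen."""
    return max(scores, key=scores.get)


def identify_open_set(scores, threshold):
    """Return the best-scoring enrolled ID, or None if no score reaches the threshold."""
    best = identify_closed_set(scores)
    return best if scores[best] >= threshold else None


def verify(scores, claimed_id, threshold):
    """Accept or reject a claimed identity by comparing its score to a threshold."""
    return scores.get(claimed_id, float("-inf")) >= threshold


if __name__ == "__main__":
    # Illustrative similarity scores for one probe face against three enrolled people.
    scores = {"alice": 0.41, "bob": 0.87, "carol": 0.35}
    print(identify_closed_set(scores))
    print(identify_open_set(scores, threshold=0.9))
    print(verify(scores, "bob", threshold=0.8))
```

      Closed-set identification assumes the person is among those enrolled, so it always returns a match. Open-set identification may reject the probe as unknown, and verification checks a single claimed identity rather than searching all enrolled identities.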

    Supported Operating Systems

      The RecoMadeEasy® Embedded AudioVisual Recognition engine is available for the following operating systems. The C++ SDK, command-line interface, and web services may be used on any of the following systems:

    Server and Desktop Operating Systems (64-bit and 32-bit):

    • CentOS 7.3 Linux (Latest)
    • Previous CentOS Linux versions: 7.2, 7.1, 7.0, 6.6, 6.4, 6.3, 6.2, 5.7, 5.6, 5.4

    • Fedora 31 Linux (Latest)
    • Previous Fedora Linux versions: 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, Core 5, Core 4, Core 3, Core 2, Core

    • Ubuntu 19.04, 18.04, and 16.04 Linux (Latest)

    • N.B.: May be made available for other Unix-Like systems upon request

  • Speaker Recognition
    (Language- and Text-Independent, aka: Speaker Biometrics, Voice Biometrics, or SIV)
    Recipient: Frost & Sullivan Award 2011
    Server Based

  • Large-Vocabulary Speech Recognition
    Available for English, Spanish, Mandarin, Arabic, and German
    Also Available in Bilingual Spanish-English, Mandarin-English, Arabic-English, and German-English
    (Customizable domain full transcription ~ 240,000+ word vocabulary)
    Server Based

  • Face Recognition
    (face detection and recognition)
    Server Based

  • Object Recognition
    (object detection and recognition)
    Server Based

  • Interactive Voice Response (IVR)
    (Graph-based logic, easily configured)
    Product Details

  • Automatic Language Proficiency Rating (ALPR)
    (Multi-lingual automated language proficiency rating)

  • Signature Recognition
    Status: Advanced Development Stage

  • Keystroke Recognition
    Status: Research Stage

For further information, please contact us at 1-800-215-0841 inside the U.S. or +1-914-997-5676 from any other country. Alternatively, you may send an email to Recognition Technologies, Inc.