INTELLIGENCE SOLUTIONS FOR GOVERNMENT FROM CARAHSOFT™

Basis Technology

Foreign Language Document and Media Exploitation

Company Overview

Basis Technology is the leading provider of desktop applications and developer tools for exploiting foreign language documents and digital media. At the core of this technology is Rosette, a powerful platform for natural language processing which has been under development for over ten years and which presently supports eighteen languages, including Arabic, Chinese, Farsi, and Korean.

The Rosette developer tools enable foreign language capabilities to be quickly added to large, complex systems, such as search engines, document repositories, and name matchers. Among the many such systems currently built on Rosette are Google's search engine, Microsoft's Windows Live Search, and the National Harmony document repository.

Basis Technology's desktop applications enable intelligence analysts to quickly and accurately translate and link Arabic documents, bringing together information from multiple sources, including web pages, maps, name lists, and databases. This suite has been designed to provide features needed by expert linguists while still being accessible to individuals with no foreign language skills.

Technology Summary

The Rosette Linguistics Platform enables text mining, document triage, and information retrieval applications to move beyond English into foreign languages with better-than-human degrees of accuracy and precision. Unlike "machine translation" systems, which are notorious for faulty grammar and semantic corruption, Rosette employs such techniques as morphological analysis, orthographic analysis, named-entity extraction, and named-entity translation to carefully preserve the meaning of the original text while maximizing intelligibility to linguist and non-linguist alike.

How It Works

Rosette analyzes large volumes of "unstructured text", i.e., free-form text from any digital source, such as web pages, hard drives, e-mail, and online chat. Each Rosette component is responsible for one of the essential functions in the chain of processing:

  • Automatically identifying the language and character encoding of ingested text. Over 45 languages and over 85 character encodings can be automatically recognized.

  • Converting or "transcoding" text from over 150 legacy encodings into the industry-standard Unicode format.

  • Analyzing the Unicode text to identify key words, noun phrases, sentence boundaries, and "entities", i.e. names of people, names of places, dates, e-mail addresses, telephone numbers, credit card numbers, and so on.

  • Translating the "named entities" from the foreign language into English with better-than-human (above 95%) accuracy.
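
The chain of processing above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not Rosette's API: the function names (transcode, dominant_script, extract_entities, process), the two-entry candidate encoding list, and the regular expressions are all assumptions made for this sketch. It tries candidate encodings in order rather than detecting them statistically, reports the dominant Unicode script rather than the language, and finds only e-mail addresses and ISO-style dates. Named-entity translation is not attempted.

```python
import re
import unicodedata

# Assumed candidate list for this sketch. Order matters: strict encodings
# go first, because permissive single-byte encodings accept almost any input.
CANDIDATE_ENCODINGS = ("utf-8", "cp1256")

# Deliberately simple patterns; real entity extraction is far broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
ISO_DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def transcode(raw, candidates=CANDIDATE_ENCODINGS):
    """Decode bytes to Unicode using the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


def dominant_script(text):
    """Return the most frequent script among letters, e.g. 'ARABIC' or 'LATIN'.

    The script is taken from the first word of each character's Unicode name,
    which is a rough proxy and not a language identifier.
    """
    counts = {}
    for ch in text:
        if ch.isalpha():
            script = unicodedata.name(ch, "UNKNOWN").split(" ")[0]
            counts[script] = counts.get(script, 0) + 1
    return max(counts, key=counts.get) if counts else None


def extract_entities(text):
    """Find pattern-based entities (e-mail addresses and ISO dates) in text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "dates": ISO_DATE_RE.findall(text),
    }


def process(raw):
    """Run the chain: transcode to Unicode, detect script, extract entities."""
    text, encoding = transcode(raw)
    return {
        "encoding": encoding,
        "script": dominant_script(text),
        "text": text,
        "entities": extract_entities(text),
    }
```

Calling process() on the raw bytes of a legacy-encoded document returns the decoded Unicode text together with the encoding that succeeded, the dominant script, and any matched entities. Ordering the candidate list from strict to permissive is the main design choice here, since a single-byte encoding such as cp1256 will decode nearly any byte sequence without error.
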

User Benefits

  • Process foreign language intelligence with high accuracy
  • Improve search engine precision and recall
  • Translate and transliterate name databases automatically
  • Boost productivity of translators and linguists
  • Enable non-linguists to search, browse, and triage foreign documents
