INTELLIGENCE SOLUTIONS FOR GOVERNMENT FROM CARAHSOFT™

Foreign Language Document and Media Exploitation
Company Overview
Basis Technology is the leading provider of desktop applications and developer tools for exploiting foreign language documents and digital media. At the core of this technology is Rosette, a powerful platform for natural language processing which has been under development for over ten years and which presently supports eighteen languages, including Arabic, Chinese, Farsi, and Korean.
The Rosette developer tools enable foreign language capabilities to be quickly added to large, complex systems, such as search engines, document repositories, and name matchers. Among the many such systems currently built on Rosette are Google's search engine, Microsoft's Windows Live Search, and the National Harmony document repository.
Basis Technology's desktop applications enable intelligence analysts to quickly and accurately translate and link Arabic documents, bringing together information from multiple sources, including web pages, maps, name lists, and databases. This suite has been designed to provide features needed by expert linguists while still being accessible to individuals with no foreign language skills.
Technology Summary
The Rosette Linguistics Platform enables text mining, document triage, and information retrieval applications to move beyond English into foreign languages with better-than-human degrees of accuracy and precision. Unlike "machine translation" systems, which are notorious for faulty grammar and semantic corruption, Rosette employs such techniques as morphological analysis, orthographic analysis, named-entity extraction, and named-entity translation to carefully preserve the meaning of the original text while maximizing intelligibility to linguist and non-linguist alike.
How It Works
Rosette analyzes large volumes of "unstructured text", i.e., free-form text from any digital source, such as web pages, hard drives, e-mail, and online chat. Each Rosette component is responsible for one of the essential functions in the processing chain:
- Automatically identifying the language and character encoding of ingested text. Over 45 languages and over 85 character encodings can be automatically recognized.
- Converting or "transcoding" text from over 150 legacy encodings into the industry-standard Unicode format.
- Analyzing the Unicode text to identify key words, noun phrases, sentence boundaries, and "entities", i.e. names of people, names of places, dates, e-mail addresses, telephone numbers, credit card numbers, and so on.
- Translating the "named entities" from the foreign language into English with better-than-human (above 95%) accuracy.
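The first three stages of this chain can be illustrated with a minimal Python sketch. It is not Rosette code and does not use the Rosette API: it relies only on the Python standard library, guesses the encoding by trial decoding rather than statistical detection, and finds just two entity types with simple regular expressions. The names `CANDIDATE_ENCODINGS`, `transcode_to_unicode`, and `extract_entities` are hypothetical and chosen for illustration only.

```python
import re

# Assumption: a short, ordered list of encodings to try. Strict codecs come
# first because permissive ones (e.g. latin-1) accept any byte sequence.
CANDIDATE_ENCODINGS = ["utf-8", "cp1256", "latin-1"]

# Deliberately simple patterns; real entity extraction is far more involved.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def transcode_to_unicode(raw: bytes):
    """Return (text, encoding) using the first candidate that decodes cleanly."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding could decode the input")


def extract_entities(text: str) -> dict:
    """Collect e-mail addresses and telephone numbers found in Unicode text."""
    return {
        "email": EMAIL_RE.findall(text),
        "phone": PHONE_RE.findall(text),
    }


if __name__ == "__main__":
    # Arabic text stored in the legacy Windows-1256 encoding.
    raw = "اتصل على ali@example.com أو +1 202-555-0143".encode("cp1256")
    text, encoding = transcode_to_unicode(raw)
    print(encoding)
    print(extract_entities(text))
```

Trial decoding is used here only because it needs no external dependencies. It cannot distinguish between legacy encodings that share the same byte range, which is why a production system would rely on statistical language and encoding identification instead.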
User Benefits
- Process foreign language intelligence with high accuracy
- Improve search engine precision and recall
- Automatically translate and transliterate name databases
- Boost productivity of translators and linguists
- Enable non-linguists to search, browse, and triage foreign documents