Big Data

From Text to Threat Detection


Big Data Threats

Today’s globalized threat landscape demands that government agencies understand the conversations happening around them as well as if those conversations were held in the agencies’ own native language. This level of proficiency requires more than a machine or human translator alone. However, most government organizations lack the manpower to manage these tools, and they do not have enough foreign language professionals who can properly analyze spoken and text-based conversations in context. As a result, agencies can miss security-relevant information that could provide valuable insight into attacks against our nation and more. Simply put: if you can’t analyze foreign languages quickly and efficiently, you can go blind to threats.

Handling the Data Deluge

Given the sheer amount of data and the complexity of analysis, it’s not unusual for intelligence agencies to review only about 10% of their data. The remaining 90% goes untouched, and any important insight it contains is never analyzed. Since many agencies lack professionals who can translate foreign languages, prioritizing what does get human attention is imperative. The ability to automate the triage of all this information, from long-form news articles to SMS conversations to long strings of tweets, can help agencies determine which information is most important to analyze. By triaging their data, organizations can ensure that human resources are looking at the most critical data while automated tools explore the remaining 90%.
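To make the idea of automated triage concrete, here is a minimal Python sketch. It is an illustration only, not how any particular product works: the Document type, the WATCH_TERMS weights, and the 10% human_capacity default are all assumptions made up for this example, and a real system would score text in its original language with far richer signals than term counts.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str


# Hypothetical watch terms and weights, invented for this illustration.
# In practice these would be terms in the source language, not English.
WATCH_TERMS = {
    "detonator": 3.0,
    "explosive": 3.0,
    "border crossing": 2.0,
    "shipment": 1.0,
}


def relevance_score(doc, watch_terms=WATCH_TERMS):
    """Sum the weights of every watch-term occurrence in the document."""
    text = doc.text.lower()
    return sum(weight * text.count(term) for term, weight in watch_terms.items())


def triage(docs, human_capacity=0.10):
    """Split docs into (for_human_review, for_automated_processing).

    The highest-scoring fraction (human_capacity) goes to analysts;
    everything else is left to automated tools.
    """
    ranked = sorted(docs, key=relevance_score, reverse=True)
    cutoff = max(1, round(len(ranked) * human_capacity))
    return ranked[:cutoff], ranked[cutoff:]


# Example: ten documents, one of which mentions two watch terms.
docs = [Document(f"d{i}", "routine weather report") for i in range(9)]
docs.append(Document("hot", "The detonator shipment arrives Tuesday"))

for_review, automated = triage(docs)
print([d.doc_id for d in for_review], len(automated))
```

The point of the sketch is the split itself: every document gets a score, and only the top slice is routed to scarce human analysts.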

Maintaining Context

On top of the data volume and the shortage of foreign language professionals, there is a contextual problem. Every language has unique nuances and cultural references that are often lost when translated by non-native speakers and machine translators. Because human translators and machine solutions typically translate text into the organization’s native tongue first and analyze it only afterward, idiomatic expressions, semantic traits, and more are lost in the process. For agencies to effectively identify, understand, and respond to foreign threats, they need to move away from translating first and toward analyzing text in its native language before translation. Analyzing text within its original context helps preserve expressions and contextual information, so the real message carries over properly and effectively.

Text Analysis for Better Threat Detection: Boston Bombings

When applied to the real world, data triage and text contextualization can lead to monumental security enhancements and greater public safety. In the case of the Boston Bombings, more advanced text and data analysis could have been key to stopping the attacks before they happened. One of the brothers was on a national security watch list, yet was still admitted into the U.S. twice because of text and data analysis issues. The first time, he arrived with over 100 other people who matched watch list names; there wasn’t enough time or manpower to interview everyone to determine if they were actual threats – it was a case of too many false positives. The second time, his name was missed because of an alternative spelling – a false negative.

More advanced, context-rich, and prioritized text analysis could help prevent future threats like the one posed by the Tsarnaev brothers. Approximate string matching, the practice of searching for and analyzing spelling variations of a name, can be fine-tuned to home in on the data organizations need to better defend our borders and keep people safe. Beyond that, triaging the information that’s most important, in this case the most security-relevant names on a flight manifest, can help agents prioritize what they should analyze and whom they should talk to.
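As a rough illustration of approximate string matching, the Python sketch below compares names on a manifest against a watch list using the standard library’s difflib.SequenceMatcher. Everything specific here is an assumption for the example: the names are fictional, the 0.85 threshold is arbitrary, and character-level similarity is a simple stand-in for the language-aware name matching a production system would need.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, drop punctuation, and collapse repeated whitespace."""
    kept = "".join(ch for ch in name.lower() if ch.isalpha() or ch.isspace())
    return " ".join(kept.split())


def similarity(a, b):
    """Return a 0..1 character-level similarity between two names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def screen_manifest(manifest, watch_list, threshold=0.85):
    """Return (passenger, closest_watch_entry, score) for likely matches.

    threshold is a hypothetical tuning knob: raising it reduces false
    positives, lowering it reduces false negatives.
    """
    hits = []
    for passenger in manifest:
        score, entry = max((similarity(passenger, e), e) for e in watch_list)
        if score >= threshold:
            hits.append((passenger, entry, score))
    # Highest-scoring names first, so agents see the strongest matches on top.
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits


# Fictional names used purely for illustration.
watch_list = ["Alexander Petrov"]
manifest = ["Aleksandr Petrov", "Maria Gonzalez"]

for passenger, entry, score in screen_manifest(manifest, watch_list):
    print(f"{passenger} ~ {entry} ({score:.2f})")
```

The threshold captures the trade-off described above: set it too low and agents face a flood of false positives, set it too high and an alternative spelling slips through as a false negative. Sorting the hits by score is a small instance of triage, since it tells agents which names to look at first.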

Advanced Text Analysis with Rosette

Rosette, a text-analysis solution from Basis Technology, empowers agencies to analyze and get the most value from their data by triaging what information is security-relevant. With data prioritized, text analysis and translation become more focused, helping trained staff work with truly actionable intelligence. Rosette can analyze and understand the full context of messages in 55 languages by first performing text analysis in the native language before translating into another tongue. This preserves contextual information that’s often lost by other machine translators. Together, context-rich information and a prioritized analysis strategy help national security agencies keep a finger on the pulse of the current worldwide threat landscape.

For information on how Basis Technology helps national security agencies analyze and prioritize foreign threats, check out these resources.
