ECI Blog @WordPress

Latest news from the ECI Networks Group

Software: Text Mining


Uncovering telltale patterns.

by Cade Metz

During a series of hearings last fall, the U.S. Senate Select Committee on Intelligence showed that prior to September 11, 2001, the American intelligence community had collected a significant amount of data about the men who attacked the World Trade Center and the Pentagon. The various intelligence agencies were simply unable to connect the dots. In his report, Richard C. Shelby, then vice chairman of the committee, stressed that agencies need powerful new tools to analyze the huge volumes of information they bring in.

Text-mining software is one of the front-line tools that the government is now using to tease out valuable connections. These specialized search engines can quickly sift through mountains of unstructured text (anything that’s not carefully arranged in a database or spreadsheet) and pull out the meaningful parts. They can also infer relationships within the data that are not stated explicitly. People do this automatically all the time, but it is enormously complicated for computers. “We bridge the gap between information and action,” says Barak Pridor, CEO of ClearForest, a text-mining company.

The result of years of research at facilities such as Bell Labs and the Palo Alto Research Center, text-mining apps have long been used in business. But more government agencies, including the Defense Intelligence Agency, the Department of Homeland Security, and the FBI, are using them to evaluate the multitude of e-mail messages, phone call transcripts, memos, foreign news stories, and other pieces of intelligence data these agencies collect each day.

Software from companies such as Autonomy, ClearForest, and Inxight Software can locate words and phrases the same way an ordinary search engine does. But that’s just the beginning. Such applications are clever enough to run conceptual searches, locating, say, all the phone numbers and place names buried in a collection of intelligence communiqués. More impressive, the software can identify relationships, patterns, and trends involving words, phrases, numbers, and other data.
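To make the idea of a conceptual search concrete, here is a minimal Python sketch that pulls phone numbers and place names out of free text. It is only an illustration and not how Autonomy, ClearForest, or Inxight actually work: the `PLACES` list, the `PHONE_RE` pattern (which covers only North American style numbers), and the `extract_entities` function are all assumptions made up for this example.

```python
import re

# Hypothetical gazetteer of place names (an assumption for this sketch);
# a real system would rely on a much larger list or a trained model.
PLACES = {"London", "Hamburg", "Kuala Lumpur"}

# Loose pattern for numbers such as 555-867-5309 or (555) 867-5309.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b")


def extract_entities(text):
    """Return the phone numbers and known place names found in text."""
    phones = PHONE_RE.findall(text)
    places = sorted(
        place
        for place in PLACES
        if re.search(r"\b" + re.escape(place) + r"\b", text)
    )
    return {"phones": phones, "places": places}


if __name__ == "__main__":
    memo = "Call 555-867-5309 from London, then (555) 123-4567 in Hamburg."
    print(extract_entities(memo))
```

The search here is by kind of thing (any phone number, any known place) rather than by a literal keyword, which is the distinction the paragraph above draws.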

Using statistical and mathematical analysis, the programs can sift through thousands of documents and determine how certain words relate to each other. If a news story says that “Zacarias Moussaoui was a follower of the Islamic cleric Abu Qatada while living in London,” a text-mining app can identify Moussaoui and Qatada as people, identify London as a place, and determine the relationship among the three.
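The Moussaoui sentence can be worked through with a toy Python sketch. Note the swap: commercial text-mining products use statistical models, while this example uses a hand-written lookup table and two fixed phrase rules purely to show the shape of the output. The `ENTITY_TYPES` table, the relation names `follower_of` and `lived_in`, and both functions are hypothetical.

```python
import re

# Hypothetical typed gazetteer (an assumption for this sketch).
ENTITY_TYPES = {
    "Zacarias Moussaoui": "PERSON",
    "Abu Qatada": "PERSON",
    "London": "PLACE",
}


def tag_entities(sentence):
    """Return (name, type) pairs for known entities, in order of appearance."""
    found = []
    for name, etype in ENTITY_TYPES.items():
        match = re.search(r"\b" + re.escape(name) + r"\b", sentence)
        if match:
            found.append((match.start(), name, etype))
    return [(name, etype) for _, name, etype in sorted(found)]


def extract_relations(sentence):
    """Apply two hand-written phrase rules to link the tagged entities."""
    entities = tag_entities(sentence)
    people = [name for name, etype in entities if etype == "PERSON"]
    places = [name for name, etype in entities if etype == "PLACE"]
    relations = []
    if "follower of" in sentence and len(people) >= 2:
        relations.append((people[0], "follower_of", people[1]))
    if "living in" in sentence and people and places:
        relations.append((people[0], "lived_in", places[0]))
    return relations


if __name__ == "__main__":
    story = ("Zacarias Moussaoui was a follower of the Islamic cleric "
             "Abu Qatada while living in London")
    print(tag_entities(story))
    print(extract_relations(story))
```

Rules like these break as soon as the wording changes, which is why real systems lean on statistics gathered across thousands of documents instead.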

In theory, a human analyst could pick up those connections easily, but manually sifting through the enormous volumes of information is often impractical. Fortunately, text-mining applications don’t get tired.


July 16, 2009 | Streamin, Tech News
