CoCITe – Coordinating Changes in Text

[Figure: CoCITe graphic showing a burst event in a word's daily frequency pattern]

What's changing in a stream of text? And why? Answering these questions can give an early, even real-time, indication of serious events.

That is what CoCITe (Coordinating Changes In Text), a text mining tool, sets out to do. By analyzing words and other ASCII strings in text streams drawn from text files, news items, emails, instant messages, and log files, CoCITe models the expected frequency of each word and discovers changes that fall outside regular patterns. Statistically significant events are then flagged. For example, the graphic illustrates a burst event in which the frequency of a word rises far beyond its regular daily pattern; the change is detected and an alert can be generated.
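To make the burst-event idea concrete, here is a minimal sketch in Python. It is not CoCITe's actual model, which this page does not describe. It assumes that the expected count for each hour of the day is the mean of past counts for that hour, and that deviations can be scored with a Poisson-style z-score. The names `expected_by_hour`, `burst_score`, and `detect_bursts`, as well as the threshold and floor values, are hypothetical.

```python
import math
from collections import defaultdict


def expected_by_hour(history):
    """Mean count per hour of day.

    history: iterable of (hour_of_day, count) pairs from past days.
    """
    totals = defaultdict(float)
    days = defaultdict(int)
    for hour, count in history:
        totals[hour] += count
        days[hour] += 1
    return {hour: totals[hour] / days[hour] for hour in totals}


def burst_score(observed, expected):
    """Standard deviations above expectation, assuming Poisson-like counts
    (variance equal to the mean). This is an illustrative assumption."""
    return (observed - expected) / math.sqrt(expected)


def detect_bursts(history, today, threshold=5.0, floor=1.0):
    """Return (hour, observed, expected, score) for hours whose count
    exceeds the regular daily pattern by at least `threshold` deviations.

    `floor` keeps the expectation positive for hours never seen before.
    """
    baseline = expected_by_hour(history)
    alerts = []
    for hour, observed in today:
        expected = max(baseline.get(hour, 0.0), floor)
        score = burst_score(observed, expected)
        if score >= threshold:
            alerts.append((hour, observed, expected, score))
    return alerts


# Three past days of counts for one word at 09:00 and 10:00.
history = [(9, 10), (9, 12), (9, 8), (10, 20), (10, 20), (10, 20)]
# Today the 10:00 count jumps well above its usual level.
for alert in detect_bursts(history, [(9, 11), (10, 80)]):
    print(alert)
```

Keying the baseline on the hour of day is what lets a regular daily peak pass unflagged while an unusual rise at the same hour raises an alert.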

Changes can include the sudden appearance or disappearance of a word, or subtle changes in its frequency. Because CoCITe is a mining tool, not a search tool, changes are discovered spontaneously; it is not necessary to predict the type or timing of changes in advance.

Multiple events that occur at the same time, especially those associated with certain values of the metadata, are coordinated and grouped together. The grouping provides more context and can help locate the source of the events.
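One simple way to picture this coordination step is to bucket flagged events by time window and by a metadata value, then keep the buckets in which several terms changed together. The sketch below assumes each event is a dictionary with hypothetical fields `time` (epoch seconds), `term`, and `source` (a metadata value); the function name `group_events` and the one-hour window are also assumptions, not CoCITe's actual grouping logic.

```python
from collections import defaultdict


def group_events(events, window_seconds=3600):
    """Group flagged terms that share a time window and a metadata value.

    Returns {(window_index, source): sorted list of terms}, keeping only
    groups in which more than one distinct term changed together.
    """
    groups = defaultdict(set)
    for event in events:
        window = event["time"] // window_seconds
        groups[(window, event["source"])].add(event["term"])
    return {
        key: sorted(terms)
        for key, terms in groups.items()
        if len(terms) > 1
    }


events = [
    {"time": 0, "term": "outage", "source": "nyc"},
    {"time": 100, "term": "fiber", "source": "nyc"},
    {"time": 200, "term": "sale", "source": "la"},
]
print(group_events(events))
```

Grouping on the metadata value as well as the time window is what points toward a likely source: here the two terms that changed together share the same `source`.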

Possible applications include mining log files for IP address anomalies that can signal malicious network activity (worms, viruses, botnet attacks). In emergencies, real-time field reports can be monitored to immediately learn what services are needed and where. 
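As an illustration of the log-file application, the following sketch counts IPv4-like strings in log lines and reports addresses that appear repeatedly in the current window but never appeared in a baseline window. The regular expression, the function names, and the `min_count` cutoff are assumptions chosen for illustration; a real deployment would use a statistical model of expected frequencies rather than this simple set difference.

```python
import re
from collections import Counter

# Matches dotted-quad strings; it does not validate octet ranges.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def count_ips(lines):
    """Count every IPv4-like string found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        counts.update(IPV4_PATTERN.findall(line))
    return counts


def new_ip_addresses(baseline_lines, current_lines, min_count=3):
    """Addresses absent from the baseline that occur at least
    `min_count` times in the current window."""
    seen_before = count_ips(baseline_lines)
    current = count_ips(current_lines)
    return {
        ip: n
        for ip, n in current.items()
        if ip not in seen_before and n >= min_count
    }


baseline = ["GET / from 10.0.0.1", "GET /a from 10.0.0.2"]
current = [
    "GET / from 10.0.0.1",
    "POST /login from 203.0.113.9",
    "POST /login from 203.0.113.9",
    "POST /login from 203.0.113.9",
]
print(new_ip_addresses(baseline, current))
```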

Results can be viewed graphically, and recent changes can be summarized into a file. 

Technical Documents

CoCITe - Coordinating Changes In Text
Jeremy Wright, John Grothendieck
2009.

Documents (presentations, white papers)
CoCITe_20091030.pdf (530k)
CoCITe_20100204.pdf (434k)


Project Members

Jeremy Wright

Alicia Abella
