
CoCITe – Coordinating Changes in Text
What's changing in a stream of text? And why? Answering these questions may give an early or real-time indication of serious events.
That is what CoCITe (Coordinating Changes In Text), a text mining tool, sets out to do. By analyzing words and other ASCII strings in text streams drawn from text files, news items, emails, instant messages, and log files, CoCITe models the expected word frequencies and discovers changes that fall outside regular patterns. Statistically significant events are then flagged. For example, the graphic illustrates a burst event in which the frequency of a word increases far beyond its regular daily pattern; the increase is detected and an alert can be generated.
Changes can include the sudden appearance or disappearance of a word, or subtle changes in its frequency. Because CoCITe is a mining tool, not a search tool, changes are discovered spontaneously; it is not necessary to predict the type or timing of changes.
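The idea of modeling expected word frequencies and flagging significant departures can be sketched as follows. This is only an illustration, not CoCITe's actual method: it assumes hourly counts for a single word, takes the expected rate for each hour to be the mean over past days, and uses a Poisson tail probability as the significance test. The function names and the threshold `alpha` are hypothetical.

```python
import math

def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam). Fine for modest rates;
    math.exp(-lam) underflows for very large lam."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    term = math.exp(-lam)          # P(X = 0)
    cdf = term
    for i in range(1, k):          # accumulate P(X = 1) .. P(X = k - 1)
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)

def expected_rates(history):
    """history: list of past days, each a list of counts per time slot.
    Returns the mean count for each slot (the assumed 'regular pattern')."""
    days = len(history)
    return [sum(slot_counts) / days for slot_counts in zip(*history)]

def detect_bursts(history, today, alpha=1e-4):
    """Flag slots where today's count is improbably high under the
    assumed Poisson model. Returns (slot, observed, expected, p) tuples."""
    alerts = []
    for slot, (observed, rate) in enumerate(zip(today, expected_rates(history))):
        p = poisson_tail(observed, rate)
        if p < alpha:
            alerts.append((slot, observed, rate, p))
    return alerts
```

A count that simply follows the daily cycle (for example, 10 occurrences in an hour that normally sees about 10) is not flagged, while 60 occurrences in that same hour would be.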
Multiple events occurring at the same time, especially for certain values of the metadata, are coordinated and grouped together, providing more context and information that can help locate the source of the events.
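One simple way to picture this coordination step is to group flagged events that share both a time slot and a metadata value. The sketch below is an assumption about how such grouping could be done, not a description of CoCITe's algorithm; the event tuple layout, the `coordinate_events` name, and the `min_size` parameter are all hypothetical.

```python
from collections import defaultdict

def coordinate_events(events, min_size=2):
    """Group flagged events that coincide in time and share a metadata value.

    events: iterable of (time_slot, term, metadata), where metadata is a
    dict such as {"host": "web3"}.
    Returns a dict mapping (time_slot, field, value) to the sorted list of
    terms that changed together, keeping groups with at least min_size terms.
    """
    groups = defaultdict(set)
    for time_slot, term, metadata in events:
        for field, value in metadata.items():
            groups[(time_slot, field, value)].add(term)
    return {
        key: sorted(terms)
        for key, terms in groups.items()
        if len(terms) >= min_size
    }
```

A group such as `(14, "host", "web3") -> ["retry", "timeout"]` points to a single host as the likely source of two simultaneous changes.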
Possible applications include mining log files for IP address anomalies that can signal malicious network activity (worms, viruses, botnet attacks). In emergencies, real-time field reports can be monitored to immediately learn what services are needed and where.
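For the log-file case, the strings being counted would be IP addresses rather than words. The sketch below shows one way to turn log lines into per-hour address counts that a change detector could then consume. The log format (an ISO 8601 timestamp as the first token) and the name `count_ips_by_hour` are assumptions made for illustration, and the pattern matches dotted quads without checking that each octet is in range.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Matches dotted-quad strings; does not validate that octets are <= 255.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def count_ips_by_hour(lines):
    """Return {hour: Counter of IP strings} for lines that start with an
    ISO 8601 timestamp (assumed format); other lines are skipped."""
    counts = defaultdict(Counter)
    for line in lines:
        stamp, _, rest = line.partition(" ")
        try:
            hour = datetime.fromisoformat(stamp).hour
        except ValueError:
            continue  # no parseable timestamp, ignore the line
        counts[hour].update(IPV4.findall(rest))
    return counts
```

An address that is absent from every earlier hour and then appears many times in one hour is the kind of sudden appearance described above.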
Results can be viewed graphically, and recent changes can be summarized into a file.
Technical Documents
CoCITe - Coordinating Changes In Text
Jeremy Wright, John Grothendieck
2009.
Documents (presentations, white papers)
CoCITe_20091030
CoCITe_20091030.pdf (530k)
CoCITe_20100204
CoCITe_20100204.pdf (434k)
Project Members
Related Projects
AT&T Application Resource Optimizer (ARO) - For energy-efficient apps
CHI Scan (Computer Human Interaction Scan)
E4SS - ECharts for SIP Servlets
Scalable Ad Hoc Wireless Geocast
Graphviz System for Network Visualization
Information Visualization Research - Prototypes and Systems
Swift - Visualization of Communication Services at Scale
AT&T Natural Voices™ Text-to-Speech
StratoSIP: SIP at a Very High Level
Content Augmenting Media (CAM)
Content Acquisition Processing, Monitoring, and Forensics for AT&T Services (CONSENT)
MIRACLE and the Content Analysis Engine (CAE)
Social TV - View and Contribute to Public Opinions about Your Content Live
Enhanced Indexing and Representation with Vision-Based Biometrics
Visual Semantics for Intuitive Mid-Level Representations
eClips - Personalized Content Clip Retrieval and Delivery
iMIRACLE - Content Retrieval on Mobile Devices with Speech
AT&T WATSON (SM) Speech Technologies
Wireless Demand Forecasting, Network Capacity Analysis, and Performance Optimization