What's changing in a stream of text? And why? Answering these questions may give an early or real-time indication of serious events.
And that is what CoCITe (Coordinating Changes In Text), a text mining tool, sets out to do. By analyzing words and other ASCII strings in text streams from text files, news items, emails, instant messages, and log files, CoCITe can model the expected word frequencies and thus discover changes that fall outside regular patterns. Statistically significant events are then flagged. For example, the graphic illustrates a burst event in which the frequency of a word rises far above its regular daily pattern; the change is detected and an alert can be generated.
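The idea of comparing observed counts against an expected pattern can be illustrated with a minimal sketch. It is not CoCITe's actual model, which is not described here; it assumes a simple per-phase baseline (for example, the same hour on previous days) and a z-score threshold, and it flags only upward bursts. The function name `find_bursts` and the parameters `period` and `threshold` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def find_bursts(counts, period=24, threshold=3.0):
    """Return indices of intervals where a word's count bursts.

    counts: per-interval counts of one word, oldest first.
    period: number of intervals in one regular cycle (24 for hourly
            counts with a daily pattern).
    threshold: how many standard deviations above the baseline a
               count must be to be flagged (an assumed cutoff).
    """
    history_by_phase = defaultdict(list)  # phase in cycle -> past counts
    bursts = []
    for i, count in enumerate(counts):
        history = history_by_phase[i % period]
        if len(history) >= 3:  # need a few cycles before judging
            mu = mean(history)
            # Floor the spread at 1.0 so a perfectly flat history
            # does not cause a division by zero.
            sigma = max(stdev(history), 1.0)
            if (count - mu) / sigma > threshold:
                bursts.append(i)
                continue  # keep the burst out of the baseline
        history.append(count)
    return bursts
```

With a four-interval cycle of `[2, 10, 5, 1]` repeated four times and a fifth cycle of `[2, 40, 5, 1]`, calling `find_bursts(counts, period=4)` flags only the interval holding the count of 40. A production system would likely use a proper statistical model of word counts rather than this fixed threshold.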
Changes can include the sudden appearance or disappearance of a word or subtle changes in its frequency. Because CoCITe is a mining tool, not a search tool, changes are discovered without being specified in advance; it’s not necessary to predict the type or timing of changes.
Multiple events that occur at the same time, especially for certain values of the metadata, are grouped together, which provides more context and can help locate the source of the events.
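One simple way to picture this grouping is to bucket flagged events by time window and metadata value. The sketch below is an illustration under stated assumptions, not CoCITe's coordination method: it uses fixed, non-overlapping windows, and the event tuple layout, the function name `group_events`, and the metadata values in the usage note are all hypothetical.

```python
from collections import defaultdict


def group_events(events, window=3600):
    """Group flagged events that share a time window and metadata value.

    events: iterable of (timestamp_seconds, word, metadata_value).
    window: width of each time bucket in seconds (assumed fixed).
    Returns {(window_index, metadata_value): sorted list of words},
    keeping only groups in which two or more distinct words changed.
    """
    buckets = defaultdict(set)
    for timestamp, word, metadata_value in events:
        buckets[(timestamp // window, metadata_value)].add(word)
    return {
        key: sorted(words)
        for key, words in buckets.items()
        if len(words) > 1  # a single word is not a coordinated event
    }
```

For instance, if "timeout" and "reset" are both flagged within the same hour for a device labeled "router-7", they end up in one group keyed by that hour and that device, which points toward the likely source. Events for other devices, or for the same device in a later hour, stay separate.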
Possible applications include mining log files for IP address anomalies that can signal malicious network activity (worms, viruses, botnet attacks). In emergencies, real-time field reports can be monitored to immediately learn what services are needed and where.
Results can be viewed graphically, and recent changes can be summarized into a file.