Darkstar is a comprehensive network data resource that warehouses, consolidates, and normalizes massive amounts of data, liberating it from the systems that created it and making it easily accessible from a single place to anyone who needs it. Network operators can use the data to reconstruct past topologies or paths for troubleshooting; data miners can search for patterns and trends; planners can estimate future capacity and QoS requirements; visualization experts can experiment with new visual representations; and researchers have data for testing algorithms and tools.
Data comes from across AT&T networks (IP/MPLS enterprise and consumer technologies, IPTV, and mobility data services) and is collected from many devices and network management systems; it includes SNMP metrics, log files, router configurations, syslogs, and more. More than 170 feeds deliver close to 340 million records a day, and more feeds are being added.
As data is fed in, loading tools normalize device names, convert timestamps to a common reference (GMT), and standardize differing measurement units (Kbytes per second vs. Mbytes per minute) and reporting intervals (hourly or daily reports vs. SNMP data reported every five minutes).
Darkstar is built on top of the Daytona database, which stores the data in tables organized for network management tasks. Building and populating the tables is overseen by DataDepot, which tracks where each piece of data comes from, which tables it must populate, and what operations are needed (when new information is derived from data from different sources), and which ensures that new data replaces old in the right order.
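The ordering problem DataDepot solves is essentially dependency resolution: a derived table can be rebuilt only after every source table it reads from has been refreshed. A toy sketch of that idea, with invented table names and using a standard topological sort, might look like this:

```python
from graphlib import TopologicalSorter

# Hypothetical table dependencies: each derived table maps to the set of
# source tables it is built from. Names are invented for illustration.
deps = {
    "link_utilization": {"snmp_raw"},
    "path_events": {"syslog_raw", "router_config"},
    "capacity_report": {"link_utilization", "path_events"},
}

# A valid refresh order: every table appears after all of its sources,
# so derived tables are never built from stale inputs.
order = list(TopologicalSorter(deps).static_order())
```

A real system also tracks per-feed arrival times and handles late or out-of-order data, but the dependency graph is the core of "new data replaces old in the right order."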
Polling data, normalizing it, verifying it, and populating table cells are all done in real time on streaming data, making the data almost instantly available. Darkstar is thus a warehouse with both real-time and historical information.
It's also a platform and architecture for building tools to access and analyze data.
With data already normalized and aligned in time, tools are easier to build because they don't carry the overhead of normalizing the data themselves. Two data-access tools built quickly on this foundation are RouterMiner, which retrieves information from all routers between two devices, and PathMiner, which displays all events (and information about devices) along a path. Both are web-based and return requested data within seconds; previously, operators needed weeks to collect the same data manually from individual devices.
Statistical and correlation tools include:
NICE (Network-Wide Information Correlation and Exploration), which characterizes the signature of certain events, especially those that are intermittent and hard to track manually; NICE aggregates all instances of a particular event (dropped calls, router reboots, and router flaps) to see which other events co-occur in a statistically significant manner within the same local topology.
Mercury, which looks for wholesale changes by comparing router syslogs before and after an event (router upgrades, for example).
G-RCA, which automatically diagnoses the most likely cause of a given symptom; G-RCA is essentially an expert system that encompasses operator knowledge.
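The co-occurrence test at the heart of a NICE-style analysis can be sketched simply: bucket two event streams into fixed time bins and score how strongly their binary indicator series correlate. The bin size and the use of Pearson correlation here are illustrative choices, not NICE's actual statistics.

```python
from math import sqrt

def indicator(timestamps, start, end, bin_sec=300):
    """Mark which fixed-width time bins contain at least one event."""
    series = [0] * ((end - start) // bin_sec)
    for t in timestamps:
        if start <= t < end:
            series[(t - start) // bin_sec] = 1
    return series

def pearson(x, y):
    """Pearson correlation of two equal-length numeric series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sqrt(sum((a - mx) ** 2 for a in x))
    vy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (vx * vy) if vx and vy else 0.0
```

Two event types that reliably land in the same bins (a router flap followed seconds later by a reboot, say) score near 1.0, flagging them as candidates for a shared signature.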
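A Mercury-style before/after comparison can likewise be sketched as counting syslog message types in windows on either side of a maintenance event and surfacing types whose frequency changed sharply. The smoothing and threshold here are arbitrary illustrative choices.

```python
from collections import Counter

def changed_signatures(before, after, min_ratio=5.0):
    """Flag message types whose frequency shifted sharply across an event.

    `before` and `after` are lists of syslog message types observed in
    windows preceding and following the event (e.g., a router upgrade).
    """
    b, a = Counter(before), Counter(after)
    flagged = {}
    for msg in set(b) | set(a):
        nb, na = b[msg] + 1, a[msg] + 1  # add-one smoothing avoids div-by-zero
        if max(nb / na, na / nb) >= min_ratio:
            flagged[msg] = (b[msg], a[msg])
    return flagged
```

A message type that never appeared before an upgrade but floods the log afterward is exactly the kind of wholesale change such a comparison surfaces.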
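An expert system of the G-RCA flavor can be caricatured as rules mapping a symptom to candidate causes, checked against observed events in priority order. The rules, event names, and symptom below are invented for illustration; G-RCA's actual rule base encodes far richer operator knowledge.

```python
# Hypothetical rule base: symptom -> ordered (evidence event, diagnosis) pairs.
RULES = {
    "packet_loss": [
        ("link_down",     "failed link on the path"),
        ("high_cpu",      "router CPU exhaustion"),
        ("config_change", "recent misconfiguration"),
    ],
}

def diagnose(symptom, observed_events):
    """Return the highest-priority cause whose evidence was observed."""
    for event, cause in RULES.get(symptom, []):
        if event in observed_events:
            return cause
    return "unknown (no matching rule)"
```

Rule order encodes operator judgment: when several causes are plausible, the one listed first wins, just as an experienced operator checks the most likely culprit before the exotic ones.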
Other tools are being built and planned.