Darkstar is a comprehensive network data resource that warehouses, consolidates, and normalizes massive amounts of data, liberating it from the systems that created it and making network data easily accessible from a single place for whoever needs it. Network operators can use the data to reconstruct past topologies or paths for troubleshooting; data miners can search for patterns and trends; planners can use it to estimate future capacity and QoS requirements; visualization experts can experiment with new visual representations; and researchers have data for testing algorithms and tools.

Data comes from across AT&T networks (IP/MPLS enterprise and consumer technologies, IPTV, and mobility data services) and is collected from many devices and network management systems; it includes SNMP metrics, log files, router configurations, syslogs, and more. More than 170 feeds deliver data, totaling close to 340 million records a day. More feeds are being added.

As data is fed in, loading tools normalize device names, convert times to a common format (GMT), and standardize differing measurements (Kbytes per second vs Mbytes per minute) and reporting intervals (hourly or daily reports vs SNMP data reported every five minutes).
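The normalization steps described above can be sketched in a few lines. This is a minimal illustration, not Darkstar's loading code: the function name `normalize_record`, the unit table, the decimal definitions of Kbyte and Mbyte, and the rule of lowercasing and trimming device names are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical unit table: multiplier that converts each unit to bytes per second.
# Decimal prefixes (1 Kbyte = 1,000 bytes) are an assumption of this sketch.
TO_BYTES_PER_SEC = {
    "KB/s": 1_000.0,
    "MB/min": 1_000_000.0 / 60.0,
}

def normalize_record(device, local_time, utc_offset_hours, value, unit):
    """Return a record with a canonical device name, a UTC timestamp, and bytes/s."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    # Attach the source's offset to the naive local time, then convert to UTC (GMT).
    utc_time = local_time.replace(tzinfo=tz).astimezone(timezone.utc)
    return {
        "device": device.strip().lower(),  # assumed naming rule
        "time_utc": utc_time,
        "bytes_per_sec": value * TO_BYTES_PER_SEC[unit],
    }

rec = normalize_record(" NYC-Router-01 ", datetime(2010, 3, 1, 19, 0), -5, 6.0, "MB/min")
```

Once every feed passes through a step like this, records from different systems can be compared directly on device, time, and rate.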

Darkstar is built on top of the Daytona database, which stores the data in tables organized for network management tasks. Building and populating the tables is overseen by DataDepot, which tracks where the data comes from, which tables it needs to populate, and what operations need to be done (if new information is being created from data from different sources). DataDepot also ensures that new data replaces old data in the right order.
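One way to picture dependency-ordered table updates is a topological sort over the tables and their sources. The sketch below is only an illustration of that idea, since how DataDepot actually schedules its work is not described here; the table names and the `refresh_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependencies: each table maps to the tables it is derived from.
dependencies = {
    "raw_snmp": set(),
    "raw_syslog": set(),
    "link_utilization": {"raw_snmp"},
    "event_summary": {"raw_syslog", "link_utilization"},
}

def refresh_order(deps):
    """Return an order in which every table is refreshed after all of its sources."""
    return list(TopologicalSorter(deps).static_order())

order = refresh_order(dependencies)
```

Refreshing tables in this order means a derived table is never rebuilt from stale inputs.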

Polling data, normalizing it, verifying it, and populating table cells are all done in real time on streaming data, making the data almost instantly available. Darkstar is thus a warehouse with both real-time and historical information.

It's also a platform and architecture for building tools to access and analyze data.

With data already normalized and aligned in time, tools are easier to build because they do not have to normalize the data themselves. Two data-access tools that were built quickly are RouterMiner, which retrieves information from all routers between two devices, and PathMiner, which displays all events (and information about devices) along a path. Both tools are web-based and return the requested data within seconds; previously, operators needed weeks to collect the same data manually from individual devices.
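Because events share normalized device names and a common time base, a path-oriented lookup reduces to a filter and a sort. The following is a rough sketch of that kind of query, not PathMiner's implementation; the event records, device names, and the `events_on_path` function are invented for illustration.

```python
from datetime import datetime

# Hypothetical normalized event records: (device, UTC time, description).
EVENTS = [
    ("router-a", datetime(2010, 3, 1, 12, 0), "link down"),
    ("router-b", datetime(2010, 3, 1, 12, 5), "cpu high"),
    ("router-c", datetime(2010, 3, 1, 12, 7), "link flap"),
    ("router-b", datetime(2010, 3, 2, 9, 0), "reboot"),
]

def events_on_path(events, path, start, end):
    """Return events on devices in `path` with start <= time < end, sorted by time."""
    on_path = set(path)
    hits = [e for e in events if e[0] in on_path and start <= e[1] < end]
    return sorted(hits, key=lambda e: e[1])

result = events_on_path(
    EVENTS,
    ["router-a", "router-b"],
    datetime(2010, 3, 1, 0, 0),
    datetime(2010, 3, 2, 0, 0),
)
```

Here `result` holds the two March 1 events on router-a and router-b, in time order; the router-c event is off the path and the reboot falls outside the window.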

Statistical and correlation tools include:

NICE (Network-Wide Information Correlation and Exploration), which helps operators understand the signature of certain events, especially those that are intermittent and hard to track manually; NICE aggregates all instances of a particular event (such as dropped calls, router reboots, or router flaps) to see what other events co-occur in a statistically significant manner within the same local topology.

Mercury, which looks for wholesale changes by comparing router syslogs before and after an event (router upgrades, for example).

G-RCA, which automatically diagnoses the most likely cause of a given symptom; G-RCA is essentially an expert system that encompasses operator knowledge.
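To make the before-and-after comparison concrete, here is a deliberately simple sketch that counts syslog message types on each side of an event and reports the types whose counts shifted. It is a stand-in for the general idea only: it uses raw count differences rather than any statistical test, and the function name `syslog_changes`, the `min_delta` threshold, and the message types are assumptions, not Mercury's method.

```python
from collections import Counter

def syslog_changes(before, after, min_delta=2):
    """Map each message type whose count changed by at least min_delta
    to a (count_before, count_after) pair."""
    b, a = Counter(before), Counter(after)
    return {
        msg: (b[msg], a[msg])
        for msg in sorted(set(b) | set(a))
        if abs(a[msg] - b[msg]) >= min_delta
    }

# Hypothetical message types seen before and after a router upgrade.
before = ["LINK_UP", "LINK_DOWN", "BGP_RESET"]
after = ["LINK_UP", "BGP_RESET", "BGP_RESET", "BGP_RESET", "CPU_HIGH", "CPU_HIGH"]
changes = syslog_changes(before, after)
```

In this example `changes` flags BGP_RESET (1 before, 3 after) and CPU_HIGH (0 before, 2 after), while LINK_UP and LINK_DOWN fall below the threshold.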

Other tools are being built and planned.


Project Members

Jennifer Yates

Joseph Seidel
