AT&T

 

Labs image
AT&T Labs - Research
Statistics
Research

Graph Mining

Home
People
Research
Awards
Employment
Contact

Graph mining refers to extracting knowledge from massive graphs. The data sets of telephone calls we see at AT&T can be viewed as a single graph, with several hundred million phone numbers as nodes, and calls between phone numbers as edges. It is a giant social network, like an internet connections graph or a rich citation network.

Specifically we are interested in "Dynamic graphs", where the edges are transactions with a time stamp, and the graph gains and loses edges through time. To deal with dynamic graphs we have designed methodology to define a "calling circle" for every node, which is automatically updated daily through an exponentially weighted moving average.

Our framework stores the massive graph (with hundreds of millions of nodes and edges) in an indexed database where the unit of analysis is a signature, which can be a calling circle, for every node. The indexed database can extract calling circles for any number on the network quickly and accurately. By iterating this step we can get very rich subgraphs of the entire network very efficiently, and study statistical properties of individual behavior and of dynamic graph changes. Our methodology is general and has been applied to email and web networks as well. When applied to telecommunications data, these signatures are used for:

  • subscription fraud -- new accounts with many fraudulent numbers in their calling circle are suspicious and generate alerts
  • targeted (viral) marketing -- allows us to find clusters of customers who have high probability of taking a given product offer.
  • repetitive debtors - delinquent customers who try and set up a new account are identified by their calling patterns and the new account can be shut down.
In addition, this work was the springboard for the development of a novel method of calculating proximities in a social network using a random walk analogy. We generate subgraphs called "proximity graphs" which extract the most interesting neighborhood containing a few specified nodes. This work was published at KDD 2006 and we have put up a demo page to demonstrate proximity graphs on the IMDB and DBLP databases.

To learn more, contact Chris Volinsky.