
180 Park Ave - Building 103
Florham Park, NJ
Subject matter expert in graph visualization, visualization systems, applied algorithms
Stephen is Executive Director of Information Visualization Research. He works on systems and algorithms for visualizing and interactively exploring large, complex structures, and on the general problem of applying computational geometry to the visualization of abstract information, with the goal of approaching the quality of hand-made graphics. Stephen is one of the authors of the Graphviz system. His group also created the core software for Vizgems, a software platform run by AT&T for its internal operations and enterprise customers. Vizgems collects, analyzes and displays near-real-time information at large scale for numerous managed services, and currently covers about 100,000 endpoint devices. Stephen received a Ph.D. in Computer Science from Princeton University in 1986.
Best Paper, 20th International Symposium on Graph Drawing, 2012.
For "Visualizing Streaming Text Data with Dynamic Maps"
AT&T Fellow, 2007.
Information visualization: Honored for leading research in visualization of information at large scale.
Visualizing Streaming Text Data with Dynamic Graphs and Maps
Emden Gansner, Yifan Hu, Stephen North
20th International Symposium on Graph Drawing,
2012.
Springer Copyright
The definitive version was published in 2012 (2012-09-01).
The many endless rivers of text now available present a serious challenge in the task of gleaning, analyzing and discovering useful information. In this paper, we describe a methodology for visualizing text streams in real time. The approach automatically groups similar messages into "countries," with keyword summaries, using semantic analysis, graph clustering and map generation techniques. It handles the need for visual stability across time by dynamic graph layout and Procrustes projection techniques, enhanced with a novel stable component packing algorithm. The result provides a continuous, succinct view of evolving topics of interest. It can be used in passive mode for overviews and situational awareness, or as an interactive data exploration tool. To make these ideas concrete, we describe their application to an online service called TwitterScope.
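As a rough illustration of the Procrustes step mentioned in the abstract above, the sketch below aligns a new 2D layout to the previous one using translation, rotation and uniform scaling. It is a minimal example under stated assumptions, not the paper's implementation: the function name procrustes_align_2d is hypothetical, both layouts are assumed to list the same nodes in the same order, and reflections are not considered.

```python
def procrustes_align_2d(prev, curr):
    """Align `curr` to `prev` by translation, rotation and uniform scaling.

    Both arguments are equal-length lists of (x, y) tuples for the same
    nodes in the same order. Points are treated as complex numbers, so a
    rotation plus scaling is a single complex multiplier w, and the
    least-squares optimum is w = sum(p * conj(c)) / sum(|c|^2) on the
    centered points.
    """
    n = len(prev)
    if n == 0 or n != len(curr):
        raise ValueError("layouts must be non-empty and of equal length")
    p = [complex(x, y) for x, y in prev]
    c = [complex(x, y) for x, y in curr]
    p_mean = sum(p) / n
    c_mean = sum(c) / n
    p0 = [z - p_mean for z in p]  # centered previous layout
    c0 = [z - c_mean for z in c]  # centered current layout
    denom = sum(abs(z) ** 2 for z in c0)
    if denom == 0:
        w = 1 + 0j  # degenerate layout: translate only
    else:
        w = sum(a * b.conjugate() for a, b in zip(p0, c0)) / denom
    aligned = [w * z + p_mean for z in c0]
    return [(z.real, z.imag) for z in aligned]
```

Applying such an alignment to each new frame keeps nodes that did not move in the underlying layout close to their earlier screen positions, which is the kind of visual stability the abstract describes.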

Maxent-Stress Model for Graph Layout
Yifan Hu, Emden Gansner, Stephen North
IEEE Transactions on Visualization and Computer Graphics,
2012.
IEEE Copyright
This version of the work is reprinted here with permission of IEEE for your personal use. Not for redistribution. The definitive version was published in 2012 (2012-09-30).
For embedding graphs or high-dimensional data, the stress model is widely used when edge lengths, or distances between items, are specified. However, the traditional full stress model is not scalable, due to the need for all-pairs shortest path calculation. A number of fast approximation algorithms have been proposed. While they work well for some graphs, on graphs of intrinsically high dimension, such as some non-rigid graphs, the results are less satisfactory. In this paper we propose a maxent-stress model. The method uses the principle of maximal entropy to deal with the extra degrees of freedom. We formulate a force-augmented stress majorization algorithm to solve the maxent-stress model. Numerical results show that the algorithm can scale to large graphs, yet does not degrade on non-rigid graphs. This also has potential applications to scalable algorithms for statistical multidimensional scaling (MDS) with variable distances.
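For readers unfamiliar with the full stress model that the abstract above takes as its starting point, the following sketch evaluates the classic stress of a layout: the weighted sum, over all node pairs, of the squared difference between layout distance and target distance. It does not implement the maxent-stress model or the paper's algorithm. The function name full_stress and the conventional weight 1/d^2 are assumptions made for illustration.

```python
import math
from itertools import combinations

def full_stress(positions, dist):
    """Classic full stress of a layout.

    positions: list of coordinate tuples, one per node.
    dist: square matrix (list of lists) of positive target distances,
          e.g. all-pairs shortest path lengths in the graph.
    Uses the conventional weight w_ij = 1 / d_ij^2 (an assumption here).
    """
    total = 0.0
    for i, j in combinations(range(len(positions)), 2):
        d = dist[i][j]
        w = 1.0 / (d * d)
        total += w * (math.dist(positions[i], positions[j]) - d) ** 2
    return total
```

Because the sum runs over every pair of nodes and needs a target distance for each, both the distance matrix and each evaluation grow quadratically with the number of nodes, which is the scalability problem the abstract refers to.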

Multilevel Agglomerative Edge Bundling for Visualizing Large Graphs
Emden Gansner, Yifan Hu, Stephen North, Carlos Scheidegger
Proceedings of the 4th IEEE Pacific Visualization Symposium,
2011.
Graphs are often used to encapsulate relationships between objects. Node-link diagrams, commonly used to visualize graphs, suffer from visual clutter on large graphs. Edge bundling is an effective technique for alleviating clutter and revealing high-level edge patterns. Previous methods for general graph layouts either require a control mesh to guide the bundling process, which can introduce high variation in curvature along the bundles, or rely on all-to-all force and compatibility calculations, which are not scalable. We propose a multilevel agglomerative edge bundling method based on a principled approach of minimizing the ink needed to represent edges, with additional constraints on the curvature of the resulting splines. The proposed method is much faster than previous ones, able to bundle hundreds of thousands of edges in seconds, and one million edges in a few minutes.

LiveRAC: interactive visual exploration of system management time-series data
Stephen North, Eleftherios Koutsofios, Peter McLachlan, Tamara Munzner
CHI '08: Proceeding of the twenty-sixth annual SIGCHI conference on Human factors in computing systems,
ACM,
pp 1483--1492,
2008.
We present LiveRAC, a visualization system that supports the analysis of large collections of system management time-series data consisting of hundreds of parameters across thousands of network devices. LiveRAC provides high information density using a reorderable matrix of charts, with semantic zooming adapting each chart's visual representation to the available space. LiveRAC allows side-by-side visual comparison of arbitrary groupings of devices and parameters at multiple levels of detail. A staged design and development process culminated in the deployment of LiveRAC in a production environment. We conducted an informal longitudinal evaluation of LiveRAC to better understand which proposed visualization techniques were most useful in the target environment.

Visual Analysis of Network Traffic for Resource Planning, Interactive Monitoring, and Interpretation of Security Threats
Florian Mansmann, Daniel A. Keim, Stephen C. North, Brian Rexroad, Daniel Sheleheda
IEEE Transactions on Visualization and Computer Graphics,
IEEE Computer Society,
v13,
#6,
pp 1105-1112,
2007.
The Internet has become a wild place: malicious code is spread on personal computers across the world, deploying botnets ready to attack the network infrastructure. The vast number of security incidents and other anomalies overwhelms attempts at manual analysis, especially when monitoring service provider backbone links. We present an approach to interactive visualization with a case study indicating that interactive visualization can be applied to gain more insight into these large data sets. We superimpose a hierarchy on IP address space, and study the suitability of Treemap variants for each hierarchy level. Because viewing the whole IP hierarchy at once is not practical for most tasks, we evaluate layout stability when eliding large parts of the hierarchy, while maintaining the visibility and ordering of the data of interest.

Medial-Axis-Based Cartograms
Daniel Keim, Stephen North, Christian Panse
IEEE Comput. Graph. Appl.,
IEEE Computer Society Press,
v25,
#3,
pp 60--68,
2005.
Cartograms are a well-known technique for showing geography-related statistical information, such as demographic and epidemiological data. The idea is to distort a map by resizing its regions according to a statistical parameter, but in a way that keeps the map recognizable. This article describes a method of continuous cartogram generation, which strictly retains the input map's topology. It presents an algorithm that makes cartograms by iterative relocation of the map's vertices, guided by a modified medial axes transformation. Application experiments show that the proposed algorithm can make high-quality cartograms in interactive time, even for large maps. Additional examples help to demonstrate its potential.

Pixel Based Visual Mining of Geo-Spatial Data
Daniel A. Keim, Christian Panse, Mike Sips, Stephen C. North
Computers & Graphics,
v28,
#3,
pp 327 - 344,
2004.
In many application domains, data is collected and referenced by geo-spatial location. Spatial data mining, or the discovery of interesting patterns in such databases, is an important capability in the development of database systems. A noteworthy trend is the increasing size of data sets in common use, such as records of business transactions, environmental data and census demographics. These data sets often contain millions of records, or even far more. This situation creates new challenges in coping with scale. For data mining of large data sets to be effective, it is also important to include humans in the data exploration process and combine their flexibility, creativity, and general knowledge with the enormous storage capacity and computational power of today's computers. Visual data mining applies human visual perception to the exploration of large data sets. Presenting data in an interactive, graphical form often fosters new insights, encouraging the formation and validation of new hypotheses to the end of better problem-solving and gaining deeper domain knowledge. In this paper we give a short overview of visual data mining techniques, especially for analyzing geo-spatial data. We provide examples for effective visualizations of geo-spatial data in important application areas such as consumer analysis and census demographics.

CartoDraw: A fast algorithm for generating contiguous cartograms
Daniel Keim, Stephen North, Christian Panse
IEEE Transactions on Visualization and Computer Graphics,
IEEE Educational Activities Department,
v10,
#1,
pp 95--110,
2004.
Cartograms are a well-known technique for showing geography-related statistical information, such as population demographics and epidemiological data. The basic idea is to distort a map by resizing its regions according to a statistical parameter, but in a way that keeps the map recognizable. In this study, we formally define a family of cartogram drawing problems. We show that even simple variants are unsolvable in the general case. Because the feasible variants are NP-complete, heuristics are needed to solve the problem. Previously proposed solutions suffer from problems with the quality of the generated drawings. For a cartogram to be recognizable, it is important to preserve the global shape or outline of the input map, a requirement that has been overlooked in the past. To address this, our objective function for cartogram drawing includes both global and local shape preservation. To measure the degree of shape preservation, we propose a shape similarity function, which is based on a Fourier transformation of the polygons’ curvatures. Also, our application is visualization of dynamic data, for which we need an algorithm that recalculates a cartogram in a few seconds. None of the previous algorithms provides adequate performance with an acceptable level of quality for this application. In this paper, we therefore propose an efficient iterative scanline algorithm to reposition edges while preserving local and global shapes. Scanlines may be generated automatically or entered interactively to guide the optimization process more closely. We apply our algorithm to several example data sets and provide a detailed comparison of the two variants of our algorithm and previous approaches.

Visualizing software for telecommunication services
Emden Gansner, John Mocenigo, Stephen North
SoftVis '03: Proceedings of the 2003 ACM symposium on Software visualization,
ACM,
pp 151--ff,
2003.
An active research area in telecommunications concerns how to specify and control the addition of new services, such as call waiting or instant messaging, into existing software. One approach is to rely on a component-based architecture such as Distributed Feature Composition (DFC), by which a new service can be specified as a composition of primitive features over time. Formally, a communication episode is represented by a dynamic graph of software feature boxes, called a usage. This serves as the fundamental model for how services are invoked and how they interact with other services. This paper, after providing some background on DFC, discusses a technique for visualizing the usages which arise through DFC specifications. With the visualization, users can monitor and validate service protocols and feature interactions in real time or through playback logs. The principal display component uses a novel variation of force-directed layouts for undirected graphs. The resulting graphical interface has become a principal tool for developers building services using DFC.

Visualization research with large displays
Bin Wei, Claudio Silva, Eleftherios Koutsofios, Shankar Krishnan, Stephen North
IEEE Comput. Graph. Appl.,
IEEE Computer Society Press,
v20,
#4,
pp 50--54,
2000.
We describe our research at the AT&T Infolab on using large displays to interactively analyze and visualize AT&T's communication networks and services.

AT&T AST OpenSource software collection
Glenn Fowler, David Korn, Stephen North, Kiem Vo
ATEC '00: Proceedings of the annual conference on USENIX Annual Technical Conference,
USENIX Association,
pp 45--45,
2000.
This paper introduces a large collection of reusable software components that AT&T is making available in an OpenSource form. This software has been widely used around the world and includes well-known components such as KornShell, Nmake, Graphviz, Sfio, Vmalloc and Cdt.

Visualizing Large-Scale Telecommunication Networks and Services
Eleftherios Koutsofios, Stephen North, Russell Truscott, Daniel Keim
VIS '99: Proceedings of the conference on Visualization '99,
IEEE Computer Society Press,
pp 457--461,
1999.
Visual exploration of massive data sets arising from telecommunication networks and services is a challenge. This paper describes SWIFT-3D, an integrated data visualization and exploration system created at AT&T Labs for large scale network analysis. SWIFT-3D integrates a collection of interactive tools that includes pixel-oriented 2D maps, interactive 3D maps, statistical displays, network topology diagrams and an interactive drill-down query interface. Example applications are described, demonstrating a successful application to analyze unexpected network events (high volumes of unanswered calls), and comparison of usage of an Internet service with voice network traffic and local access coverage.