
180 Park Ave - Building 103
Florham Park, NJ
http://www.research.att.com/~urbanek
Subject matter expert in Visualization, Interactive Graphics, R, Statistical Computing
Exploring the Use of Urban Greenspace through Cellular Network Activity
Ramon Caceres, James Rowland, Christopher Small, Simon Urbanek
2nd Workshop on Pervasive Urban Applications (PURBA),
2012.
Springer Copyright
The definitive version was published in 2012, 2012-06-19
{Knowing when and where people use greenspace is key to our understanding of urban ecology. The number of cellular phones active in a geographic area can serve as a proxy for human density in that area. We are using anonymous records of cellular network activity to study the spatiotemporal patterns of human density in an urban area. This paper presents the vision and some early results of this effort. First, we describe our dataset of six months of activity in the New York metropolitan area. Second, we present a novel technique for estimating network coverage areas. Third, we describe our approach to analyzing changes in activity volumes within those areas. Finally, we present preliminary results regarding changes in human density around Central Park. From winter to summer, we find that density increases in greenspace areas and decreases in residential areas.}

Computational Television Advertising
Suhrid Balakrishnan, Sumit Chopra, David Applegate, Simon Urbanek
IEEE International Conference on Data Mining,
2012.
IEEE Copyright
This version of the work is reprinted here with permission of IEEE for your personal use. Not for redistribution. The definitive version was published in 2012, 2012-12-12
{Ever wonder why that Kia ad ran during Iron Chef?
While advertising on television is still a robust business, providing a fascinating mix of marketing, branding, predictive modeling and measurements, it is at risk with the recent emergence of online television. Traditional methods used to generate advertising campaigns on television do not come close, in terms of efficiency, to the highly sophisticated computational techniques being used in the online world. This paper is an attempt to recast the process of television advertising media campaign generation in a computational framework. We describe efficient mathematical approaches to the task of finding optimal campaigns for specific target audiences. We highlight the efficacy of our proposed methods and compare them, using two case studies, against campaigns generated by traditional methods.}

iPlots eXtreme - Next-generation Interactive Graphics
Simon Urbanek
DSC 2009 proceedings (special issue of Computational Statistics),
2011.
Springer Copyright
The definitive version was published in Computational Statistics, 2011-04-01
{Interactive graphics provide a very important tool that facilitates the process of exploratory data and model analysis, which is a crucial step in real-world applied statistics. Only a very limited set of software exists that provides truly interactive graphics for data analysis, partially because it is not easy to implement. Very often specialized software is created to offer graphics for a particular problem, but many fundamental plots are omitted since they are not considered new research. In this paper we discuss a general framework for creating interactive graphics software on a sound foundation that offers a consistent user interface, fast prototyping of new plots and extensibility to support interactive models.
In addition, we also discuss one implementation of the general framework: iPlots eXtreme - next-generation interactive graphics for analysis of large data in R. It provides most fundamental plot types and allows new interactive plots to be created. The implementation raises interactive graphics performance to an entirely new level. We will discuss briefly several methods that allowed us to achieve this goal and illustrate the use of advanced programmability features in conjunction with R.}

Unsupervised Clustering of Multidimensional Distributions using Earth Mover Distance
Simon Urbanek, Tamraparni Dasu, Shankar Krishnan, David Applegate
ACM KDD,
2011.
ACM Copyright
(c) ACM, 20XX. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM KDD, 2011-08-01.
{Multidimensional distributions are often used in data mining to describe and summarize different features of large datasets. It is natural to look for distinct classes in such datasets by clustering the data. A common approach entails the use of methods like k-means clustering. However, the k-means method inherently relies on the Euclidean metric in the embedded space and does not account for additional topology underlying the distribution.
In this paper, we propose using Earth Mover Distance (EMD) to compare multidimensional distributions. For an n-bin histogram, the EMD is based on a solution to the transportation problem with time complexity O(n^3 log n). To mitigate the high computational cost of EMD, we propose an approximation that reduces the cost to linear time.
Other notions of distance, such as the information-theoretic Kullback-Leibler divergence and the statistical χ² distance, account only for the correspondence between bins with the same index, do not use information across bins, and are sensitive to bin size. A cross-bin distance measure like EMD is not affected by binning differences and meaningfully matches the perceptual notion of “nearness”.
Our technique is simple, efficient and practical for clustering distributions. We demonstrate the use of EMD on a practical application of clustering over 400,000 anonymous mobility usage patterns which are defined as distributions over a manifold. EMD allows us to represent inherent relationships in this space. We show that EMD allows us to successfully cluster even sparse signatures and we compare the results with other clustering methods. Given the large size of our dataset, a fast approximation is crucial for this application.}
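
The contrast the abstract draws between bin-to-bin and cross-bin distances can be illustrated with a minimal sketch. It assumes one-dimensional histograms with equal total mass and unit spacing between bins, where EMD reduces to the sum of absolute differences of the cumulative sums. This is a standard special case, not the paper's general method or its linear-time approximation, and the names `emd_1d` and `chi2` are hypothetical helpers introduced here for illustration only.

```python
from itertools import accumulate


def emd_1d(p, q):
    """EMD between two 1-D histograms (assumes equal mass, unit bin spacing)."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have the same total mass")
    # Running surplus of mass that still has to be moved past each bin edge.
    surplus = accumulate(a - b for a, b in zip(p, q))
    return sum(abs(s) for s in surplus)


def chi2(p, q):
    """A common chi-squared histogram distance; compares same-index bins only."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


# All mass in one bin; 'near' shifts it by one bin, 'far' by four bins.
base = [1.0, 0.0, 0.0, 0.0, 0.0]
near = [0.0, 1.0, 0.0, 0.0, 0.0]
far = [0.0, 0.0, 0.0, 0.0, 1.0]

print("EMD :", emd_1d(base, near), emd_1d(base, far))
print("chi2:", chi2(base, near), chi2(base, far))
```

In this toy case the chi-squared distance rates `near` and `far` as equally distant from `base`, because it only sees that the occupied bins differ, while the EMD grows with how far the mass has to move.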

Route Classification using Cellular Handoff Patterns
Christopher Volinsky, Alexander Varshavsky, Richard Becker, Ji Loh, Simon Urbanek, Ramon Caceres, Karrie Hanson
13th ACM International Conference on Ubiquitous Computing,
2011.
ACM Copyright
(c) ACM, 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in 13th ACM International Conference on Ubiquitous Computing, 2011-09-01.
{Understanding utilization of city roads is important for urban planners. In this paper, we show how to use cellular handoff patterns from cellular phone networks to identify which routes people take through a city. Specifically, this paper makes the following three contributions. First, we show that cellular handoff patterns on a given route are stable across a range of conditions and propose a way to measure stability within and between routes using a variant of Earth Mover's Distance. Second, we present two accurate classification algorithms for matching cellular handoff patterns to routes: one requires test drives on the routes while the other uses signal strength data collected by high-resolution scanners. Finally, we present an application of our algorithms for measuring relative volumes of traffic on routes leading into and out of a specific city, and validate our methods using statistics published by a state transportation authority.}

Clustering Anonymized Mobile Call Detail Records to Find Usage Groups
Christopher Volinsky, Richard Becker, Ramon Caceres, Karrie Hanson, Ji Loh, Simon Urbanek, Alexander Varshavsky
1st Workshop on Pervasive Urban Applications (PURBA),
2011.
Springer Copyright
The definitive version was published in PURBA-2011, 2011-06-12
{Understanding the mix of different types of people in a city is an important input into urban planning. In this paper we identify distinct sectors of a population by their cellular phone usage. In a study of a small suburban city in New Jersey, we use unsupervised clustering to identify the usage patterns of heavy users. We uncover 7 unique usage patterns. We interpret two of the patterns as belonging to commuters and students, and verify these interpretations with deeper analysis of temporal and spatial patterns.}

A Tale of One City: Using Cellular Network Data for Urban Planning
Richard Becker, Ramon Caceres, Karrie Hanson, Ji Loh, Simon Urbanek, Alexander Varshavsky, Christopher Volinsky
IEEE Pervasive Computing,
2010.
IEEE Copyright
The definitive version was published in IEEE Pervasive Computing, 2010-04-01
{The rapid growth of modern cities leaves urban planners faced with numerous challenges, such as high congestion and pollution levels. Effectively solving these challenges requires a deep understanding of existing city dynamics. In this paper, we describe a methodology to study and monitor these dynamics by using Call Detail Records (CDRs), routinely collected by wireless service providers as part of running their networks. Our methodology scales to an entire population, has little additional cost, and can be continually updated. This provides an unprecedented opportunity to study and monitor cities in a way that current practices are not able to do.}