
180 Park Ave - Building 103
Florham Park, NJ
Why OSPF paths aren�t always shortest
David Applegate, Carsten Lund, Aman Shaikh
NANOG 54,
2012.
[LINK]
[BIB]
NANOG Copyright
(c) ACM, 2012. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in NANOG 54 , 2012-02-06.
{Knowing what paths packets are taking within a network is crucial for several network management tasks. When the network is running OSPF, determining paths is supposedly straight-forward, since OSPF is basically a link-state protocol, and with link-state protocols, a packet follows the shortest path in terms of link weights from its source to the destination. However, OSPF allows a network to be divided into areas for scalability, and this makes it more than a link-state protocol. As a result, though packets still follow shortest paths within an area, this is not guaranteed to be the case across areas. In fact, paths followed by packets with areas can often be unexpected and non-intuitive. Some recent extensions to OSPF, namely multi-area adjacencies (RFC 5185) and alternative implementation of border routers (RFC 3509), have further exacerbated the situation. This presentation illustrates such cases with examples, while providing insights into why OSPF paths can look so strange with areas. }

Computational Television Advertising
Suhrid Balakrishnan, Sumit Chopra, David Applegate, Simon Urbanek
IEEE International Conference on Data Mining,
2012.
[PDF]
[BIB]
IEEE Copyright
This version of the work is reprinted here with permission of IEEE for your personal use. Not for redistribution. The definitive version was published in 2012. , 2012-12-12
{Ever wonder why that Kia Ad ran during Iron Chef?
While advertising on television is still a robust business, providing a fascinating mix of marketing, branding, predictive modeling and measurements, it is at risk with the recent emergence of online television. Traditional methods used to generate advertising
campaigns on television do not come close to the highly sophisticated computational techniques being used in the online world, in terms of efficiency. This paper is an attempt to recast the process of television advertising media campaign generation in a computational framework. We describe efficient mathematical approaches to solve for the task of finding optimal campaigns for specific target audiences. We highlight the efficacy of our proposed methods and compare them using two case studies against
campaigns generated by traditional methods. }

Unsupervised Clustering of Multidimensional Distributions using Earth Mover Distance
Simon Urbanek, Tamraparni Dasu, Shankar Krishnan, David Applegate
ACM KDD,
2011.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 20XX. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM KDD , 2011-08-01.
{Multidimensional distributions are often used in data min- ing to describe and summarize different features of large datasets. It is natural to look for distinct classes in such datasets by clustering the data. A common approach entails the use of methods like k-means clustering. However, the k-means method inherently relies on the Euclidean metric in the embedded space and does not account for additional topology underlying the distribution.
In this paper, we propose using Earth Mover Distance (EMD) to compare multidimensional distributions. For a n-bin histogram, the EMD is based on a solution to the transportation problem with time complexity O(n3 log n). To mitigate the high computational cost of EMD, we pro- pose an approximation that reduces the cost to linear time.
Other notions of distances such as the information theo- retic Kullback-Leibler divergence and statistical χ2 distance, account only for the correspondence between bins with the same index, and do not use information across bins, and are sensitive to bin size. A cross-bin distance measure like EMD is not affected by binning differences and meaningfully matches the perceptual notion of “nearness”.
Our technique is simple, efficient and practical for clus- tering distributions. We demonstrate the use of EMD on a practical application of clustering over 400,000 anonymous mobility usage patterns which are defined as distributions over a manifold. EMD allows us to represent inherent re- lationships in this space. We show that EMD allows us to successfully cluster even sparse signatures and we compare the results with other clustering methods. Given the large size of our dataset a fast approximation is crucial for this application.}

Combining Predictors for Recommending Music: the False Positives' approach to KDD Cup track 2
Suhrid Balakrishnan, Rensheng Wang, Carlos Scheidegger, Angus Maclellan, Yifan Hu, Aaron Archer, David Applegate, Shankar Krishnan, Guang Ma, Siu Au
KDD CUP 2011 workshop.,
2011.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in KDD CUP 2011 workshop , 2011-08-21.
{We describe our solution for the KDD Cup 2011 track 2 challenge. Our solution relies heavily on ensembling together diverse individual models for the prediction task, and achieved a final leaderboard misclassification rate of 3.8863\%. This paper provides details on both the modeling and ensemble creation steps.}
Voting power and target-based site prioritization
Steven Phillips, Aaron Archer, David Applegate, David Johnson, Robert Pressey, Desmond Torkornoo, Matthew Watts
Biological Conservation,
2010.
[PDF]
[BIB]
Elsevier Copyright
The definitive version was published in Biological Conservation (Elsevier). , Volume 143, 2010-07-01
{Indices for site prioritization are widely 2 used to address the
question: which sites are most important for conservation of
biodiversity? We investigate the theoretical underpinnings of
target-based prioritization, which measures sites' contribution to
achieving predetermined conservation targets. We show a strong
connection between site prioritization and the mathematical theory of
voting power. Current site prioritization indices are afflicted by
well-known paradoxes of voting power: a site can have zero priority
despite having non-zero habitat (the paradox of dummies) and discovery
of habitat in a new site can raise the priority of existing sites (the
paradox of new members). These paradoxes arise because of the razor's
edge nature of voting, and therefore we seek a new index that is not
strictly based on voting. By negating such paradoxes, we develop a set
of intuitive axioms that an index should obey. We introduce a simple
new index, "fraction-of-spare," that satisfies all the axioms. For
ingle-species site prioritization, the fraction-of-spare(s) of a site
s equals zero if s has no habitat for the species and one if s is
essential for meeting the target area for the species. In-between
those limits it is linearly interpolated, and equals area(s) / (total
area - target). In an evaluation involving multi-year scheduling of
site acquisitions for conservation of forest types in New South Wales
under specified clearing rates, fraction-of-spare outperforms 58
existing prioritization indices. We also compute the optimal schedule
of acquisitions for each of three evaluation measures (under the
assumed clearing rates) using integer programming, which indicates
that there is still potential for improvement in site prioritization
for conservation scheduling.
}

Optimal Content Placement for a Large-Scale VoD System
David Applegate, Aaron Archer, Vijay Gopalakrishnan, Seungjoon Lee, Kadangode Ramakrishnan
Proceedings of ACM CoNext 2010,
Submitted to ACM CoNext 2010,
2010.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2010. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM CoNext 2010 , 2010-12-01
IPTV service providers offering Video-on-Demand (VoD) typically have many servers at each metropolitan office to store all the videos in the library. With the rapid increase in the VoD library size, it will soon become infeasible to replicate the entire library at each office. We present an approach for intelligent content placement that scales to large VoD library sizes (e.g., 100Ks of videos). We formulate the problem as a mixed integer program (MIP) that takes into account constraints
such as disk space, link bandwidth, and the skew in content popularity. To overcome the challenges of scale, we employ a Lagrangian relaxation-based decomposition technique that can find a near-optimal solution (e.g., within 1-2%) with orders of magnitude speedup, relative to solving even the LP relaxation via standard software. We also present simple strategies to address practical issues such as popularity estimation, content updates, short-term popularity fluctuation, and frequency
of placement updates. Using traces from an operational system, we show that our approach significantly outperforms simpler placement strategies. For instance, our MIP-based solution can serve all requests using only half the link bandwidth used by LRU cache replacement policy. We also investigate the trade-off between disk space and network bandwidth.
Method For Web-Based Exploration Of Network Infrastructure,
Tue Nov 24 16:08:00 EST 2009
In accordance with the teachings of the present invention, a method of data drilling is presented. A network database is deployed in a network, such as the Internet, for broad-based user access. Network information is stored in the network database and is organized in layers. A graphical user interface with data objects is presented to an end user. Selecting a data object generates a query performed by a server. Each query produces more details on an initially selected data object.
Method and apparatus for efficient routing of variable traffic,
Tue Jun 17 18:12:54 EDT 2008
A method and apparatus for provide highly efficient traffic routing for a wide range of possible traffic matrices (TM) in an intra-domain network. That routing optimally balances the traffic loads over a range of traffic matrices so as to minimize the deviation for any particular traffic matrix from the optimal routing. Such a routing provides a guaranteed performance ratio against the best possible network routing. The invention utilizes a method of optimally configuring a traffic network based on solving a linear program to obtain the optimal routing, and then configuring the routing on the network accordingly.
AT&T Fellow, 2012.
For outstanding innovation and invention in the design and implementation of tools for visualizing, understanding, and optimizing AT&T’s core, access, and mobility networks.
Frederick W. Lanchester Prize, 2007.
IEEE Communications Society William R. Bennett Prize, 2007.
Beale-Orchard-Hays Prize, 2001.