
180 Park Ave - Building 103
Florham Park, NJ
Improved Approximation Algorithms for Prize-Collecting Steiner Tree and TSP
Aaron Archer, MohammadHossein Bateni, Mohammad Hajiaghayi, Howard Karloff
SIAM Journal on Computing,
SIAM Journal on Computing,
v40,
#2,
pp 309-332,
2011.
[PDF]
[BIB]
We study the prize-collecting Steiner tree (PCST), prize-collecting traveling salesman (PCTSP), and prize-collecting path (PC-Path) problems. Given a graph (V,E) with a cost on each edge and a penalty (a.k.a. prize) on each node, the goal is to find a tree (for PCST), cycle (for PCTSP), or path (for PC-Path) that minimizes the sum of the edge costs in the tree/cycle/path and the penalties of the nodes not spanned by it. In addition to being a useful theoretical tool for helping to solve other optimization problems, PCST has been applied fruitfully by AT&T to the optimization of real-world telecommunications networks. The most recent improvements for the first two problems, a 2-approximation algorithm for each, appeared first in 1992; a 2-approximation for PC-Path appeared in 2003. The natural linear programming (LP) relaxation of PCST has an integrality gap of 2, which has been a barrier to further improvements for this problem.
We present (2-eps)-approximation algorithms for all three problems, connected by a unified technique for improving prize-collecting algorithms that allows us to circumvent the integrality gap barrier. Specifically, our approximation ratio for prize-collecting Steiner tree is below 1.9672.

Combining Predictors for Recommending Music: the False Positives' approach to KDD Cup track 2
Suhrid Balakrishnan, Rensheng Wang, Carlos Scheidegger, Angus Maclellan, Yifan Hu, Aaron Archer, David Applegate, Shankar Krishnan, Guang Ma, Siu Au
KDD CUP 2011 workshop.,
2011.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in KDD CUP 2011 workshop , 2011-08-21.
{We describe our solution for the KDD Cup 2011 track 2 challenge. Our solution relies heavily on ensembling together diverse individual models for the prediction task, and achieved a final leaderboard misclassification rate of 3.8863\%. This paper provides details on both the modeling and ensemble creation steps.}
Voting power and target-based site prioritization
Steven Phillips, Aaron Archer, David Applegate, David Johnson, Robert Pressey, Desmond Torkornoo, Matthew Watts
Biological Conservation,
2010.
[PDF]
[BIB]
Elsevier Copyright
The definitive version was published in Biological Conservation (Elsevier). , Volume 143, 2010-07-01
{Indices for site prioritization are widely 2 used to address the
question: which sites are most important for conservation of
biodiversity? We investigate the theoretical underpinnings of
target-based prioritization, which measures sites' contribution to
achieving predetermined conservation targets. We show a strong
connection between site prioritization and the mathematical theory of
voting power. Current site prioritization indices are afflicted by
well-known paradoxes of voting power: a site can have zero priority
despite having non-zero habitat (the paradox of dummies) and discovery
of habitat in a new site can raise the priority of existing sites (the
paradox of new members). These paradoxes arise because of the razor's
edge nature of voting, and therefore we seek a new index that is not
strictly based on voting. By negating such paradoxes, we develop a set
of intuitive axioms that an index should obey. We introduce a simple
new index, "fraction-of-spare," that satisfies all the axioms. For
ingle-species site prioritization, the fraction-of-spare(s) of a site
s equals zero if s has no habitat for the species and one if s is
essential for meeting the target area for the species. In-between
those limits it is linearly interpolated, and equals area(s) / (total
area - target). In an evaluation involving multi-year scheduling of
site acquisitions for conservation of forest types in New South Wales
under specified clearing rates, fraction-of-spare outperforms 58
existing prioritization indices. We also compute the optimal schedule
of acquisitions for each of three evaluation measures (under the
assumed clearing rates) using integer programming, which indicates
that there is still potential for improvement in site prioritization
for conservation scheduling.
}

Optimal Content Placement for a Large-Scale VoD System
David Applegate, Aaron Archer, Vijay Gopalakrishnan, Seungjoon Lee, Kadangode Ramakrishnan
Proceedings of ACM CoNext 2010,
Submitted to ACM CoNext 2010,
2010.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2010. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM CoNext 2010 , 2010-12-01
IPTV service providers offering Video-on-Demand (VoD) typically have many servers at each metropolitan office to store all the videos in the library. With the rapid increase in the VoD library size, it will soon become infeasible to replicate the entire library at each office. We present an approach for intelligent content placement that scales to large VoD library sizes (e.g., 100Ks of videos). We formulate the problem as a mixed integer program (MIP) that takes into account constraints
such as disk space, link bandwidth, and the skew in content popularity. To overcome the challenges of scale, we employ a Lagrangian relaxation-based decomposition technique that can find a near-optimal solution (e.g., within 1-2%) with orders of magnitude speedup, relative to solving even the LP relaxation via standard software. We also present simple strategies to address practical issues such as popularity estimation, content updates, short-term popularity fluctuation, and frequency
of placement updates. Using traces from an operational system, we show that our approach significantly outperforms simpler placement strategies. For instance, our MIP-based solution can serve all requests using only half the link bandwidth used by LRU cache replacement policy. We also investigate the trade-off between disk space and network bandwidth.