
180 Park Ave - Building 103
Florham Park, NJ
To Cache or not to Cache: The 3G case
Jeffrey Erman, Alexandre Gerber, Mohammad Hajiaghayi, Dan Pei, Subhabrata Sen, Oliver Spatscheck
IEEE Internet Computing,
2011.
[PDF]
[BIB]
IEEE Copyright
This version of the work is reprinted here with permission of IEEE for your personal use. Not for redistribution. The definitive version was published in IEEE Internet Computing, 2011. , 2011-01-01
{Cellular networks have witnessed tremendous traffic growth
recently, fueled by the rapid proliferation of smartphones,
laptops with mobile data cards, and new technologies improving
the performance of these networks. However, unlike
the wired world, there exists a rather limited understanding
of the application mixes and the characteristics of this traffic.
Recent studies have shown that in the wired broadband
world, HTTP traffic accounts for the vast majority of the application
traffic and that forward caching of HTTP objects
results in substantial savings in network resources. What
about cellular networks? The answer is a function of the
traffic characteristics, network architecture, as well as the
various cost points associated with delivering traffic in these
networks. In this paper, we examine the characteristics of
HTTP traffic generated by millions of users across one of
the world�s largest 3G cellular networks, and explore the potential
of forward caching. We provide a simple cost model
that third parties can easily use to determine the cost-benefit
tradeoffs for their own cellular network settings. This is the
first large scale caching analysis in cellular networks.}

Scheduling to Minimize Staleness and Stretch in Real-Time Data Warehouses
Lukasz Golab, MohammadHossein Bateni, Mohammad Hajiaghayi, Howard Karloff
Theory of Computing Systems Journal,
2011.
[PDF]
[BIB]
Springer Science+Business Media Copyright
The definitive version was published in Theory of Computing Systems Journal. , 2011-07-01
{We study scheduling algorithms for loading data feeds into real time data
warehouses, which are used in applications such as IP network monitoring, online financial
trading, and credit card fraud detection. In these applications, the warehouse
collects a large number of streaming data feeds that are generated by external
sources and arrive asynchronously. Data for each table in the warehouse are generated at a
constant rate, different tables possibly at different rates.
For each data feed, the arrival of new data triggers an emph{update} that seeks to append
the new data to the corresponding table; if multiple updates are pending for the
same table, they are batched together before being loaded.
At time $ au$, if a table has been updated with information up to time $r le au$,
its emph{staleness} is defined as $ au - r$.
Our first objective is to schedule the updates on one or more processors in a way that
minimizes the total staleness.
In order to ensure fairness, our second objective is to limit the maximum ``stretch'',
which we define (roughly) as the
ratio between the duration of time an update waits till it is
finished being processed,
and the length of the update.
In contrast to earlier work proving the nonexistence of constant-competitive algorithms
for related scheduling problems, we prove that emph{any} online nonpreemptive algorithm,
no processor of which is ever voluntarily idle, incurs a staleness at most a constant
factor larger than an obvious lower bound on total staleness (provided that the processors
are sufficiently fast).
We give a constant-stretch algorithm,
provided that the processors are sufficiently fast,
for the emph{quasiperiodic model},
in which tables can be clustered into a few groups such that the update frequencies
within each group vary by at most a constant factor.
Finally, we show that our constant-stretch algorithm is also
constant-competitive (subject to the same proviso on processor speed)
in the quasiperiodic model with respect to total weighted staleness, where tables
are assigned weights that reflect their priorities.}

Improved Approximation Algorithms for Prize-Collecting Steiner Tree and TSP
Aaron Archer, MohammadHossein Bateni, Mohammad Hajiaghayi, Howard Karloff
SIAM Journal on Computing,
SIAM Journal on Computing,
v40,
#2,
pp 309-332,
2011.
[PDF]
[BIB]
We study the prize-collecting Steiner tree (PCST), prize-collecting traveling salesman (PCTSP), and prize-collecting path (PC-Path) problems. Given a graph (V,E) with a cost on each edge and a penalty (a.k.a. prize) on each node, the goal is to find a tree (for PCST), cycle (for PCTSP), or path (for PC-Path) that minimizes the sum of the edge costs in the tree/cycle/path and the penalties of the nodes not spanned by it. In addition to being a useful theoretical tool for helping to solve other optimization problems, PCST has been applied fruitfully by AT&T to the optimization of real-world telecommunications networks. The most recent improvements for the first two problems, a 2-approximation algorithm for each, appeared first in 1992; a 2-approximation for PC-Path appeared in 2003. The natural linear programming (LP) relaxation of PCST has an integrality gap of 2, which has been a barrier to further improvements for this problem.
We present (2-eps)-approximation algorithms for all three problems, connected by a unified technique for improving prize-collecting algorithms that allows us to circumvent the integrality gap barrier. Specifically, our approximation ratio for prize-collecting Steiner tree is below 1.9672.

Capacitated Metric Labeling
Howard Karloff, Mohammad Hajiaghayi, Matthew Andrews, Ankur Moitra
ACM-SIAM Symposium on Discrete Algorithms (SODA),
2011.
[PDF]
[BIB]
SIAM Copyright
The definitive version was published in ACM-SIAM Symposium on Discrete Algorithms (SODA) , 2011-01-23
{We introduce
Capacitated Metric Labeling. As in Metric Labeling, we
are given a weighted graph G = (V, E), a label set L, a
semimetric d_L on this label set, and an assignment cost
function phi: V x L -> R, and the goal, as
in Metric Labeling, is to find an assignment f: V->L
that minimizes a particular two-cost function,
but in Capacitated Metric Labeling with the additional restriction that each label t_i receive at most l_i nodes.
We give a O(log |V|)-approximation algorithm when the number of labels is
fixed. We prove that it is impossible to approximate the value
of an instance of Capacitated Metric Labeling to within
any finite factor, if P!=NP.
Furthermore, we prove that any finite-ratio polynomial-time
approximation algorithm must multiplicatively violate the label
capacities by Omega((log |L|)^(1/2-eps)).
}
The Submodular Secretary Problem
Mohammad Hajiaghayi, Mohammadhossein Bateni, Morteza Zadimoghaddam
2009.
[PDF]
[BIB]
On Euclidean Prize-collecting Steiner Forest Problems
Mohammad Hajiaghayi, Mohammadhossein Bateni
2009.
[PDF]
[BIB]
Network-Aware Forward caching
Jeffrey Erman, Alexandre Gerber, Mohammad Hajiaghayi, Dan Pei, Oliver Spatscheck
in Proc. World Wide Web Conference (WWW),
2009.
[PDF]
[BIB]
IW3C2 Copyright
"Copyright is held by the International World Wide Web Conference Committee (IW3C2)."
{This paper proposes and evaluates a Network Aware Forward
Caching approach for determining the optimal deployment strategy
of forward caches to a network. A key advantage of this approach
is that we can reduce the network costs associated with forward
caching to maximize the benefit obtained from their deployment.
We show in our simulation that a 37% increase to net benefits could
be achieved over the standard method of full cache deployment to
cache all POPs traffic. In addition, we show that this maximal point
occurs when only 68% of the total traffic is cached.
Another contribution of this paper is the analysis we use to motivate
and evaluate this problem. We characterize the Internet traffic
of 100K subscribers of a US residential broadband provider. We
use both layer 4 and layer 7 analysis to investigate the traffic volumes
of the flows as well as study the general characteristics of
the applications used. We show that HTTP is a dominant protocol
and account for 68% of the total downstream traffic and that 34%
of that traffic is multimedia. In addition, we show that multimedia
content using HTTP exhibits a 83% annualized growth rate and
other HTTP traffic has a 53% growth rate versus the 26% over all
annual growth rate of broadband traffic. This shows that HTTP
traffic will become ever more dominant and increase the potential
caching opportunities. Furthermore, we characterize the core
backbone traffic of this broadband provider to measure the distance
traveled by content and traffic. We find that CDN traffic is much
more efficient than P2P content and that there is large skew in the
Air Miles between POP in a typical network. Our findings show
that there are many opportunities in broadband provider networks
to optimize how traffic is delivered and cached.}

Multi-VPN Optimization for Scalable Routing via Relaying
MohammadHossein Bateni, Alexandre Gerber, Mohammad Hajiaghayi, Subhabrata Sen
in Proc. IEEE INFOCOM Mini-Conference and IEEE/ACM Transactions on Networking,
2009.
[PDF]
[BIB]
IEEE Copyright
This version of the work is reprinted here with permission of IEEE for your personal use. Not for redistribution. The definitive version was published in Proc. IEEE INFOCOM Mini-Conference and IEEE/ACM Transactions on Networking, 2009. , 2009-04-19
{Enterprise networks are increasingly adopting
Layer 3 Multiprotocol Label Switching (MPLS) Virtual Private
Network (VPN) technology to connect geographically disparate
locations. The any-to-any direct connectivity model of this technology
involves a very high memory footprint and is causing
associated routing tables in the service provider�s routers to grow
very large. The concept of Relaying was proposed earlier [9]
to separately minimize the routing table memory footprint of
individual VPNs, and involves selecting a small number of hub
routers to maintain complete reachability information for that
VPN, and enabling non-hub spoke routers with reduced routing
tables to achieve any-to-any reachability by routing traffic via a
hub.
A large service provider network typically hosts many thousands
of different VPNs. In this paper, we generalize Relaying
to the multi-VPN environment, and consider new constraints on
resources shared across VPNs, such as router uplink bandwidth
and memory. The hub selection problem involves complex tradeoffs
along multiple dimensions including these shared resources,
and the additional distance traversed by traffic. We formulate the
hub selection as a constraint optimization problem and develop
an algorithm with provable guarantees to solve this NP-complete
problem. Evaluations using traces and configurations from a
large provider and many real-world VPNs indicate that the
resulting Relaying solution substantially reduces the total router
memory requirement by 85% while smoothing out the utilization
on each router and requiring only a small increase in the endto-
end path for the relayed traffic.}

Improved Approximation Algorithms For Label Cover Problems
Howard Karloff, Mohammad Hajiaghayi, Moses Charikar
European Symposium on Algorithms,
2009.
[PDF]
[BIB]
Springer-Verlag Copyright
The definitive version was published in European Symposium on Algorithms , 2009-09-07
{}
Network-Aware Forward caching
Alexandre Gerber, Jeffrey Erman, Mohammad Hajiaghayi, Dan Pei, Oliver Spatscheck
2008.
[PPT]
[BIB]
Assignment Problem in Content Distribution Networks: Unsplittable Hard-capacitated Facility Location
Mohammad Hajiaghayi, Mohammad Bateni
2008.
[PDF]
[BIB]
Ordinal Embedding: Approximation Algorithms and Dimensionality Reduction
Mohammad Hajiaghayi, Erik Demaine, Anastasios Sidiropoulos, Mihai B?adoiu, Morteza Zadimoghaddam
2007.
[PDF]
[BIB]
Minimizing Movement: Fixed-Parameter Tractability
Mohammad Hajiaghayi, Erik Demaine, Daneil Marx
2007.
[PDF]
[BIB]
System And Method For Assigning Requests In A Content Distribution Network,
Tue Nov 20 16:12:24 EST 2012
A system includes a plurality of edge routers and a route controller. The edge routers are configured to direct requests from a client system to one of a plurality of cache servers. Each of the cache servers is configured to provide content to the client system in response to the requests. The route controller is configured to receive demand information from the edge routers, estimate an optimal request distribution based on the demand information using a bicriteria approximation algorithm, and provide each of the edge routers with route information.
Designing Minimum Total Cost Networks Using Iterative Rounding Approximation Methods,
Tue Aug 07 16:11:20 EDT 2012
Minimum cost networks, such as fiber optic networks used in telecommunications, are obtained by defining available network elements having cost, required pairs, connectivity and penalty cost values and selecting from these available elements using an iterative rounding approximation method that constructs an LP relaxation incorporating the element parameters, finds an optimal basic solution, applies a selection criterion to pairs and edges in the optimal basic solution, and constructs a residual LP relaxation with selected pairs and edges. By fixing selected pairs and edges values to 1 in the residual LP, successive iterations of the method provide a design which is a 3-approximation solution to the minimum cost design problem.
Network Aware Forward Caching,
Tue Jan 24 16:09:06 EST 2012
An Internet service provider includes a cache server and a network aware server. The network aware server is operable to determine an optimization between a cost of retrieving content from a network and a cost of caching content from the network at the first cache server and then send a content identifier to the cache server. The cache server is operable to receive the content identifier, and determine the source of a content item. If the source is the same as the content identifier, then the cache server caches the content item.
Approximating Node-Weighted Steiner Network Of Terminals,
Tue Apr 26 16:04:58 EDT 2011
According to one method for approximating a network of terminals, a graph comprising nodes and edges connecting at least some of the nodes is received. The nodes include terminals and non-terminal nodes. The non-terminal nodes are each associated with a weight. The terminals are each initialized to a value. The values of the terminals are incremented by a given amount until the values of the terminals reach a sufficient amount to acquire at least one of the non-terminal nodes that connects at least two of the terminals based on the weight of the at least one of the non-terminal nodes. Upon the values of the terminals reaching the sufficient amount, the at least one of the non-terminal nodes and the edges connecting the at least one of the non-terminal nodes to the at least two of the terminals are acquired to form a connected component in the network of terminals.
Scalable Multiprotocol Label Switching Based Virtual Private Networks And Methods To Implement The Same,
Tue Sep 14 15:04:40 EDT 2010
Example scalable multi-protocol label switching (MPLS) based virtual private networks (VPNs) and methods to implement the same are disclosed. A disclosed example spoke provider edge (PE) router for an MPLS-based VPN includes a truncated virtual routing and forwarding (VRF) table containing a first value referencing a hub PE router and a second value referencing a first customer edge (CE) router coupled to the VPN via the PE router, and a forwarding module to forward a packet received from the first CE router to the hub PE router when the packet contains an address referencing a second CE router coupled to the VPN via a second spoke PE router.