
180 Park Ave - Building 103
Florham Park, NJ
Scheduling to Minimize Staleness and Stretch in Real-Time Data Warehouses
Lukasz Golab, MohammadHossein Bateni, Mohammad Hajiaghayi, Howard Karloff
Theory of Computing Systems Journal,
2011.
[PDF]
[BIB]
Springer Science+Business Media Copyright
The definitive version was published in Theory of Computing Systems Journal. , 2011-07-01
{We study scheduling algorithms for loading data feeds into real time data
warehouses, which are used in applications such as IP network monitoring, online financial
trading, and credit card fraud detection. In these applications, the warehouse
collects a large number of streaming data feeds that are generated by external
sources and arrive asynchronously. Data for each table in the warehouse are generated at a
constant rate, different tables possibly at different rates.
For each data feed, the arrival of new data triggers an emph{update} that seeks to append
the new data to the corresponding table; if multiple updates are pending for the
same table, they are batched together before being loaded.
At time $ au$, if a table has been updated with information up to time $r le au$,
its emph{staleness} is defined as $ au - r$.
Our first objective is to schedule the updates on one or more processors in a way that
minimizes the total staleness.
In order to ensure fairness, our second objective is to limit the maximum ``stretch'',
which we define (roughly) as the
ratio between the duration of time an update waits till it is
finished being processed,
and the length of the update.
In contrast to earlier work proving the nonexistence of constant-competitive algorithms
for related scheduling problems, we prove that emph{any} online nonpreemptive algorithm,
no processor of which is ever voluntarily idle, incurs a staleness at most a constant
factor larger than an obvious lower bound on total staleness (provided that the processors
are sufficiently fast).
We give a constant-stretch algorithm,
provided that the processors are sufficiently fast,
for the emph{quasiperiodic model},
in which tables can be clustered into a few groups such that the update frequencies
within each group vary by at most a constant factor.
Finally, we show that our constant-stretch algorithm is also
constant-competitive (subject to the same proviso on processor speed)
in the quasiperiodic model with respect to total weighted staleness, where tables
are assigned weights that reflect their priorities.}

Improved Approximation Algorithms for Prize-Collecting Steiner Tree and TSP
Aaron Archer, MohammadHossein Bateni, Mohammad Hajiaghayi, Howard Karloff
SIAM Journal on Computing,
SIAM Journal on Computing,
v40,
#2,
pp 309-332,
2011.
[PDF]
[BIB]
We study the prize-collecting Steiner tree (PCST), prize-collecting traveling salesman (PCTSP), and prize-collecting path (PC-Path) problems. Given a graph (V,E) with a cost on each edge and a penalty (a.k.a. prize) on each node, the goal is to find a tree (for PCST), cycle (for PCTSP), or path (for PC-Path) that minimizes the sum of the edge costs in the tree/cycle/path and the penalties of the nodes not spanned by it. In addition to being a useful theoretical tool for helping to solve other optimization problems, PCST has been applied fruitfully by AT&T to the optimization of real-world telecommunications networks. The most recent improvements for the first two problems, a 2-approximation algorithm for each, appeared first in 1992; a 2-approximation for PC-Path appeared in 2003. The natural linear programming (LP) relaxation of PCST has an integrality gap of 2, which has been a barrier to further improvements for this problem.
We present (2-eps)-approximation algorithms for all three problems, connected by a unified technique for improving prize-collecting algorithms that allows us to circumvent the integrality gap barrier. Specifically, our approximation ratio for prize-collecting Steiner tree is below 1.9672.

Capacitated Metric Labeling
Howard Karloff, Mohammad Hajiaghayi, Matthew Andrews, Ankur Moitra
ACM-SIAM Symposium on Discrete Algorithms (SODA),
2011.
[PDF]
[BIB]
SIAM Copyright
The definitive version was published in ACM-SIAM Symposium on Discrete Algorithms (SODA) , 2011-01-23
{We introduce
Capacitated Metric Labeling. As in Metric Labeling, we
are given a weighted graph G = (V, E), a label set L, a
semimetric d_L on this label set, and an assignment cost
function phi: V x L -> R, and the goal, as
in Metric Labeling, is to find an assignment f: V->L
that minimizes a particular two-cost function,
but in Capacitated Metric Labeling with the additional restriction that each label t_i receive at most l_i nodes.
We give a O(log |V|)-approximation algorithm when the number of labels is
fixed. We prove that it is impossible to approximate the value
of an instance of Capacitated Metric Labeling to within
any finite factor, if P!=NP.
Furthermore, we prove that any finite-ratio polynomial-time
approximation algorithm must multiplicatively violate the label
capacities by Omega((log |L|)^(1/2-eps)).
}
Data Auditor: Exploring Data Quality and Semantics using Pattern Tableaux
Lukasz Golab, Howard Karloff, Philip Korn, Divesh Srivastava
2010.
[PDF]
[BIB]
{We present Data Auditor, a tool for exploring data quality and semantics. Data Auditor provides a variety of semantic rule (constraint) classes for asserting consistency and completeness properties of the data. It can be used to test hypotheses by instantiating rules over supplied sets of attributes, to see if such rules hold or fail (to a required degree) on a minimum fraction of the data. Regions of the data for which the rule is satisfied (alternatively, violated) are concisely and meaningfully summarized in a tableau. }
Improved Approximation Algorithms For Label Cover Problems
Howard Karloff, Mohammad Hajiaghayi, Moses Charikar
European Symposium on Algorithms,
2009.
[PDF]
[BIB]
Springer-Verlag Copyright
The definitive version was published in European Symposium on Algorithms , 2009-09-07
{}
Compressing Rectilinear Pictures And Minimizing Access Control Lists,
Tue Jun 07 16:05:29 EDT 2011
A geometric model is considered for the problem of minimizing access control lists (ACLs) in network routers. A colored rectilinear pattern is created within an initially white rectangular canvas, and the basic operation is to choose a subrectangle and paint it a single color, overwriting all previous colors in the rectangle. The method operates on rectangular rule lists (RRLs) and access control lists (ACLs) in which all rectangles are strips that extend either the full length or the full height of the canvas. A polynomial-time algorithm optimally constructs such patterns when, as in the ACL application, the only colors are black and white (permit or deny). That algorithm is complemented by a significantly faster approximation algorithm that is guaranteed to be no worse than 3/2 optimal.
Method For Selecting Nodes In A Network,
Tue Apr 27 15:03:43 EDT 2010
Given a set of network nodes B that are sought to be monitored, and a set of potential monitoring nodes, a subset M of the monitoring nodes is chosen that insures monitoring each node b in B with a pair of nodes m.sub.i and m.sub.j such that no node except b is on both any shortest path from b to m.sub.i and on any shortest path from b to m.sub.j. Some of the nodes in M are chosen in a first step by identifying a subset of B having nodes b that are "t-good" nodes, choosing a subset of potential monitoring nodes as First Partner nodes, and choosing a corresponding subset of potential monitoring nodes as Second Partner nodes. Others are chosen in a second step that handles nodes b that are not "t-good," using a greedy algorithm.
ACM Fellow, 2011.
For contributions to the design and analysis of algorithms