
180 Park Ave - Building 103
Florham Park, NJ
www2.research.att.com/~sen
Subject matter expert in Network measurement, Configuration management, Traffic analysis, Application classification, Network security, Network performance
Dr. Subhabrata Sen is a Principal Member in the Networking and Services Research Center at AT&T Labs Research. His research spans IP network management, configuration management, traffic analysis, application classification, network data mining, security and anomaly detection, and networked streaming multimedia.
Shubho received a Bachelor of Engineering in Computer Science from Jadavpur University, India, and an M.S. and a Ph.D. in Computer Science from the University of Massachusetts at Amherst. He is a recipient of the AT&T CTO Innovation Award. He is a member of IEEE and ACM.
Automatically Inferring the Evolution of Malicious Activity on the Internet.
Shobha Venkataraman, David Brumley, Subhabrata Sen, Oliver Spatscheck
NDSS,
2013.
[PDF]
[BIB]
ISOC Copyright
The definitive version was published in NDSS 2013, 2013-02-24, http://www.cs.unc.edu/~amw/ndss2013/ISOC_copyright_forms.txt
{Internet-based services routinely contend with a range of
malicious activity (e.g., spam, scans, botnets) that can potentially
arise from virtually any part of the global Internet
infrastructure and that can shift longitudinally over time. In
this paper, we develop the first algorithmic techniques to automatically
infer regions of the Internet with shifting security
characteristics in an online fashion. Conceptually, our
key idea is to model the malicious activity on the Internet as
a decision tree over the IP address space, and identify the
dynamics of the malicious activity by inferring the dynamics
of the decision tree. Our evaluations on large corpuses of
mail data and botnet data indicate that our algorithms are
fast, can keep up with Internet-scale traffic data, and can
extract changes in sources of malicious activity substantially
better (a factor of 2.5) than approaches based on using
predetermined levels of aggregation such as BGP-based
network-aware clusters. Our case studies demonstrate our
algorithm’s ability to summarize large shifts in malicious
activity to a small number of IP regions (by as much as two
orders of magnitude), and thus help focus limited operator
resources. Using our algorithms, we find that some regions
of the Internet are prone to much faster changes than others,
such as a set of small and medium-sized hosting providers
that are of particular interest to mail operators.}
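The key idea above, modeling malicious activity as a decision tree over the IP address space, can be illustrated with a toy sketch. This is not the paper's inference algorithm; the class, labels, and prefixes below are invented for illustration. Leaves correspond to IP prefixes, and an address is classified by its longest matching labeled prefix.

```python
import ipaddress

class PrefixTree:
    """Toy decision tree over the IPv4 space: leaves label IP prefixes."""

    def __init__(self):
        # map prefix -> label; the longest matching prefix wins
        self.leaves = {}

    def label(self, prefix, verdict):
        self.leaves[ipaddress.ip_network(prefix)] = verdict

    def classify(self, ip):
        addr = ipaddress.ip_address(ip)
        best, verdict = -1, "benign"   # default label for unlabeled space
        for net, v in self.leaves.items():
            if addr in net and net.prefixlen > best:
                best, verdict = net.prefixlen, v
        return verdict

tree = PrefixTree()
tree.label("10.0.0.0/8", "benign")
tree.label("10.20.0.0/16", "malicious")   # a "bad" region inside a good one
print(tree.classify("10.20.1.5"))   # -> malicious (longest prefix match)
print(tree.classify("10.9.9.9"))    # -> benign
```

Tracking the *dynamics* of such a tree online, under Internet-scale traffic, is the hard part the paper addresses; the sketch only shows the static data model.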

To Cache or not to Cache: The 3G case
Jeffrey Erman, Alexandre Gerber, Mohammad Hajiaghayi, Dan Pei, Subhabrata Sen, Oliver Spatscheck
IEEE Internet Computing,
2011.
[PDF]
[BIB]
IEEE Copyright
This version of the work is reprinted here with permission of IEEE for your personal use. Not for redistribution. The definitive version was published in IEEE Internet Computing, 2011. , 2011-01-01
{Cellular networks have witnessed tremendous traffic growth
recently, fueled by the rapid proliferation of smartphones,
laptops with mobile data cards, and new technologies improving
the performance of these networks. However, unlike
the wired world, there exists a rather limited understanding
of the application mixes and the characteristics of this traffic.
Recent studies have shown that in the wired broadband
world, HTTP traffic accounts for the vast majority of the application
traffic and that forward caching of HTTP objects
results in substantial savings in network resources. What
about cellular networks? The answer is a function of the
traffic characteristics, network architecture, as well as the
various cost points associated with delivering traffic in these
networks. In this paper, we examine the characteristics of
HTTP traffic generated by millions of users across one of
the world's largest 3G cellular networks, and explore the potential
of forward caching. We provide a simple cost model
that third parties can easily use to determine the cost-benefit
tradeoffs for their own cellular network settings. This is the
first large scale caching analysis in cellular networks.}
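The kind of cost-benefit calculation the abstract alludes to can be sketched back-of-envelope. This is not the paper's actual cost model; the function, parameter names, and numbers below are illustrative assumptions only.

```python
# Hypothetical forward-caching cost/benefit sketch. All parameters invented.

def monthly_caching_benefit(traffic_gb, hit_byte_ratio,
                            backhaul_cost_per_gb, cache_cost_per_month):
    """Net monthly savings from deploying a forward cache."""
    saved_gb = traffic_gb * hit_byte_ratio       # bytes served from cache
    savings = saved_gb * backhaul_cost_per_gb    # backhaul cost avoided
    return savings - cache_cost_per_month

# Example: 100 TB/month, 33% byte hit ratio, $0.02/GB backhaul, $400/month cache
net = monthly_caching_benefit(100_000, 0.33, 0.02, 400)
print(f"net monthly benefit: ${net:.2f}")
```

The point of such a model is that the break-even hit ratio depends on each operator's own cost points, which is why the paper leaves the parameters to third parties.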

Profiling Resource Usage for Mobile Applications: A Cross-layer Approach
Feng Qian, Zhaoguang Wang, Alexandre Gerber, Z. Morley Mao, Subhabrata Sen, Oliver Spatscheck
in Proc. of ACM MobiSys,
2011.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proc of ACM MobiSys , 2011-06-27.
{Despite the popularity of mobile applications, their performance
and energy bottlenecks remain hidden due to a lack of visibility into
the resource-constrained mobile execution environment with potentially
complex interaction with the application behavior. We design
and implement ARO, the mobile Application Resource Optimizer,
the first tool that efficiently and accurately exposes the cross-layer
interaction among various layers including radio resource channel
state, transport layer, application layer, and the user interaction
layer to enable the discovery of inefficient resource usage for
smartphone applications. To realize this, ARO provides three key
novel analyses: (i) accurate inference of lower-layer radio resource
control states, (ii) quantification of the resource impact of application
traffic patterns, and (iii) detection of energy and radio resource
bottlenecks by jointly analyzing cross-layer information. We have
implemented ARO and demonstrated its benefit on several essential
categories of popular Android applications to detect radio resource
and energy inefficiencies, such as unacceptably high (46%) energy
overhead of periodic audience measurements and inefficient content
prefetching behavior.}

Over The Top Video: the Gorilla in Cellular Networks
Jeffrey Erman, Alexandre Gerber, Subhabrata Sen, Oliver Spatscheck, Kadangode Ramakrishnan
in Proc. of ACM Internet Measurement Conference (IMC),
2011.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proc. of ACM Internet Measurement Conference (IMC), 2011-11-01.
{Cellular networks have witnessed tremendous traffic growth
recently, fueled by smartphones, tablets and new high speed
broadband cellular access technologies. A key application
driving that growth is video streaming. Yet very little is
known about the characteristics of this traffic class. In this
paper, we examine video traffic generated by three million
users across one of the world's largest 3G cellular networks.
This first deep dive into cellular video streaming shows that
HLS, an adaptive bitrate streaming protocol, accounts for
one third of the streaming video traffic and that it is common
to see changes in encoding bitrates within a session. We also
observe that most of the content is streamed at less than 255
Kbps and that only 40% of the videos are fully downloaded.
Another key finding is that there exists significant potential
for caching to deliver this content.}

Making Sense of Customer Tickets in Cellular Networks
Yu Jin, Nicholas Duffield, Alexandre Gerber, Patrick Haffner, Wen Hsu, Guy Jacobson, Shobha Venkataraman, Zhi-Li Zhang, Subhabrata Sen
in Proc. IEEE INFOCOM Mini-Conference,
2011.
[PDF]
[BIB]
{Effective management of large-scale cellular data
networks is critical to meet customer demands and expectations.
Customer calls for technical support provide a direct indication of
the issues and problems customers encounter. In this paper we
study the customer tickets (free-text recordings and classifications
by customer support agents) collected at a large cellular network
provider, with two inter-related goals: i) to characterize and
understand the major factors that lead customers to call
and seek support; and ii) to utilize such customer tickets to
help identify potential network problems. For this purpose, we
develop a novel statistical approach to model customer call rates
that accounts for customer-side factors (e.g., user tenure and
handset types) as well as geo-locations. We show that most calls
are due to customer-side factors and can be well captured by the
model. Furthermore, we also demonstrate that location-specific
deviations from the model provide a good indicator of potential
network-side issues. The latter is corroborated with the detailed
analysis of customer tickets and other independent data sources
(non-ticket customer feedback and network performance data).}

Large-scale App-based Reporting of Customer Problems in Cellular Networks: Potential and Limitations
Yu Jin, Nicholas Duffield, Alexandre Gerber, Patrick Haffner, Wen Hsu, Guy Jacobson, Subhabrata Sen, Shobha Venkataraman, Zhi-Li Zhang
in Proc. ACM SIGCOMM Workshop on Measurements Up the STack (W-MUST),
2011.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM SIGCOMM Workshop on Measurements Up the STack (W-MUST), 2011-08-19.
{Multidimensional distributions are often used in data mining to describe and summarize different features of large datasets. It is natural to look for distinct classes in such datasets by clustering the data. A common approach entails the use of methods like k-means clustering. However, the k-means method inherently relies on the Euclidean metric in the embedded space and does not account for additional topology underlying the distribution.
In this paper, we propose using Earth Mover Distance (EMD) to compare multidimensional distributions. For an n-bin histogram, the EMD is based on a solution to the transportation problem with time complexity O(n^3 log n). To mitigate the high computational cost of EMD, we propose an approximation that reduces the cost to linear time.
Other notions of distances such as the information-theoretic Kullback-Leibler divergence and the statistical χ² distance account only for the correspondence between bins with the same index, do not use information across bins, and are sensitive to bin size. A cross-bin distance measure like EMD is not affected by binning differences and meaningfully matches the perceptual notion of “nearness”.
Our technique is simple, efficient and practical for clustering distributions. We demonstrate the use of EMD on a practical application of clustering over 400,000 anonymous mobility usage patterns which are defined as distributions over a manifold. EMD allows us to represent inherent relationships in this space. We show that EMD allows us to successfully cluster even sparse signatures and we compare the results with other clustering methods. Given the large size of our dataset a fast approximation is crucial for this application.}
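One reason a linear-time EMD computation is plausible: for one-dimensional histograms of equal total mass with unit ground distance, EMD reduces exactly to the L1 distance between the cumulative distributions, a standard identity. A minimal sketch follows; the paper's approximation for distributions over a manifold is more general than this 1-D special case.

```python
from itertools import accumulate

def emd_1d(p, q):
    """1-D Earth Mover's Distance via cumulative sums (linear time)."""
    assert abs(sum(p) - sum(q)) < 1e-9, "histograms must have equal mass"
    cp, cq = accumulate(p), accumulate(q)
    # EMD = L1 distance between the two CDFs
    return sum(abs(a - b) for a, b in zip(cp, cq))

print(emd_1d([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]))  # -> 1.0 (half the mass moves one bin, half moves one bin)
```

Unlike per-bin measures such as KL divergence, this value grows with how *far* mass must move, matching the "nearness" intuition in the abstract.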

Internet-scale Visualization and Detection of Performance Events
Shobha Venkataraman, Jeffrey Pang, Subhabrata Sen, Oliver Spatscheck
Usenix Annual Technical Conference,
2011.
[PDF]
[BIB]
USENIX Copyright
The definitive version was published in Proceedings of the Annual Technical Conference, Usenix. , 2011-06-15
{Operators typically monitor the performance of network server farms
using rule-based scripts to automatically flag "events of interest" on
an array of active and passive performance measurement feeds.
However, such automatic detection is typically limited to events with
known properties. A different challenge involves detecting the
"unknown unknowns" -- the events of interest whose properties are
unknown, and therefore, cannot be defined beforehand. Visualization
can significantly aid the rapid discovery of such unknown patterns, as
network operators, with domain expertise, may quickly notice
unexpected shifts in traffic patterns when represented visually.
However, the volume of Internet-wide raw performance data can easily
overwhelm human comprehension, and therefore, an effective
visualization needs to be sparse in representation, yet discriminating
of good and poor performance.
This paper presents a tool that can be used to visualize performance metrics at Internet-scale. At its core, the tool builds decision trees over the IP address space using performance measurements, so that IP addresses with similar performance characteristics are clustered together, and those with significant performance differences are separated. These decision trees need to be dynamic -- i.e., learnt online, and adapt to changes in the underlying network. We build these adaptive decision trees by extending online decision-tree learning algorithms to the unique challenges of classifying performance measurements across the Internet, and
our tool then visualizes these adaptive decision trees, distinguishing parts of the
network with good performance from those with poor performance. We
show that the differences in the visualized decision trees help us
quickly discover new patterns of usage and novel anomalies in latency
measurements at a large server farm.
}

Demo: Mobile Application Resource Optimizer (ARO)
Feng Qian, Zhaoguang Wang, Alexandre Gerber, Z. Morley Mao, Subhabrata Sen, Oliver Spatscheck
in Proc. of ACM MobiSys,
2011.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proc. of ACM MobiSys , 2011-06-27.
{Despite the popularity of mobile applications, their performance
and energy bottlenecks remain hidden due to a lack of visibility into
the resource-constrained mobile execution environment with potentially
complex interaction with the application behavior. We design
and implement ARO, the mobile Application Resource Optimizer,
the first tool that efficiently and accurately exposes the cross-layer
interaction to enable the discovery of inefficient resource usage.}

TOP: Tail Optimization Protocol for Cellular Radio Resource Allocation
Alexandre Gerber, Oliver Spatscheck, Subhabrata Sen, Feng Qian (University of Michigan), Zhaoguang Wang (University of Michigan), Z. Morley Mao (University of Michigan)
ICNP, 18th IEEE International Conference on Network Protocols,
2010.
[PDF]
[BIB]
IEEE Copyright
This version of the work is reprinted here with permission of IEEE for your personal use. Not for redistribution. The definitive version was published in ICNP, 18th IEEE International Conference on Network Protocols, 2010, 2010-10-05.
{In 3G cellular networks, the release of radio resources
is controlled by inactivity timers. However, the timeout
value itself, also known as the tail time, can last up to 15 seconds
due to the necessity of trading off resource utilization efficiency
for low management overhead and good stability, thus wasting a
considerable amount of radio resources and battery energy at
user handsets. In this paper, we propose the Tail Optimization Protocol
(TOP), which enables cooperation between the phone and
the radio access network to eliminate the tail whenever possible.
Intuitively, applications can often accurately predict a long idle
time. Therefore the phone can notify the cellular network of such
an imminent tail, allowing the latter to immediately release radio
resources. To realize TOP, we utilize a recent proposal in the 3GPP
specification called fast dormancy, a mechanism for a handset to
request immediate radio resource release from the cellular network.
TOP thus requires no change to the cellular infrastructure
and only minimal changes to smartphone applications. Our
experimental results based on real traces show that with a
reasonable prediction accuracy, TOP saves the overall radio
energy (up to 17%) and radio resources (up to 14%) by reducing
tail times by up to 60%. For applications such as multimedia
streaming, TOP can achieve even more significant savings of
radio energy (up to 60%) and radio resources (up to 50%).}
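The tail-time tradeoff above can be conveyed with a toy energy calculation. This is not the TOP implementation; the timer value, power levels, and traffic trace below are invented for illustration only.

```python
# Illustrative sketch: radio energy with and without fast dormancy.
TAIL_SECONDS = 15          # inactivity timer ("tail time")
HIGH_POWER_W = 0.8         # assumed radio power while active or in the tail
IDLE_POWER_W = 0.01        # assumed power after the radio is released

def radio_energy(transfers, fast_dormancy):
    """transfers: list of (active_seconds, following_idle_seconds) tuples."""
    energy = 0.0
    for active_s, idle_s in transfers:
        energy += active_s * HIGH_POWER_W
        # Without fast dormancy the radio burns high power for the whole
        # tail (capped by the idle gap); with it, the tail is skipped.
        tail = 0.0 if fast_dormancy else min(TAIL_SECONDS, idle_s)
        energy += tail * HIGH_POWER_W + (idle_s - tail) * IDLE_POWER_W
    return energy

trace = [(2, 30), (1, 60), (3, 20)]   # hypothetical transfer/idle pattern
base = radio_energy(trace, fast_dormancy=False)
opt = radio_energy(trace, fast_dormancy=True)
print(f"tail energy saved: {100 * (base - opt) / base:.0f}%")
```

Even this crude model shows why eliminating tails pays off most for traffic with many short transfers separated by long idle gaps, the pattern TOP's idle-time prediction targets.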

Network DVR: A Programmable Framework for Application-Aware Trace Collection
Chia-Wei Chang, Alexandre Gerber, Bill Lin (University of California), Subhabrata Sen, Oliver Spatscheck
in Proc. Passive and Active Measurement Conference (PAM),
2010.
[PDF]
[BIB]
Springer Copyright
The definitive version was published in PAM/2010 (Springer, LNCS). , 2010-04-09
{Network traces are essential for a wide range of network applications,
including traffic analysis, network measurement, performance monitoring, and
security analysis. Existing capture tools do not have sufficient built-in intelligence
to understand these application requirements. Consequently, they are forced to
collect all packet traces that might be useful at the finest granularity to meet a
certain level of accuracy requirement. It is up to the network applications to process
the per-flow traffic statistics and extract meaningful information. But for a
number of applications, it is much more efficient to record packet sequences for
flows that match some application-specific signatures, specified using for example
regular expressions. A basic approach is to begin memory-copy (recording)
when the first character of a regular expression is matched. However, such a
match often ultimately fails, consuming unnecessary memory resources in
the interim. In this paper, we present a programmable application-aware triggered
trace collection system called Network DVR that performs precisely the
function of packet content recording based on user-specified trigger signatures.
This in turn significantly reduces the number of memory copies that the system
has to consume for valid trace collection, which has been shown previously as
a key indicator of system performance [8]. We evaluated our Network DVR implementation
on a practical application using 10 real datasets that were gathered
from a large enterprise Internet gateway. In comparison to the basic approach in
which the memory-copy starts immediately upon the first character match without
triggered-recording, Network DVR was able to reduce the number of memory copies
by a factor of over 500x on average across the 10 datasets and over 800x
in the best case.}
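The triggered-recording idea can be sketched in a few lines. This is purely illustrative: the trigger signature and flows are invented, and Network DVR operates on live packet streams with hardware-level memory copies, not in-memory dicts.

```python
import re

# Hypothetical sketch: copy a flow's payload into the trace store only once
# a trigger signature fully matches, instead of from the first matching byte.
TRIGGER = re.compile(rb"GET /malware\.exe")   # invented signature

def collect(flows):
    """flows: dict name -> payload bytes; returns payloads worth recording."""
    recorded = {}
    for name, payload in flows.items():
        if TRIGGER.search(payload):      # full signature match -> record
            recorded[name] = payload     # single memory copy, post-match
    return recorded

flows = {
    "f1": b"GET /malware.exe HTTP/1.1\r\n",
    "f2": b"GET /index.html HTTP/1.1\r\n",   # prefix "GET /" matches, then fails
}
print(sorted(collect(flows)))   # only f1 is recorded
```

The memory-copy savings the abstract reports come from not committing bytes for flows like f2, whose partial matches would have triggered recording under the basic first-character approach.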

NEVERMIND, the Problem Is Already Fixed: Proactively Detecting and Troubleshooting Customer DSL Problems
Yu Jin, Nicholas Duffield, Alexandre Gerber, Patrick Haffner, Subhabrata Sen, Zhi-Li Zhang
in Proc. of ACM CoNext,
2010.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2010. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM CoNext 2010 , 2010-11-30.
{Traditional DSL troubleshooting solutions are reactive, relying
mainly on customers to report problems, and tend to
be labor-intensive, time consuming, prone to incorrect resolutions
and overall can contribute to increased customer
dissatisfaction. In this paper, we propose a proactive approach
to facilitate troubleshooting customer edge problems
and reducing customer tickets. Our system consists of: i) a
ticket predictor which predicts future customer tickets; and
ii) a trouble locator which helps technicians accelerate the
troubleshooting process during field dispatches. Both components
infer future tickets and trouble locations based on
existing sparse line measurements, and the inference models
are constructed automatically using supervised machine
learning techniques. We propose several novel techniques to
address the operational constraints in DSL networks and to
enhance the accuracy of NEVERMIND. Extensive evaluations
using an entire year's worth of customer ticket and measurement
data from a large network show that our method
can predict thousands of future customer tickets per week
with high accuracy and significantly reduce the time
and effort for diagnosing these tickets. This is beneficial as it
has the effect of both reducing the number of customer care
calls and improving customer satisfaction.}

HTTP in the Home: It is not just about PCs
Jeffrey Erman, Alexandre Gerber, Subhabrata Sen
in Proc. ACM SIGCOMM Workshop on Home Networks (HomeNets) and ACM CCR, January 2011, 90-95. BEST PAPER,
2010.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2010. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution.
The definitive version was published in ACM HomeNETs 2010 , 2010-09-03.
{HTTP (Hypertext Transfer Protocol) was originally used primarily
for human-initiated client-server communications
launched from web browsers on traditional computers and laptops.
However, today it has become the protocol of choice
for a bewildering range of applications from a wide array
of emerging devices like smart TVs and gaming consoles.
This paper presents an initial study characterizing the nontraditional
sources of HTTP traffic such as consumer devices
and automated updates in the overall HTTP traffic for residential
Internet users. Among our findings, 25% of all HTTP
traffic is due to non-traditional sources, with 17.9% being
from consumer devices such as WiFi-enabled cell phones and
5.1% generated from automated software updates and background
processes. We also found 11% of HTTP requests
are caused by communications to advertising servers. The
mouse is gone: the iPhone and the Xbox don't have one, and
automated applications don't need one.}

FlowRoute: Inferring Forwarding Table Updates Using Passive Flow-level Measurements
Lee Breslau, Cheng Ee, Alexandre Gerber, Subhabrata Sen, Amogh Dhamdhere, Nicholas Duffield, Carsten Lund
in Proc. of ACM Internet Measurement Conference (IMC),
2010.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2010. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Internet Measurement Conference , 2010-11-01.
{The reconvergence of routing protocols in response to changes
in network topology can impact application performance.
While improvements in protocol specification and implementation have significantly reduced reconvergence times,
increasingly performance-sensitive applications continue to
raise the bar for these protocols. As such, monitoring the
performance of routing protocols remains a critical activity
for network operators. We design FlowRoute, a tool based
on passive data plane measurements that we use in conjunction with control plane monitors for offline debugging and
analysis of forwarding table dynamics. We discuss practical
constraints that affect FlowRoute, and show how they can
be addressed in real deployment scenarios. As an application of FlowRoute, we study forwarding table updates by
backbone routers at a tier-1 ISP. We detect interesting behavior such as delayed forwarding table updates and routing
loops due to buggy routers (confirmed by network operators)
that are not detectable using traditional control plane
monitors.}

Characterizing Radio Resource Allocation for 3G Networks
Oliver Spatscheck, Subhabrata Sen, Alexandre Gerber, Feng Qian, Zhaoguang Wang, Z. Morley Mao
in Proc. of ACM Internet Measurement Conference (IMC),
2010.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2010. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Internet Measurement Conference , 2010-11-01.
{3G cellular data networks have recently witnessed explosive
growth. In this work, we focus on UMTS, one of the
most popular 3G mobile communication technologies. Our
work is the first to accurately infer, for any UMTS network,
the state machine (both transitions and timer values)
that guides the radio resource allocation policy through a
light-weight probing scheme. We systematically characterize
the impact of operational state machine settings by analyzing
traces collected from a commercial UMTS network, and
pinpoint the inefficiencies caused by the interplay between
smartphone applications and the state machine behavior.
Besides the basic characterization, we explore the optimal
state machine settings in terms of several critical timer values
evaluated using real network traces.
Our findings suggest that the fundamental limitation of
the current state machine design is the static nature of treating
all traffic according to the same inactivity timer, making
it difficult to balance the tradeoffs among radio resource usage
efficiency, network management overhead, device radio
energy consumption, and performance. To the best of our
knowledge, our work is the first empirical study that employs
real cellular traces to investigate the optimality of the state
machine configurations. Our analysis also demonstrates that
traffic patterns have a significant impact on radio resource
and energy consumption. In particular, we propose
a simple improvement that reduces YouTube streaming energy
by 80% by leveraging an existing feature called fast
dormancy supported by the 3GPP specifications.}

Tracking Dynamic Sources of Malicious Activity at Internet-Scale
Shobha Venkataraman, Subhabrata Sen, Oliver Spatscheck, Avrim Blum, Dawn Song
Neural Information Processing Systems (NIPS) 2009,
2009.
[PDF]
[BIB]
{We formulate and address the problem of discovering dynamic malicious regions on the Internet. We model this problem as one of adaptively pruning a known decision tree, but with additional challenges: (1) severe space requirements, since the underlying decision tree has over 4 billion leaves, and (2) a changing target function, since malicious activity on the Internet is dynamic. We present a novel algorithm that addresses this problem by putting together a number of different "experts" algorithms and online paging algorithms. We prove guarantees on our algorithm's performance as a function of the best possible pruning of a similar size, and our experiments show that our algorithm achieves high accuracy on large real-world data sets, with significant improvements over existing approaches.}

TCP Revisited: A Fresh Look at TCP in the Wild
Feng Qian, Alexandre Gerber, Z. Morley Mao, Subhabrata Sen, Oliver Spatscheck, Walter Willinger
in Proc. of ACM Internet Measurement Conference (IMC),
2009.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2009. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Internet Measurement Conference, 2009-11-04
{Since the last in-depth studies of measured TCP traffic some 6-
8 years ago, the Internet has experienced significant changes, including
the rapid deployment of backbone links with 1-2 orders
of magnitude more capacity, the emergence of bandwidth-intensive
streaming applications, and the massive penetration of new TCP
variants. These and other changes raise the question of whether the
characteristics of measured TCP traffic in today's Internet reflect
these changes or have largely remained the same. To answer this
question, we collected and analyzed packet traces from a number of
Internet backbone and access links, focusing on the "heavy-hitter"
flows responsible for the majority of traffic. Next we analyzed their
within-flow packet dynamics, and observed the following features:
(1) in one of our datasets, up to 15.8% of flows have an initial congestion
window (ICW) size larger than the upper bound specified
by RFC 3390. (2) Among flows that encounter retransmission rates
of more than 10%, 5% of them exhibit irregular retransmission behavior
where the sender does not slow down its sending rate during
retransmissions. (3) TCP flow clocking (i.e., regular spacing between
flights of packets) can be caused by both RTT and non-RTT
factors such as application or link layer, and 60% of flows studied
show no pronounced flow clocking. To arrive at these findings,
we developed novel techniques for analyzing unidirectional TCP
flows, including a technique for inferring ICW size, a method for
detecting irregular retransmissions, and a new approach for accurately
extracting flow clocks.}

Multicast Redux: A First Look at Enterprise Multicast Traffic
Elliott Karpilovsky, Lee Breslau, Alexandre Gerber, Subhabrata Sen
in Proc. of ACM SIGCOMM Workshop: Research on Enterprise Networking (WREN),
2009.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2009. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM SIGCOMM 2009, 2009-08-17.
{IP multicast, after spending much of the last 20 years as
the subject of research papers, protocol design efforts and
limited experimental usage, is finally seeing significant deployment in production networks. The efficiency afforded
by one-to-many network layer distribution is well-suited to
such emerging applications as IPTV, file distribution, conferencing, and the dissemination of financial trading information. However, it is important to understand the behavior
of these applications in order to see if network protocols are
appropriately supporting them. In this paper we undertake
a study of enterprise multicast traffic as observed from the
vantage point of a large VPN service provider. We query
multicast usage information from provider edge routers for
our analysis. To our knowledge, this is the first study of production multicast traffic. Our purpose is both to understand
the characteristics of the traffic (in terms of flow duration,
throughput, and receiver dynamics) and to gain insight as
to whether the current mechanisms supporting multicast VPNs
can be improved. Our analysis reveals several classes of multicast traffic for which changes to the underlying protocols
may yield benefits.}

Multi-VPN Optimization for Scalable Routing via Relaying
MohammadHossein Bateni, Alexandre Gerber, Mohammad Hajiaghayi, Subhabrata Sen
in Proc. IEEE INFOCOM Mini-Conference and IEEE/ACM Transactions on Networking,
2009.
[PDF]
[BIB]
IEEE Copyright
This version of the work is reprinted here with permission of IEEE for your personal use. Not for redistribution. The definitive version was published in Proc. IEEE INFOCOM Mini-Conference and IEEE/ACM Transactions on Networking, 2009. , 2009-04-19
{Enterprise networks are increasingly adopting
Layer 3 Multiprotocol Label Switching (MPLS) Virtual Private
Network (VPN) technology to connect geographically disparate
locations. The any-to-any direct connectivity model of this technology
involves a very high memory footprint and is causing
associated routing tables in the service provider's routers to grow
very large. The concept of Relaying was proposed earlier [9]
to separately minimize the routing table memory footprint of
individual VPNs, and involves selecting a small number of hub
routers to maintain complete reachability information for that
VPN, and enabling non-hub spoke routers with reduced routing
tables to achieve any-to-any reachability by routing traffic via a
hub.
A large service provider network typically hosts many thousands
of different VPNs. In this paper, we generalize Relaying
to the multi-VPN environment, and consider new constraints on
resources shared across VPNs, such as router uplink bandwidth
and memory. The hub selection problem involves complex tradeoffs
along multiple dimensions including these shared resources,
and the additional distance traversed by traffic. We formulate the
hub selection as a constraint optimization problem and develop
an algorithm with provable guarantees to solve this NP-complete
problem. Evaluations using traces and configurations from a
large provider and many real-world VPNs indicate that the
resulting Relaying solution substantially reduces the total router
memory requirement by 85% while smoothing out the utilization
on each router and requiring only a small increase in the
end-to-end path for the relayed traffic.}

Scalable VPN Routing via Relaying
Changhoon Kim, Alexandre Gerber, Carsten Lund, Dan Pei, Subhabrata Sen
in Proc. ACM SIGMETRICS,
2008.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2008. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM SIGMETRICS 2008, 2008-06-06.
{Enterprise customers are increasingly adopting MPLS (Multiprotocol Label Switching) VPN (Virtual Private Network) service that offers direct any-to-any reachability among the customer sites via a provider network. Unfortunately, this direct reachability model makes the service provider's routing tables grow very large as the number of VPNs and the number of routes per customer increase. As a result, router memory in the provider's network has become a key bottleneck in provisioning new customers. This paper proposes Relaying, a scalable VPN routing architecture that the provider can implement simply by modifying the configuration of routers in the provider network, without requiring changes to the router hardware and software. Relaying substantially reduces the memory footprint of VPNs by choosing a small number of hub routers in each VPN that maintain full reachability information, and by allowing non-hub routers to reach other routers through a hub. Deploying Relaying in practice, however, poses a challenging optimization problem that involves minimizing router memory usage by having as few hubs as possible, while limiting the additional latency due to indirect delivery via a hub. We first investigate the fundamental tension between the two objectives and then develop algorithms to solve the optimization problem by leveraging some unique properties of VPNs, such as sparsity of traffic matrices and spatial locality of customer sites. Extensive evaluations using real traffic matrices, routing configurations, and VPN topologies demonstrate that Relaying is very promising and can reduce routing-table usage by up to 90%, while increasing the additional distances traversed by traffic by only a few hundred miles, and the backbone bandwidth usage by less than 10%.}
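A back-of-envelope model shows why choosing a few hubs shrinks the memory footprint so sharply. The counts below, and the approximation that each spoke stores roughly one entry per hub, are illustrative assumptions, not the paper's model:

```python
# Hypothetical memory model for Relaying. Full mesh: each of the n
# provider-edge routers in a VPN stores all R customer routes.
# Relaying: only the k hubs store all R routes, while each spoke
# stores roughly k entries (just enough to reach the hubs).

def table_entries(n, routes, hubs):
    full_mesh = n * routes
    relaying = hubs * routes + (n - hubs) * hubs
    return full_mesh, relaying

full, relayed = table_entries(n=100, routes=1000, hubs=5)
saving = 1 - relayed / full  # fraction of table entries eliminated
```

With 100 routers, 1000 routes, and 5 hubs, this toy model eliminates well over 90% of the table entries, in line with the reductions the abstract reports.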
Analyzing the spatial quality of Internet streaming video
Amy R. Reibman, Subhabrata Sen, Jacobus Van Der Merwe
International Workshop on Video Processing and Quality Metrics,
2005.
[PDF]
[BIB]
P2P, the gorilla in the cable
Alexander Gerber, Matthew Roughan, Subhabrata Sen, Joseph Houle, Han Nguyen
National Cable & Telecommunications Association (NCTA) National Show,
2003.
[PDF]
[BIB]
Copyright
© 2003 AT&T Corp. All rights reserved.
{There is considerable interest in peer-to-peer (P2P) traffic because of its remarkable increase over the last few years. By analyzing flow measurements at the regional aggregation points of several cable operators, we are able to study its properties. P2P has become a large part of broadband traffic, and its characteristics differ from those of older applications such as the Web. It is a stable, balanced traffic: the peak-to-valley ratio during a day is around two and the IN/OUT traffic balance is close to one. Although P2P protocols are based on a distributed architecture, they do not show strong signs of geographical locality: a cable subscriber is not much more likely to download a file from a nearby region than from a distant one. It is clear that most of the traffic is generated by heavy hitters who abuse P2P (and other) applications, whereas most subscribers only use their broadband connections to browse the web, exchange emails, or chat. However, it is not easy to directly block or limit P2P traffic, because these applications adapt themselves to their environment: users develop ways of eluding the traffic blocks. The traffic that could historically be identified with five port numbers is now spread over thousands of TCP ports, pushing port-based identification to its limits. More complex methods to identify P2P traffic are not a long-term solution; the cable industry should opt for a "pay for what you use" model like the other utilities.}
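The two stability metrics the abstract cites, the daily peak-to-valley ratio (around two) and the IN/OUT byte balance (close to one), can be computed from flow data as sketched below. The sample values are made up for illustration; real inputs would come from the operators' flow measurements:

```python
# Illustrative computation of the two P2P traffic-stability metrics.

def peak_to_valley(hourly_bytes):
    """Ratio of the busiest to the quietest hour in a day."""
    return max(hourly_bytes) / min(hourly_bytes)

def in_out_balance(flows):
    """Ratio of inbound to outbound bytes over (direction, bytes) pairs."""
    inbound = sum(b for d, b in flows if d == "in")
    outbound = sum(b for d, b in flows if d == "out")
    return inbound / outbound

ratio = peak_to_valley([60, 50, 40, 55, 70, 80])          # 80 / 40
balance = in_out_balance(
    [("in", 100), ("out", 95), ("in", 90), ("out", 95)])  # 190 / 190
```

A web-dominated mix would typically show a higher peak-to-valley ratio and a strongly download-skewed balance, which is what makes these two numbers a useful P2P signature.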
AT&T Science and Technology, 2012.
For invention, innovation and advocacy of the Application Resource Optimization (ARO) tool and optimization techniques for improving the efficiency of cellular network applications.