
180 Park Ave - Building 103
Florham Park, NJ
http://www.research.att.com/~jiawang
Jia joined AT&T Labs, Inc. - Research in February 2001. Her research focuses on network measurement and management, network security, performance analysis/troubleshooting, IPTV, and Internet routing. She received her MS and PhD degrees in Computer Science from Cornell University in May 1999 and January 2001, respectively. She is currently a senior member of IEEE and a member of ACM.
Mitigating Low-Rate Denial-Of-Service Attacks In Packet-Switched Networks,
Tue May 14 17:26:18 EDT 2013
A method includes determining, at a network routing device, an average packet drop rate for a plurality of aggregations of packet flows. The method also determines a threshold packet drop rate based on the average packet drop rate, a current packet drop rate for a select aggregation of the plurality of aggregations, and whether at least one packet flow of the select aggregation is potentially subject to a denial-of-service attack based on a comparison of the current packet drop rate to the threshold packet drop rate.
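The comparison described in this abstract can be sketched as follows. This is only an illustration: the abstract does not say how the threshold is derived from the average, so the `margin` multiplier, the function name, and the sample rates are all assumptions.

```python
from statistics import mean


def flag_suspect_aggregations(drop_rates, margin=2.0):
    """Return ids of aggregations whose drop rate exceeds the threshold.

    drop_rates: dict mapping aggregation id -> current packet drop rate.
    margin: hypothetical multiplier applied to the average drop rate; the
            abstract does not specify how the threshold is computed.
    """
    if not drop_rates:
        return []
    average = mean(drop_rates.values())
    threshold = margin * average
    # An aggregation is potentially under attack when its current
    # drop rate is above the threshold.
    return sorted(agg for agg, rate in drop_rates.items() if rate > threshold)


# Made-up drop rates for four flow aggregations.
rates = {"agg-a": 0.01, "agg-b": 0.02, "agg-c": 0.30, "agg-d": 0.01}
suspects = flag_suspect_aggregations(rates)
print(suspects)
```

With these sample rates the average is 0.085, so the assumed threshold is 0.17 and only "agg-c" is flagged.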
Reliability As An Interdomain Service,
Tue Apr 16 17:25:58 EDT 2013
A system and techniques to increase the redundancy (i.e., physical diversity and bandwidth) available to an IP network, thereby increasing the failure processing capability of IP networks. The techniques include pooling the resources of multiple networks together for mutual backup purposes to improve network reliability and employing methods to efficiently utilize both the intradomain and the interdomain redundancies provided by networks at low cost.
Device And Method For Detecting And Diagnosing Correlated Network Anomalies,
Tue Feb 12 17:25:06 EST 2013
A device detects and diagnoses correlated anomalies of a network. The device includes an anomaly detection module receiving a first data stream including an event-series related to the network. The anomaly detection module executes at least one algorithm to detect a potential anomaly in the event-series. The device further includes a correlating module receiving a second data stream including other event-series related to the network. The correlating module determines whether the potential anomaly is false and determines whether the potential anomaly is a true anomaly.
System And Method To Locate A Prefix Hijacker Within A One-Hop Neighborhood,
Tue Jan 08 17:24:39 EST 2013
Method, system and computer-readable medium to locate a prefix hijacker of a destination prefix within a one-hop neighborhood on a network. The method includes generating one-hop neighborhoods from autonomous system (AS)-level paths of plural monitors to a destination prefix. The method also includes determining a suspect set of AS identifiers resulting from a union of the one-hop neighborhoods. The method further includes calculating a count and a distance associated with each AS identifier of the suspect set. The count indicates how often the AS identifier appeared in the one-hop neighborhoods. The distance indicates a total distance from the AS identifier to AS identifiers associated with the plural monitors. Yet further, the method includes generating a one-hop suspect set of AS identifiers from the suspect set that have highest counts and highest distances.
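A rough sketch of the counting and distance steps is given below. Several details are assumptions, not taken from the abstract: a path's one-hop neighborhood is taken to be the ASes on the path plus their direct neighbors, distance is measured in hops over a hypothetical AS adjacency map, and ties are broken by AS number.

```python
from collections import Counter, deque


def one_hop_neighborhood(path, graph):
    """ASes on the path plus their direct neighbors (assumed definition)."""
    hood = set(path)
    for asn in path:
        hood.update(graph.get(asn, ()))
    return hood


def hop_distance(graph, src, dst):
    """Breadth-first hop count from src to dst, or None if unreachable."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def rank_suspects(monitor_paths, graph):
    """Rank suspect ASes by (count, total distance to monitors), highest first.

    monitor_paths: dict mapping monitor AS -> AS-level path to the prefix.
    """
    hoods = [one_hop_neighborhood(p, graph) for p in monitor_paths.values()]
    # The suspect set is the union of the neighborhoods; the count is the
    # number of neighborhoods in which each AS appears.
    counts = Counter(asn for hood in hoods for asn in hood)
    ranked = []
    for asn, count in counts.items():
        # Unreachable monitors contribute 0 in this sketch.
        total = sum(hop_distance(graph, asn, m) or 0 for m in monitor_paths)
        ranked.append((asn, count, total))
    ranked.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return ranked


# Toy topology: AS 20 connects ASes 10, 30 and 40.
graph = {10: [20], 20: [10, 30, 40], 30: [20], 40: [20]}
paths = {10: [10, 20, 40], 30: [30, 20, 40]}
ranking = rank_suspects(paths, graph)
```

In this toy topology every AS appears in both neighborhoods, so the ranking is decided by distance and AS 40 comes first.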
Methods, Systems, And Computer Program Products For Protecting Against IP Prefix Hijacking,
Tue Oct 23 16:12:08 EDT 2012
A communication network is operated by identifying at least one potential hijack autonomous system (AS) that can be used to generate a corrupt routing path from a source AS to a destination AS. For each of the at least one potential hijack AS the following operations are performed: identifying at least one regional AS that is configured to adopt the corrupt routing path from the source AS to the destination AS and determining a reflector AS set such that, for each reflector AS in the set, a source AS to reflector AS routing path and a reflector AS to destination AS routing path do not comprise any of the at least one regional AS. A reflector AS is then identified that is common among the at least one reflector AS set responsive to performing the identifying and determining operations for each of the at least one potential hijack AS.
Method And Apparatus For Inferring Network Paths,
Tue Apr 10 16:09:52 EDT 2012
Disclosed is a method and apparatus for inferring AS paths between two endpoint nodes communicating over a network having a plurality of nodes without having access to the endpoint nodes. The method and apparatus determine routing tables of at least some of the plurality of nodes. A relationship between each node is then inferred from the routing tables. The method and apparatus then determine a path between the two endpoint nodes from the relationship and the routing table determination.
Method And Apparatus For Mitigating Routing Misbehavior In A Network,
Tue Mar 20 16:09:37 EDT 2012
Method and apparatus for mitigating routing misbehavior in a network is described. In one example, routing protocol traffic is received from a remote router destined for a local router. The routing protocol traffic is parsed to identify a subset of traffic. The subset of traffic is normalized to identify and correct misconfigured routing updates. The routing protocol traffic is provided to the local router. In one embodiment, the subset of traffic is normalized by at least one of detecting and correcting routing protocol semantics, detecting and correcting violations in routing policies, detecting and correcting routing anomalies, or mitigating routing instability.
Managing Network Traffic For Improved Availability Of Network Services,
Tue Jan 10 16:08:56 EST 2012
Managing network traffic to improve availability of network services by classifying network traffic flows using flow-level statistical information and machine learning estimation, based on a measurement of at least one of relevance and goodness of network features. Also, determining a network traffic profile representing applications associated with the classified network traffic flows, and managing network traffic using the network traffic profile. The flow-level statistical information includes packet-trace information and is available from at least one of Cisco NetFlow, NetStream or cflowd records. The classification of network flows includes tagging packet-trace flow record data based on defined packet content information. The classifying of network flows can result in the identification of a plurality of clusters based on the measurement of the relevance of the network features. Also, the classification of network traffic can use a correlation-based measure to determine the goodness of the network features.
Method And Apparatus For Detecting Computer-Related Attacks,
Tue Oct 18 16:06:18 EDT 2011
Disclosed is a method and apparatus for detecting prefix hijacking attacks. A source node is separated from a destination network at a first time via an original path. The destination network is associated with a prefix. At a second time, a packet is transmitted from the source node to the destination network to determine a current path between the source node and the destination network. A packet is also transmitted from the source node to a reference node to determine a reference node path. The reference node is located along the original path and is associated with a prefix different than the prefix associated with the destination network. The current path and the reference node path are then compared, and a prefix hijacking attack is detected when the reference node path is not a sub-path of the current path.
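The final comparison step can be illustrated with a small sub-path check. This sketch assumes that paths are lists of hop identifiers and that "sub-path" means a contiguous run of hops; the function names and hop labels are hypothetical.

```python
def is_sub_path(candidate, path):
    """True if candidate occurs as a contiguous run of hops inside path."""
    n = len(candidate)
    if n == 0:
        return True
    return any(path[i:i + n] == candidate for i in range(len(path) - n + 1))


def hijack_suspected(current_path, reference_path):
    """Flag a possible hijack when the reference node path is not a
    sub-path of the current path to the destination network."""
    return not is_sub_path(reference_path, current_path)


# The reference node "B" lies on the original path to the destination.
reference = ["src", "A", "B"]
normal = hijack_suspected(["src", "A", "B", "dst"], reference)
diverted = hijack_suspected(["src", "X", "dst"], reference)
```

Here `normal` is False because the path to the reference node is still contained in the current path, while `diverted` is True because traffic to the destination no longer passes through the reference node.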
Method And Apparatus For Optimizing A Firewall,
Tue Jun 21 16:05:32 EDT 2011
Disclosed is a method and system for optimizing a first set of rules enforced by a firewall on network traffic. Characteristics of the network traffic are examined and these characteristics are used to generate a second set of rules. The first set of rules may have a different order than the second set of rules.
System And Method For Real-Time Diagnosis Of Routing Problems,
Tue Mar 01 16:04:33 EST 2011
A system and method for detecting and diagnosing routing problems in a network in real-time by recording TCP flow information from at least one server to at least one prefix, and observing retransmission packets communicated from the at least one server to the at least one prefix. When a predetermined threshold for TCP flows to a prefix is reached, traceroutes may be triggered to a destination in the prefix, and the traceroutes analyzed to determine whether to issue an alarm for a routing failure. The system includes a real-time data collection engine for recording unidirectional TCP flow information, a real-time detection engine for observing the retransmission packets and issuing a warning upon a retransmission counter exceeding a predetermined threshold, and a real-time diagnosis engine for triggering at least one traceroute to a destination in the prefix that is randomly selected from TCP flows in retransmission states.
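The detection and trigger logic can be sketched roughly as below. The class name, the per-prefix bookkeeping, and the choice to count flows currently in a retransmission state are assumptions; the abstract only states that a counter is compared with a predetermined threshold and that a traceroute target is picked at random from flows in retransmission states.

```python
import random
from collections import defaultdict


class RetransmissionMonitor:
    """Tracks, per prefix, which destinations have flows retransmitting."""

    def __init__(self, threshold, rng=None):
        self.threshold = threshold
        self.rng = rng or random.Random()
        # prefix -> set of destinations currently in a retransmission state
        self.retransmitting = defaultdict(set)

    def observe(self, prefix, destination, is_retransmission):
        """Record one packet; return a traceroute target or None."""
        flows = self.retransmitting[prefix]
        if is_retransmission:
            flows.add(destination)
        else:
            flows.discard(destination)
        # Trigger only when the counter exceeds the threshold.
        if len(flows) > self.threshold:
            return self.rng.choice(sorted(flows))
        return None


monitor = RetransmissionMonitor(threshold=1, rng=random.Random(0))
first = monitor.observe("192.0.2.0/24", "192.0.2.10", True)
second = monitor.observe("192.0.2.0/24", "192.0.2.20", True)
third = monitor.observe("192.0.2.0/24", "192.0.2.10", False)
```

The first observation stays below the threshold, the second returns one of the two retransmitting destinations as a traceroute target, and the third returns None again once a flow recovers.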
Estimating Origin-Destination Flow Entropy,
Tue Aug 10 15:04:23 EDT 2010
The preferred embodiments of the present invention are directed to estimating entropy of origin-destination (OD) data flows in a network. To achieve this, first and second sketches are created corresponding to ingress (i.e. origin) and egress (i.e. destination) flows. The sketches allow estimating entropy associated with data streams as well as entropy associated with an intersection of two or more of the data streams, which provides a mechanism for estimating the entropy of OD flows in a network.
Method for fast network-aware clustering,
Tue May 15 18:12:02 EDT 2007
A method for clustering together network IP addresses is disclosed. A number of IP addresses are received and processed to determine which IP addresses share a longest prefix matching. The longest prefix matching process is performed according to a radix-encoded trie, which facilitates on-line clustering of the IP addresses. Client and/or server IP addresses may be clustered in accordance with the teachings herein.
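As an illustration of clustering by longest prefix match, the sketch below uses a plain one-bit-per-level binary trie for IPv4. It is not the radix-encoded trie of the patent, which is a more compact structure; the class and function names and the sample prefixes are hypothetical.

```python
import ipaddress
from collections import defaultdict


class PrefixTrie:
    """Plain binary trie over IPv4 prefixes (one bit per level)."""

    def __init__(self):
        self.root = {}

    def insert(self, prefix):
        net = ipaddress.ip_network(prefix)
        bits = format(int(net.network_address), "032b")[:net.prefixlen]
        node = self.root
        for bit in bits:
            node = node.setdefault(bit, {})
        node["prefix"] = str(net)

    def longest_match(self, address):
        """Return the longest stored prefix covering address, or None."""
        bits = format(int(ipaddress.ip_address(address)), "032b")
        node = self.root
        best = node.get("prefix")
        for bit in bits:
            node = node.get(bit)
            if node is None:
                break
            best = node.get("prefix", best)
        return best


def cluster(addresses, prefixes):
    """Group addresses by the longest prefix they match."""
    trie = PrefixTrie()
    for prefix in prefixes:
        trie.insert(prefix)
    clusters = defaultdict(list)
    for address in addresses:
        clusters[trie.longest_match(address)].append(address)
    return dict(clusters)


groups = cluster(
    ["10.1.2.3", "10.2.0.1", "10.1.9.9", "192.0.2.1"],
    ["10.0.0.0/8", "10.1.0.0/16"],
)
```

Addresses that match no stored prefix end up under the key None, so "192.0.2.1" is kept apart from the two 10.x clusters.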
Fast prefix matching of bounded strings,
Tue Mar 13 01:05:25 EDT 2007
The present invention increases the efficiency of performing longest prefix matching operations by selecting a radix-encoded trie structure optimized with respect to memory cost. The structure is optimized by determining memory costs for retrie structures indexed on different numbers of high-order characters, and then selecting the structure corresponding to the lowest memory cost. The optimization improves performance in IP look-up operations as well as longest-prefix matching operations performed on general alphabets.
Method For Network-Aware Clustering Of Clients In A Network,
Tue Aug 09 18:10:29 EDT 2005
A method for clustering together network clients to guide the placement of network servers is disclosed. A number of routing table prefix/netmask entries are aggregated and unified into a tabular format. The routing table entries may be converted into a singular format. A network server log is used to extract a number of client IP addresses, which are compared to the entries within the unified routing table. A common prefix shared by a number of the client IP addresses and an entry in the unified routing table is determined and used to cluster the clients together in a client cluster. Network servers, such as proxy servers, cache servers, content distribution servers and mirror servers, may be placed in the network according to the client clusters.

Tiresias: Online Anomaly Detection for Hierarchical Operational Network Data
Nicholas Duffield, Jia Wang, Chi Hong, Matthew Caesar
IEEE ICDCS 2012,
2012.
[PDF]
[BIB]
IEEE Copyright
This version of the work is reprinted here with permission of IEEE for your personal use. Not for redistribution. The definitive version was published in 2012. , 2012-06-18
{Operational network data, management data such as customer
care call logs and equipment system logs, is a very
important source of information for network operators to detect
problems in their networks. Unfortunately, there is a lack
of efficient tools to automatically track and detect anomalous
events on operational data, causing ISP operators to rely
on manual inspection of this data. While anomaly detection
has been widely studied in the context of network data, operational
data presents several new challenges, including the
volatility and sparseness of data, and the need to perform fast
detection (complicating application of schemes that require
offline processing or large/stable data sets to converge).
To address these challenges, we propose Tiresias, an automated
approach to locating anomalous events on hierarchical
operational data. Tiresias leverages the hierarchical structure
of operational data to identify high-impact aggregates
(e.g., locations in the network, failure modes) likely to be associated
with anomalous events. To accommodate different
kinds of operational network data, Tiresias consists of an online
detection algorithm with low time and space complexity,
while preserving high detection accuracy. We present results
from two case studies using operational data collected at a
large commercial IPTV network operated by a Tier-1 ISP:
customer care calls log and set-top box crashes log. By comparing
with a reference set verified by the ISP's operational
group, we validate that Tiresias can achieve > 94% accuracy
in locating anomalies. Tiresias also discovers several previously
unknown anomalies in the ISP's customer care cases,
demonstrating its effectiveness.}

Firewall Fingerprinting
Dan Pei, Zihui Ge, Jia Wang, Amir R. Khakpour (Michigan State University), Josh Hulst (Michigan State University), Alex X. Liu (Michigan State University)
IEEE INFOCOM 2012,
2012.
[PDF]
[BIB]
IEEE Copyright
This version of the work is reprinted here with permission of IEEE for your personal use. Not for redistribution. The definitive version was published in IEEE INFOCOM 2012. , 2012-03-22
(c) ACM, 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM SIGmetrics 2011 , 2012-03-22.
{Firewalls are critical security devices handling all traffic in and out of a network. Firewalls, like other software and hardware network devices, have vulnerabilities, which can be exploited by motivated attackers. However, because firewalls are usually placed in the network such that they are transparent to the end users, it is very difficult to identify them and use their corresponding vulnerabilities to attack them. In this paper, we study firewall fingerprinting, in which one can use firewall decisions on a sequence of TCP packets with unusual flags and machine learning techniques for inferring firewall implementation.}
Characterizing Geospatial Dynamics of Application Usage in a 3G Cellular Data Network
Jeffrey Pang, Jia Wang, Lusheng Ji, Zubair Shafiq, Alex X. Liu
IEEE INFOCOM 2012,
2012.
[PDF]
[BIB]
IEEE Copyright
This version of the work is reprinted here with permission of IEEE for your personal use. Not for redistribution. The definitive version was published in IEEE INFOCOM 2012. , 2012-03-25
(c) ACM, 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Internet Measurement Conference 2011. , 2012-03-25.
{Recent studies on cellular network measurement have provided the evidence that significant geospatial correlations, in terms of traffic volume and application access, exist in cellular network usage. Such geospatial correlation patterns provide local optimization opportunities to cellular network operators for handling the explosive growth in the traffic volume observed in recent years. In this paper, we aim to characterize the geospatial dynamics of application usage in a 3G cellular data network. Our analysis is based on two simultaneously collected traces from the radio sub-network (containing location records) and the core sub-network (containing traffic records) of an operational cellular network in the United States. To better understand the application usage in our data, we first cluster cell locations based on their application distributions and then study the geospatial dynamics of application usage across different geographical regions. Our study reveals that the cell clustering results are significantly different for traffic volume in terms of byte or packet count, session count, and unique user count distributions across different geographical regions. The results of our measurement study present operators with fine-grained opportunities to tune network parameter settings. However, our results also suggest that care should be exercised so that cells are not optimized solely with respect to traffic volume based on byte or packet count, or session count because this may negatively impact other low volume applications that most users in those cells use.}

A First Look at Cellular Machine-to-Machine Traffic – Large Scale Measurement and Characterization
Zubair Shafiq, Lusheng Ji, Alex Liu, Jeffrey Pang, Jia Wang
SIGMETRICS,
2012.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2012. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in 2012 , 2012-06-11.
{Cellular network based Machine-to-Machine (M2M) communication is fast becoming a market-changing force for a wide spectrum of businesses and applications such as telematics, smart metering, point-of-sale terminals, and home security and automation systems. In this paper, we aim to answer the following important question: Does traffic generated by M2M devices impose new requirements and challenges for cellular network design and management? To answer this question, we take a first look at the characteristics of M2M traffic and compare it with traditional smartphone traffic. We have conducted our measurement analysis using a week-long traffic trace collected from a tier-1 cellular network in the United States. We characterize M2M traffic from a wide range of perspectives, including temporal dynamics, device mobility, application usage, and network performance.
Our experimental results show that M2M traffic exhibits significantly different patterns than smartphone traffic in multiple aspects. For instance, M2M devices have a much larger ratio of uplink to downlink traffic volume, their traffic typically exhibits different diurnal patterns, they are more likely to generate synchronized traffic resulting in bursty aggregate traffic volumes, and are less mobile compared to smartphones. On the other hand, we also find that M2M devices are generally competing with smartphones for network resources in co-located geographical regions. These and other findings suggest that better protocol design, more careful spectrum allocation, modified pricing schemes, and careful structuring of quality of service profiles may be needed to accommodate the rise of M2M devices.}

Towards a Universal Sketch for Origin-Destination Network Measurements
Jia Wang, Haiquan Zhao, Nan Hua, Ashwin Lall, Ping Li, Jun Xu
8th IFIP International Conference on Network and Parallel Computing ,
2011.
[PDF]
[BIB]
Springer Copyright
The definitive version was published in 8th IFIP International Conference on Network and Parallel Computing. , 2011-10-21, http://www.springer.com/cda/content/document/cda_downloaddocument/LNCS+Copyright+Form?SGWID=0-0-45-981960-0
(c) ACM, 2010. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution.
The definitive version was published in ACM Internet Measurement Conference , 2011-10-21, http://www.springer.com/cda/content/document/cda_downloaddocument/LNCS+Copyright+Form?SGWID=0-0-45-981960-0.
{Despite its importance in today's Internet, network measurement
was not an integral part of the original Internet architecture,
i.e., there was (and still is) little native support
for many essential measurement tasks. Targeting the inadequacy
of counting/accounting capabilities of existing routers,
many data streaming and sketching techniques have been
proposed to estimate the important statistics of traffic going
through a network link. Most of these techniques are, however,
developed to track one specific statistic and/or answer
a specific type of query. Since there are a large number of
such statistics and queries of interest, it is very difficult, if
not impossible, for network vendors and operators to implement
and deploy data streaming/sketching solutions for all
of them, due to router resource (memory, CPU, bus bandwidth,
etc.) constraints.
In this paper, we propose a general-purpose solution that
can not only answer a wide range of queries, but also be able
to answer types of queries that were not known a priori. In
particular, we introduce the use of the Conditional Random
Sampling (CRS) sketch data structure for succinctly capturing
network traffic data between a set of nodes in the
network. This sketch is the first step towards a “universal”
sketch data structure in the sense that it is not tied to
measurement of a single quantity. We show that the CRS
sketch can compute unbiased estimates for any linear summary
statistic in the intersection of a pair of traffic streams,
e.g., traffic and flow matrix information, flow counts, and entropy.
We present detailed experiments, using data collected
at a tier-1 ISP, that show that our sketch is capable of estimating
this wide range of statistics with fairly high accuracy.}

SMALTA: Practical and Near-Optimal FIB Aggregation
Zartash Afzal, Markus Nebel, Ahsan Tariq, Sana Jawad, Ruichuan Chen, Aman Shaikh, Jia Wang, Paul Francis
ACM CoNEXT,
2011.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM CoNEXT , 2011-12-06.
(c) ACM, 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM SIGCOMM , 2011-12-06.
{IP Routers use sophisticated fast forwarding table (FIB) lookup algorithms that minimize lookup time, storage, and update time. This paper presents SMALTA, a practical, near-optimal FIB aggregation scheme that shrinks forwarding table size without modifying routing semantics or the external behavior of routers, and without requiring changes to FIB lookup
algorithms and associated hardware and software. On IP routers using the FIB lookup algorithm Tree Bitmap, SMALTA consistently shrinks FIB storage by at least 50%, representing four years of routing table growth at current rates. SMALTA also reduces average lookup time by 25% for a uniform traffic
matrix. Besides the benefits this brings to future routers, SMALTA provides a critical easy-to-deploy one-time benefit to the installed base should IPv4 address depletion result in increased routing table growth rate. The effective cost of this improvement is a sub-second delay in inserting updates into
the FIB once every few hours. We describe SMALTA, prove its correctness, measure its performance using data from a Tier-1 provider as well as RouteViews, and describe an implementation in Quagga that demonstrates its ease of implementation.}

Rapid Detection of Maintenance Induced Changes in Service Performance
Ajay Mahimkar, Zihui Ge, Jia Wang, Jennifer Yates, Yin Zhang, Joanne Emmons, Brian Huntley, Mark Stockert
ACM CoNEXT (International Conference on emerging Networking EXperiments and Technologies),
2011.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM CoNEXT International Conference on emerging Networking EXperiments and Technologies , 2011-10-31.
{Service quality in operational IP networks can be impacted due to planned or unplanned maintenance. During any maintenance activity, the responsibility of the operations team is to complete the work order and perform a check-up to ensure there are no unexpected service disruptions. Once the maintenance is complete, it is crucial to continuously monitor the network and look for any performance impacts. What operations lack today are effective tools to rapidly detect maintenance induced performance changes. The large scale and heterogeneity of network elements and performance metrics makes the problem extremely challenging.
In this paper, we present PRISM, a new tool for detecting maintenance induced performance changes in a timely fashion. PRISM uses association between maintenance and the network elements to identify performance metrics for time-series analysis. It uses a new Multiscale Robust Local Subspace algorithm (MRLS) to accurately identify changes in performance even when the baseline is contaminated. We systematically evaluate PRISM using data collected at four large operational networks: a tier-1 backbone, VoIP, IPTV and 3G cellular and show that it achieves good accuracy. We also demonstrate the effectiveness of PRISM in real operational environments through interesting case study findings.}

Q-score: Proactive Service Quality Assessment in a Large IPTV System
Jia Wang, Zihui Ge, Jennifer Yates, Ajay Mahimkar, Andrea Basso, Min Chen, Han Hee Song, Yin Zhang
ACM Internet Measurement Conference,
2011.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Internet Measurement Conference. , 2011-11-02.
{In large-scale IPTV systems, it is essential to
maintain high service quality while providing a wider variety
of service features than typical traditional TV. Thus
service quality assessment systems are of paramount importance
as they monitor the user-perceived service quality and
alert when issues occur. For IPTV systems, however, there
is no simple metric to represent user-perceived service quality
and Quality of Experience (QoE). Moreover, there is only
limited user feedback, often in the form of noisy and delayed
customer complaints. Therefore, we aim to approximate the
QoE through a selected set of performance indicators in a
proactive (i.e., detect issues before customers complain) and
scalable fashion.
In this paper, we present a service quality assessment framework,
Q-score, which accurately learns a small set of performance
indicators most relevant to user-perceived service
quality, and proactively infers service quality in a single score.
We evaluate Q-score using network data collected from a
commercial IPTV service provider and show that Q-score is
able to predict 60% of service problems that are complained
by customers with 0.1% of false positives. Through Q-score,
we have (i) gained insight into various types of service problems
causing user dissatisfaction including why users tend to
react promptly to sound issues while late to video issues; (ii)
identified and quantified the opportunity to proactively detect
the service quality degradation of individual customers
before severe performance impact occurs; and (iii) observed
possibility to adaptively allocate customer care workforce to
potentially troubling service areas.}

Characterizing and Modeling Internet Traffic Dynamics of Cellular Devices
Lusheng Ji, Jia Wang, M. Zubair Shafiq, Alex X. Liu
ACM SIGMETRICS 2011,
2011.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM SIGMetrics 2011, 2011-06-07
{Understanding Internet traffic dynamics in large cellular networks
is important for network design, troubleshooting, performance
evaluation, and optimization. In this paper, we
present the results from our study, which is based upon a
week-long aggregated flow level mobile device traffic data
collected from a major cellular operator's core network. In
this study, we measured the spatial and temporal dynamics
of Internet traffic to characterize the behavior of mobile devices
used to access cellular networks. We distinguish our
study from other related work by conducting the measurement
at a larger scale and exploring device traffic patterns
along two new dimensions: device types and applications
carried by network traffic. Based on the findings of our measurement
analysis, we propose a Zipf-like model to capture
the distribution and a Markov model to capture the volume
dynamics of aggregate Internet traffic. We further customize
our models for different device types using an unsupervised
clustering algorithm to improve prediction accuracy.}

Analyzing IPTV Set-Top Box Crashes
Jia Wang, Zihui Ge, Jennifer Yates, Han Song, Ajay Mahimkar, Yin Zhang
ACM Sigcomm Workshop on Home Networks,
2011.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution.
The definitive version was published in ACM Sigcomm Workshop on Home Networks, 2011-08-15.
{Recent advances in residential broadband access
technology have led to a wave of commercial IPTV deployment.
As IPTV services are rolled out at scale, it is essential
for IPTV systems to maintain ultra-high reliability
and performance. A major issue that disrupts IPTV service
is the crash of the set-top box (STB) software, which directly
resides inside the consumer's home network and provides
the essential interface to both the user and the network to
deliver rich contents that go well beyond traditional TV. To
understand the potential causes of STB crashes, we perform
in-depth statistical analysis on the relationship among STB
crashes, video stream contents, and user activities using logs
collected from a large commercial IPTV system. Our initial
results suggest that (i) impaired video streams may cause
STB to crash, and (ii) continuous usage of STB may gradually
degrade the STB health over time.}

Address-based Route Reflection
Aman Shaikh, Jia Wang, Ruichuan Chen, Paul Francis
ACM CoNEXT,
2011.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM CoNEXT , 2011-12-06.
(c) ACM, 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of ACM SIGCOMM , 2011-12-06.
{BGP Route Reflectors (RR), which are commonly used to help scale Internal BGP (iBGP), can produce oscillations, forwarding loops, and path inefficiencies. ISPs avoid these pitfalls through careful topology design, RR placement, and link-metric assignment. This paper presents Address-Based
Route Reflection (ABRR): the first iBGP solution that completely solves all oscillation and looping problems, has no path inefficiencies, and puts no constraints on RR placement. ABRR does this by emulating the semantics of full-mesh iBGP, and thereby adopting the correctness and path efficiency properties of full-mesh iBGP. Both traditional Topology-Based Route Reflection (TBRR) and ABRR take a divide-and-conquer approach. While TBRR scales by making each RR responsible for all prefixes from some fraction of routers, ABRR scales by making each RR responsible for some fraction of prefixes for all routers. We have implemented a fully functional ABRR prototype. Using BGP data from a Tier-1 ISP, our analytical and implementation results show that ABRR's scaling and convergence properties compare positively with traditional TBRR.}

What Happened in my Network? Mining Network Events from Router Syslogs
Jia Wang, Zihui Ge, Dan Pei, Tongqing Qiu, Jun Xu
ACM/USENIX Internet Measurement Conference,
2010.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2010. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution.
The definitive version was published in ACM Internet Measurement Conference, 2010-11-01.
{Router syslogs are messages that a router logs to describe
a wide range of events observed by it. They are considered
one of the most valuable data sources for monitoring
network health and for troubleshooting network faults and
performance anomalies. However, router syslog messages
are essentially free-form text with only a minimal structure,
and their formats vary among different vendors and router
OSes. Furthermore, since router syslogs are aimed at tracking
and debugging router software/hardware problems, they
are often too low-level from network service management
perspectives. Due to their sheer volume (e.g., millions per
day in a large ISP network), router syslog messages are typically
examined (manually by a network administrator) only
when required by an on-going troubleshooting investigation
or when given a narrow time range and a specific router
under suspicion. In this project, we design a SyslogDigest
system that can automatically transform and compress such
low-level minimally-structured syslog messages into meaningful
and prioritized high-level network events, using powerful
data mining techniques tailored to our problem domain.
These events are three orders of magnitude fewer in number
and have much better usability than raw syslog messages.
We demonstrate that they provide critical input to network
troubleshooting, and network health monitoring and visualization.}

TowerDefense: Deployment Strategies for Battling against IP Prefix Hijacking
Jia Wang, Lusheng Ji, Dan Pei, Tongqing Qiu, Jun Xu
IEEE ICNP,
2010.
[BIB]
{IP prefix hijacking is one of the top security threats targeting today's Internet routing protocol. Several schemes have been proposed to either detect or mitigate prefix hijacking events. However, none of these approaches has been adopted and deployed at large scale on the Internet due to reasons such as scalability, economical practicality, or unrealistic assumptions about the collaborations among ISPs. Thus there are no actionable and deployable solutions for dealing with prefix hijacking.
In this paper, we study key issues related to deploying and operating an IP prefix hijacking detection and mitigation system. Our contributions include (i) deployment strategies for a hijacking detection and mitigation system (named TOWERDEFENSE): a practical service model for prefix hijacking protection and effective algorithms for selecting agent locations for detecting and mitigating prefix hijacking attacks; and (ii) large-scale experiments on PlanetLab and extensive analysis of the performance of TOWERDEFENSE.}

Scalable Flow-Based Networking with DIFANE
Jia Wang, Minlan Yu, Jennifer Rexford, Michael Freedman
ACM SIGCOMM 2010,
2010.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2010. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution.
The definitive version was published in ACM SIGCOMM 2010, 2010-08-30.
{Ideally, enterprise administrators could specify fine-grain policies
that drive how the underlying switches forward, drop,
and measure traffic. However, existing techniques for flow-based
networking rely too heavily on centralized controller
software that installs rules reactively, based on the first packet
of each flow. In this paper, we propose DIFANE, a scalable
and efficient solution that keeps all traffic in the data plane by
selectively directing packets through intermediate switches
that store the necessary rules. DIFANE relegates the controller
to the simpler task of partitioning these rules over the
switches. DIFANE can be readily implemented with commodity
switch hardware, since all data-plane functions can
be expressed in terms of wildcard rules that perform simple
actions on matching packets. Experiments with our prototype
on Click-based OpenFlow switches show that DIFANE
scales to larger networks with richer policies.}

Listen to Me if You can: Tracking user experience of mobile network on social media
Jia Wang, Zihui Ge, Jennifer Yates, Junlan Feng, Jun Xu, Tongqing Qiu
ACM/USENIX Internet Measurement Conference,
2010.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2010. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution.
The definitive version was published in ACM Internet Measurement Conference, 2010-11-01.
{Social media sites like Twitter continue to grow at a fast pace.
People of all generations use social media to exchange messages
and share experiences of their life in a timely fashion.
Most of these sites make their data available. An intriguing
question is whether we can exploit this real-time and giant data-flow
to improve business in a measurable way. In this paper, we
are particularly interested in tweets (Twitter messages) that
are relevant to mobile network performance. We compare
tweets with a traditional source of user experience, i.e., customer
care tickets, and correlate both of them with network
incident reports. From our study, we have the following observations.
First, Twitter users and users who call customer
service tend to report different types of performance issues.
Second, users on Twitter are more accurate and faster to report
network problems that impact user experiences. Third,
tweets can show some short-term performance impairments,
which are not recorded in incident reports. These observations
prove that Twitter is a complementary source for monitoring
network performance and its impact on user experiences.}

Towards Automated Performance Diagnosis in a Large IPTV Network
Zihui Ge, Aman Shaikh, Jia Wang, Jennifer Yates, Qi Zhao, Ajay Mahimkar, Yin Zhang
2009.
[PDF]
[BIB]

SYNERGY: Detecting and Diagnosing Correlated Network Anomalies
Jia Wang, Jennifer Yates, Qi Zhao, Ajay Mahimkar, Ashwin Lall, Jun Xu
2009.
[PDF]
[BIB]

Troubleshooting Chronic Conditions in Large IP Networks
Jennifer Yates, Aman Shaikh, Jia Wang, Zihui Ge, Cheng Ee, Ajay Mahimkar, Yin Zhang
2008.
[PDF]
[BIB]

Towards Quantification of IP Network Reliability
Hao Wang, Alexandre Gerber, Albert Greenberg, Jia Wang, Yang Richard Yang
ACM SIGCOMM (Poster),
2007.
[PDF]
[BIB]
Copyright
(c) ACM, 2007. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM SIGCOMM '07, 2007-09-01.

Reliability as an Interdomain Service
Hao Wang, Yang Richard Yang, Paul H. Liu, Jia Wang, Alexandre Gerber
in Proc. of ACM SIGCOMM,
2007.
[PDF]
[BIB]
Copyright
(c) ACM, 2007. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM SIGCOMM '07, 2007-08-27.
{Reliability is a critical requirement of the Internet. The availability and resilience of the Internet under failures can have significant global effects. However, in the current Internet routing architecture, achieving the high level of reliability demanded by many mission-critical activities can be costly. In this paper, we first propose a novel solution framework called reliability as an interdomain service (REIN) that can be incrementally deployed in the Internet and substantially improve the redundancy of ISP networks at low cost. We then present robust algorithms to efficiently utilize network redundancy to maximize reliability. We use real ISP topologies and traffic traces to demonstrate the effectiveness of our framework and algorithms.}

Network-wide Information Correlation and Exploration (NICE): Framework, Applications, and Experience
Jennifer Yates, Aman Shaikh, Jia Wang, Cheng Ee, Ajay Mahimkar, Yin Zhang
2007.
[PDF]
[BIB]