
180 Park Ave - Building 103
Florham Park, NJ
Method For Summarizing Data In Unaggregated Data Streams,
Tue Jun 05 12:52:22 EDT 2012
A method for producing a summary A of data points in an unaggregated data stream wherein the data points are in the form of weighted keys (a, w) where a is a key and w is a weight, and the summary is a sample of k keys a with adjusted weights w_a. A first reservoir L includes keys having adjusted weights which are additions of weights of individual data points of included keys and a second reservoir T includes keys having adjusted weights which are each equal to a threshold value τ whose value is adjusted based upon tests of new data points arriving in the data stream. The summary combines the keys and adjusted weights of the first reservoir L with the keys and adjusted weights of the second reservoir T to form the sample representing the data stream upon which further analysis may be performed. The method proceeds by first merging new data points in the stream into the reservoir L until the reservoir contains k different keys and thereafter applying a series of tests to new arriving data points to determine what keys and weights are to be added to or removed from the reservoirs L and T to provide a summary with a variance that approaches the minimum possible for aggregated data sets. The method is composable, can be applied to high speed data streams such as those found on the Internet, and can be implemented efficiently.
System And Method For Spatially Consistent Sampling Of Flow Records At Constrained, Content-Dependent Rates,
Tue Nov 22 16:06:37 EST 2011
Disclosed herein are systems, computer-implemented methods, and computer-readable media for sampling network traffic. The method includes receiving a desired quantity of flow records to sample, receiving a plurality of network flow records each summarizing a network flow of packets, calculating a hash for each flow record based on one or more invariant parts of a respective flow, generating a quasi-random number from the calculated hash for each respective flow record, generating a priority from the calculated hash for each respective flow record, and sampling exactly the desired quantity of flow records, selecting flow records having a highest priority first. In one aspect, the method further partitions the plurality of flow records into groups based on flow origin and destination, generates an individual priority for each partitioned group, and separately samples exactly the desired quantity of flow records from each partitioned group, selecting flows having a highest individual priority first.
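The steps in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the record layout (`key`, `bytes`, `origin`, `dest`), the use of SHA-256, and the priority rule (size divided by the quasi-random number, as in priority sampling) are all assumptions made for illustration. Because the quasi-random number depends only on the invariant fields, the same flow receives the same number at every observation point.

```python
import hashlib
from collections import defaultdict


def quasi_random(invariant_fields):
    """Map a flow's invariant fields to a repeatable number in (0, 1]."""
    data = repr(tuple(invariant_fields)).encode("utf-8")
    digest = hashlib.sha256(data).digest()
    # Use the first 64 bits of the hash; the +1 keeps the result above zero.
    return (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 1)


def priority(record):
    """Assumed priority rule: flow size divided by its quasi-random number."""
    return record["bytes"] / quasi_random(record["key"])


def sample_exactly(records, n):
    """Keep the n records with the highest priority."""
    return sorted(records, key=priority, reverse=True)[:n]


def sample_per_group(records, n):
    """Partition by (origin, destination), then sample n from each group."""
    groups = defaultdict(list)
    for record in records:
        groups[(record["origin"], record["dest"])].append(record)
    return {group: sample_exactly(members, n) for group, members in groups.items()}


# Hypothetical flow records used only to exercise the functions.
records = [
    {
        "key": ("10.0.0.%d" % i, "10.0.1.1", 80, 6),
        "bytes": 100 + i,
        "origin": "A" if i % 2 else "B",
        "dest": "X",
    }
    for i in range(10)
]
picked = sample_exactly(records, 3)
per_group = sample_per_group(records, 2)
```

Since the selection depends only on hashed fields and sizes, presenting the same records in a different order yields the same sample.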
Optimal Combination Of Sampled Network Measurements,
Tue Sep 27 16:06:11 EDT 2011
Two regularized estimators that avoid the pathologies associated with variance estimation are disclosed. The regularized variance estimator adds a contribution to estimated variance representing the likely error, and hence ameliorates the pathologies of estimating small variances while at the same time allowing more reliable estimates to be balanced in the convex combination estimator. The bounded variance estimator employs an upper bound to the variance which avoids estimation pathologies when sampling probabilities are very small.
Variance-Optimal Sampling-Based Estimation Of Subset Sums,
Tue Aug 23 16:06:03 EDT 2011
The present invention relates to a method of obtaining a generic sample of an input stream. The method is designated as VAROPT_k. The method comprises receiving an input stream of items arriving one at a time, and maintaining a sample S of items i. The sample S has a capacity for at most k items i. The sample S is filled with k items i. An nth item i is received. It is determined whether the nth item i should be included in sample S. If the nth item i is included in sample S, then a previously included item i is dropped from sample S. The determination is made based on weights of items without distinguishing between previously included items i and the nth item i. The determination is implemented thereby updating weights of items i in sample S. The method is repeated until no more items are received.
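A minimal Python sketch of one way such a sampler could work is given below. The abstract does not state the drop rule, so the threshold computation here is an assumption: with k+1 candidates, a threshold tau is chosen so that the values min(1, w/tau) sum to k, exactly one candidate whose weight is below tau is dropped with probability 1 - w/tau, and the surviving small candidates have their weights raised to tau. The function name `varopt_k` and the sort-based O(k log k) update are illustrative choices, not the efficient implementation.

```python
import random


def varopt_k(stream, k, rng=None):
    """Return up to k (item, adjusted_weight) pairs from (item, weight) pairs."""
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = rng or random.Random()
    sample = []
    for item, weight in stream:
        if weight <= 0:
            raise ValueError("weights must be positive")
        # The new item joins the candidates on equal terms with sampled items.
        sample.append((item, float(weight)))
        if len(sample) <= k:
            continue  # still filling the sample

        # Find tau: the t smallest weights are "small", with tau = sum / (t - 1).
        sample.sort(key=lambda entry: entry[1])
        running = sample[0][1]
        for t in range(2, len(sample) + 1):
            running += sample[t - 1][1]
            tau = running / (t - 1)
            if t == len(sample) or sample[t][1] >= tau:
                break

        # Drop exactly one small candidate; the drop probabilities sum to 1.
        u = rng.random()
        drop, acc = t - 1, 0.0  # default guards against float round-off
        for i in range(t):
            acc += 1.0 - sample[i][1] / tau
            if u < acc:
                drop = i
                break

        # Surviving small candidates take weight tau; large ones are unchanged.
        small = [(it, tau) for i, (it, _) in enumerate(sample[:t]) if i != drop]
        sample = small + sample[t:]
    return sample


# Hypothetical stream: 100 items with weights 1..100 (total weight 5050).
stream = [("item%d" % i, i) for i in range(1, 101)]
result = varopt_k(stream, 10, random.Random(7))
```

Under this rule the adjusted weights in the sample always add up to the total weight seen so far, which gives a simple sanity check on any run.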
Methods And Apparatus To Bound Network Traffic Estimation Error For Multistage Measurement Sampling And Aggregation,
Tue Aug 02 16:05:49 EDT 2011
Methods and apparatus to bound network traffic estimation error for multistage measurement sampling and aggregation are disclosed. An example method disclosed herein comprises determining a hierarchical sampling topology representative of multiple data sampling and aggregation stages, the hierarchical sampling topology comprising a plurality of nodes connected by a plurality of edges, each node corresponding to at least one of a data source and a data aggregation operation, and each edge corresponding to a data sampling operation characterized by a generalized sampling threshold, selecting a first generalized sampling threshold from a set of generalized sampling thresholds associated with a respective set of edges originating at a respective set of descendent nodes of a target node undergoing network traffic estimation, and transforming a measured sample of network traffic into a confidence interval for a network traffic estimate associated with the target node using the first generalized sampling threshold and an error parameter.
Method And Apparatus For Managing Hierarchical Collections Of Data,
Tue Apr 19 16:04:56 EDT 2011
A method and system provide for management of a collection of data records. The data records have associated therewith an identifier or code that indicates the most coarse level of granularity with which the data record is associated in a hierarchy of sampling subsets created across a range of granularity levels.
Sampling And Analyzing Packets In A Network,
Tue Dec 14 15:05:20 EST 2010
The preferred embodiments of the present invention can include sampling packets transmitted over a network based on the content of the packets. If a packet is sampled, the sampling unit can add one or more fields to the sampled packet that can include a field for a number of bytes contained in the packet, a packet count, a flow count, a sampling type, and the like. The sampled packets can be analyzed to discern desired information from the packets. The additional fields that are added to the sampled packets can be used during the analysis.
System And Method For Deriving Traffic Demands For A Packet-Switched Network,
Tue Sep 14 15:04:42 EDT 2010
The present invention is directed to a method and system for deriving traffic demands for a packet-switched network. A novel model of defining traffic demands as a volume of load originating from an ingress link and destined to a set of egress links enables support for traffic engineering and performance debugging of large operational packet-switched networks.
Scalable Multiprotocol Label Switching Based Virtual Private Networks And Methods To Implement The Same,
Tue Sep 14 15:04:40 EDT 2010
Example scalable multi-protocol label switching (MPLS) based virtual private networks (VPNs) and methods to implement the same are disclosed. A disclosed example spoke provider edge (PE) router for an MPLS-based VPN includes a truncated virtual routing and forwarding (VRF) table containing a first value referencing a hub PE router and a second value referencing a first customer edge (CE) router coupled to the VPN via the PE router, and a forwarding module to forward a packet received from the first CE router to the hub PE router when the packet contains an address referencing a second CE router coupled to the VPN via a second spoke PE router.
Algorithms And Estimators For Summarization Of Unaggregated Data Streams,
Tue Jul 27 15:04:14 EDT 2010
The invention relates to streaming algorithms useful for obtaining summaries over unaggregated packet streams and for providing unbiased estimators for characteristics such as the amount of traffic that belongs to a specified subpopulation of flows. Packets are sampled from a packet stream and aggregated into flows and counted by implementation of: (a) Adaptive Sampled NetFlow (ANF), and adjusted weight (A^ANF) of a flow (f) is calculated as follows: A^ANF(f) = i(f)/p'; i(f) being the number of packets counted for a flow f, and p' being the sampling rate at end of a measurement period; or (b) Adaptive Sample-and-Hold (ASH), and adjusted weight (A^ASH) of a flow (f) is calculated as follows: A^ASH(f) = i(f) + (1-p')/p'; i(f) being the number of packets counted for a flow f, and p' being the sampling rate at end of a measurement period.
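The two adjusted-weight formulas can be written out directly, as in the short Python sketch below. The function names, the dictionary of per-flow counts, and the `estimate_subpopulation` helper are hypothetical; only the two formulas come from the abstract. The helper assumes, as the abstract suggests, that the traffic of a subpopulation is estimated by summing the adjusted weights of its counted flows.

```python
def _check_rate(p_end):
    """Reject sampling rates outside (0, 1]."""
    if not 0 < p_end <= 1:
        raise ValueError("sampling rate must be in (0, 1]")


def anf_adjusted_weight(count, p_end):
    """ANF adjusted weight: i(f) / p'."""
    _check_rate(p_end)
    return count / p_end


def ash_adjusted_weight(count, p_end):
    """ASH adjusted weight: i(f) + (1 - p') / p'."""
    _check_rate(p_end)
    return count + (1 - p_end) / p_end


def estimate_subpopulation(counts, p_end, belongs, adjust):
    """Sum adjusted weights over the counted flows selected by `belongs`."""
    return sum(adjust(c, p_end) for flow, c in counts.items() if belongs(flow))


# Hypothetical counts i(f) per flow, with final sampling rate p' = 0.5.
counts = {"web-1": 5, "web-2": 3, "dns-1": 2}
web_anf = estimate_subpopulation(
    counts, 0.5, lambda f: f.startswith("web"), anf_adjusted_weight
)
web_ash = estimate_subpopulation(
    counts, 0.5, lambda f: f.startswith("web"), ash_adjusted_weight
)
```

With p' = 1 (no sampling) both formulas reduce to the raw packet count i(f).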
Algorithms And Estimators For Summarization Of Unaggregated Data Streams,
Tue Jun 29 15:04:09 EDT 2010
The invention relates to streaming algorithms useful for obtaining summaries over unaggregated packet streams and for providing unbiased estimators for characteristics such as the amount of traffic that belongs to a specified subpopulation of flows. Packets are sampled from a packet stream and aggregated into flows and counted by implementation of Adaptive Sample-and-Hold (ASH) or Adaptive NetFlow (ANF), adjusting the sampling rate based on a quantity of flows to obtain a sketch having a predetermined size, the sampling rate being adjusted in steps; and transferring the count of aggregated packets from SRAM to DRAM and initializing the count in SRAM following adjustment of the sampling rate.