
180 Park Ave - Building 103
Florham Park, NJ
Unsupervised Clustering of Multidimensional Distributions using Earth Mover Distance
Simon Urbanek, Tamraparni Dasu, Shankar Krishnan, David Applegate
ACM KDD,
2011.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 20XX. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM KDD , 2011-08-01.
{Multidimensional distributions are often used in data min- ing to describe and summarize different features of large datasets. It is natural to look for distinct classes in such datasets by clustering the data. A common approach entails the use of methods like k-means clustering. However, the k-means method inherently relies on the Euclidean metric in the embedded space and does not account for additional topology underlying the distribution.
In this paper, we propose using Earth Mover Distance (EMD) to compare multidimensional distributions. For a n-bin histogram, the EMD is based on a solution to the transportation problem with time complexity O(n3 log n). To mitigate the high computational cost of EMD, we pro- pose an approximation that reduces the cost to linear time.
Other notions of distances such as the information theo- retic Kullback-Leibler divergence and statistical χ2 distance, account only for the correspondence between bins with the same index, and do not use information across bins, and are sensitive to bin size. A cross-bin distance measure like EMD is not affected by binning differences and meaningfully matches the perceptual notion of “nearness”.
Our technique is simple, efficient and practical for clus- tering distributions. We demonstrate the use of EMD on a practical application of clustering over 400,000 anonymous mobility usage patterns which are defined as distributions over a manifold. EMD allows us to represent inherent re- lationships in this space. We show that EMD allows us to successfully cluster even sparse signatures and we compare the results with other clustering methods. Given the large size of our dataset a fast approximation is crucial for this application.}

Robustness of Stream Mining Algorithms
Tamraparni Dasu, Gina-Maria Pomann, Shankar Krishnan
IDA 2011,
2011.
[PDF]
[BIB]
Springer Copyright
The definitive version was published in IDA 2011. , 2011-12-31
{Stream mining is a challenging problem that has attracted
considerable attention in the last decade. As a result there are
numerous algorithms for mining data streams, from summarizing
and analyzing, to change and anomaly
detection. However, most research focuses on proposing, adapting or
improving algorithms and studying their computational performance.
For a practitioner of stream mining, there is very little guidance on
choosing a technology suited for a particular task or application.
In this paper, we address the practical aspect of choosing a
suitable algorithm
by drawing on the statistical properties of {em power} and
{em robustness}. For the purpose of illustration, we focus on
change detection algorithms (CDAs). We define
an objective performance measure, {it streaming power},
and use it to explore the robustness of three different algorithms.
The measure is
comparable for disparate algorithms,
and provides a common framework for comparing and evaluating
change detection algorithms on any data set
in a meaningful fashion. We demonstrate on real world applications,
and on synthetic
data.
In addition, we present a
repository of data streams for the community to test change detection
algorithms for streaming data. }

Real-time image deconvolution on the GPU
James Klosowski, Shankar Krishnan
SPIE Conference: Parallel Processing for Imaging Applications,
2011.
[LINK]
[BIB]
SPIE Copyright
Copyright (c) 2011 Society of Photo-Optical Instrumentation Engineers. One print or electronic copy may be made for personal use only. Systematic reproduction and distribution, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper are prohibited. The definitive version was published in SPIE Conference: Parallel Processing for Imaging Applications , 2011-01-25, http://spie.org/Documents/Publications/ProcCopyrightForm.pdf.
{Two-dimensional image deconvolution is an important and well-studied problem with applications to image deblurring and restoration. Most of the best deconvolution algorithms use natural image statistics that act as priors to regularize the problem. Recently, Krishnan and Fergus provide a fast deconvolution algorithm that yields results comparable to the current state of the art. They use a hyper-Laplacian image prior to regularize the problem. The resulting optimization problem is solved using alternating minimization in conjunction with a half-quadratic penalty function. In this paper, we provide an efficient CUDA implementation of their algorithm on the GPU. Our implementation leverages many well-known CUDA optimization techniques, as well as several others that have a significant impact on this particular algorithm. We discuss each of these, as well as make a few observations regarding the CUFFT library. Our experiments were run on an Nvidia GeForce GTX 260. For a single channel image of size 710 x 470, we obtain over 40 fps, while on a larger image of size 1900 x 1266, we get almost 6 fps (without counting disk I/O). In addition to linear performance, we believe ours is the first implementation to perform deconvolutions at video rates. Our running times also demonstrate that our GPU implementation is over 27 times faster than the original CPU implementation.}

Combining Predictors for Recommending Music: the False Positives' approach to KDD Cup track 2
Suhrid Balakrishnan, Rensheng Wang, Carlos Scheidegger, Angus Maclellan, Yifan Hu, Aaron Archer, David Applegate, Shankar Krishnan, Guang Ma, Siu Au
KDD CUP 2011 workshop.,
2011.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in KDD CUP 2011 workshop , 2011-08-21.
{We describe our solution for the KDD Cup 2011 track 2 challenge. Our solution relies heavily on ensembling together diverse individual models for the prediction task, and achieved a final leaderboard misclassification rate of 3.8863\%. This paper provides details on both the modeling and ensemble creation steps.}
Visualization research with large displays
Bin Wei, Claudio Silva, Eleftherios Koutsofios, Shankar Krishnan, Stephen North
IEEE Comput. Graph. Appl.,
IEEE Computer Society Press,
v20,
#4,
pp 50--54,
2000.
[PDF]
[BIB]
We describe our research at the AT&T Infolab on using large displays to interactively analyze and visualize AT&T's communication networks and services.
GROOM: Global Registration Of Multiple 3D Point Sets Via Optimization-On-A-Manifold,
Tue Nov 09 15:50:44 EST 2010
A method for registering multiple 3D point sets by determining optimal relative positions and orientations of the 3D point sets. Initial values are determined for the rotation matrices corresponding to the relative orientations of reference frames of the 3D point sets. A registration error cost function is optimized on a product manifold of all of the rotation matrices to determine optimal values of the rotation matrices. The optimal values of the rotation matrices are used to determine optimal values for translation vectors corresponding to the relative positions of the reference frames of the 3D point sets. The 3D point sets are registered on a common reference frame using the optimal rotation matrices and the optimal translation vectors.
Map Simplification System,
Tue Nov 02 18:10:07 EST 2004
A map simplification system performs dynamic view dependent simplification of relatively large geographic maps.
GFBanner
GFBanner.pdf
(25291k)
ida2011_streamingpower_data
ida2011_streamingpower_data.tgz
(97k)