
Content-based copy detection is a set of techniques for matching duplicate pairs of content (exact copies) and near-duplicate pairs (copies with some noise or a few changes). Copy detection may seem easy, given how simple it is to copy video and audio files digitally, but this project aims to match content pairs that have undergone severe distortions or, in the case of pictures of real-world objects, that may not have been exactly the same to begin with.
Although the differences between duplicate and near-duplicate content can be slight, there is an easy way to make the classification. In video and multimedia, a duplicate pair exists if the pixels of the image that you see are the same in the two sources. Real-world examples of duplicate content can be found in newspapers, books, and even television broadcasts: if you purchased two copies of any of these from different locations, the content (i.e. images, audio, and text) would be exactly the same. A near-duplicate pair exists if the subject matter of the content is the same but it was captured differently or has been significantly altered by some processing step. One common real-world example of near-duplicate content is the different viewpoints that one sees on television for a public speech at the same event.
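The distinction above can be illustrated with a minimal sketch. It assumes a hypothetical representation in which a grayscale image is a list of rows of 0-255 pixel values, and it uses a mean absolute pixel difference with an arbitrary threshold as a toy stand-in for "some noise or a few changes". The function names and the threshold are illustrative assumptions, not part of the project, and a pixel-level measure like this would not capture viewpoint changes or heavy editing.

```python
from typing import List

# Hypothetical representation: a grayscale image as rows of 0-255 pixel values.
Image = List[List[int]]


def same_shape(a: Image, b: Image) -> bool:
    """True if both images have the same number of rows and columns."""
    return len(a) == len(b) and all(len(ra) == len(rb) for ra, rb in zip(a, b))


def is_duplicate(a: Image, b: Image) -> bool:
    """Duplicate pair: every pixel is identical in the two sources."""
    return a == b


def mean_abs_diff(a: Image, b: Image) -> float:
    """Average absolute per-pixel difference between two same-sized images."""
    if not same_shape(a, b):
        raise ValueError("images must have the same dimensions")
    count = sum(len(row) for row in a)
    if count == 0:
        raise ValueError("images must not be empty")
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / count


def classify_pair(a: Image, b: Image, near_threshold: float = 20.0) -> str:
    """Label a pair as 'duplicate', 'near-duplicate', or 'different'.

    near_threshold is an arbitrary illustrative value, not a tuned parameter.
    """
    if is_duplicate(a, b):
        return "duplicate"
    if same_shape(a, b) and mean_abs_diff(a, b) <= near_threshold:
        return "near-duplicate"
    return "different"


original = [[10, 20], [30, 40]]
noisy = [[12, 18], [30, 41]]      # small pixel noise
unrelated = [[200, 0], [0, 200]]  # very different content

print(classify_pair(original, original))   # duplicate
print(classify_pair(original, noisy))      # near-duplicate
print(classify_pair(original, unrelated))  # different
```

The exact-equality test mirrors the definition of a duplicate pair directly; only the near-duplicate branch depends on the assumed distance measure and threshold.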

This example demonstrates two possible near-duplicate pairs. The top pair was created by natural scene differences due to the camera's point of view. The bottom pair was created by intentional processing and editing manipulations. For a content-based copy detection system to work in real-world conditions, both kinds of difference must be accounted for.
Discovery of content (object)
Efficient metadata generation - link to metadata
Summarization of content (remove duplicates) - link to BBC summarization
SIFT interest points serve as the main anchor for the content-based copy detection (CCD) task.
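As a rough illustration of how interest-point descriptors can anchor a copy detection decision, the sketch below matches descriptors from a query image against those of a reference image. It rests on several assumptions: the descriptors are taken to be precomputed by some SIFT implementation (real SIFT descriptors have 128 dimensions; the toy vectors here have 4), the matching rule is the nearest-neighbor ratio test commonly paired with SIFT, and the names `ratio_test_matches` and `copy_score` are hypothetical. The source does not state which matching strategy the project uses.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: each descriptor is a fixed-length numeric vector that was
# produced elsewhere by a SIFT implementation.
Descriptor = Sequence[float]


def euclidean(u: Descriptor, v: Descriptor) -> float:
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def ratio_test_matches(
    query: List[Descriptor],
    reference: List[Descriptor],
    ratio: float = 0.75,
) -> List[Tuple[int, int]]:
    """Return (query_index, reference_index) pairs that pass the ratio test.

    A query descriptor is matched only if its nearest reference descriptor
    is clearly closer than the second nearest (distance < ratio * second).
    """
    matches: List[Tuple[int, int]] = []
    if len(reference) < 2:
        return matches  # the ratio test needs two candidate neighbors
    for qi, q in enumerate(query):
        ranked = sorted((euclidean(q, r), ri) for ri, r in enumerate(reference))
        (best_dist, best_idx), (second_dist, _) = ranked[0], ranked[1]
        if best_dist < ratio * second_dist:
            matches.append((qi, best_idx))
    return matches


def copy_score(
    query: List[Descriptor],
    reference: List[Descriptor],
    ratio: float = 0.75,
) -> float:
    """Fraction of query descriptors with a confident match in the reference."""
    if not query:
        return 0.0
    return len(ratio_test_matches(query, reference, ratio)) / len(query)


# Toy 4-dimensional descriptors, for illustration only.
reference_desc = [[0, 0, 0, 0], [10, 0, 0, 0], [0, 10, 0, 0]]
query_desc = [[0.5, 0, 0, 0], [10, 0.5, 0, 0], [5, 5, 0, 0]]

print(ratio_test_matches(query_desc, reference_desc))  # [(0, 0), (1, 1)]
print(copy_score(query_desc, reference_desc))
```

The third query descriptor is equidistant from all reference descriptors, so the ratio test rejects it as ambiguous. A practical system would typically add a geometric consistency check on the matched points and an index for fast nearest-neighbor search, both omitted here.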
More information coming soon, thanks for your patience!
Project Members
Related Projects
Enhanced Indexing and Representation with Vision-Based Biometrics
AT&T Application Resource Optimizer (ARO) - For energy-efficient apps
CHI Scan (Computer Human Interaction Scan)
CoCITe – Coordinating Changes in Text
E4SS - ECharts for SIP Servlets
Scalable Ad Hoc Wireless Geocast
Graphviz System for Network Visualization
Information Visualization Research - Prototypes and Systems
Swift - Visualization of Communication Services at Scale
AT&T Natural Voices™ Text-to-Speech
StratoSIP: SIP at a Very High Level
Content Augmenting Media (CAM)
Content Acquisition Processing, Monitoring, and Forensics for AT&T Services (CONSENT)
MIRACLE and the Content Analysis Engine (CAE)
Social TV - View and Contribute to Public Opinions about Your Content Live
Visual Semantics for Intuitive Mid-Level Representations
eClips - Personalized Content Clip Retrieval and Delivery
iMIRACLE - Content Retrieval on Mobile Devices with Speech
AT&T WATSON (SM) Speech Technologies
Wireless Demand Forecasting, Network Capacity Analysis, and Performance Optimization