Content-based Copy Detection is a set of techniques for matching duplicate (i.e., exact copies) or near-duplicate (i.e., copies with some noise or a few changes) pairs of content. Because video and audio files can be copied digitally without loss, copy detection may seem like an easy problem. This project, however, aims to match content pairs that have undergone severe distortions. It also targets pictures of real-world objects, where the two examples may not have been exactly the same to begin with.
Although the differences between duplicate and near-duplicate content are slight, there is an easy way to make the classification. In video and multimedia, a duplicate pair exists if the pixels of the image that you see are the same in both sources. Real-world examples of duplicate content can be found in newspapers, books, and even television broadcasts: if you purchased two copies of any of these from different locations, the content (i.e., images, audio, and text) would be exactly the same. A near-duplicate pair exists in video and multimedia if the subject matter of the content is the same, but it was captured differently or has been significantly altered by some processing step. One common real-world example of near-duplicate content is the set of different camera viewpoints that television shows of the same public speech.
This example demonstrates two possible near-duplicate pairs. The top pair results from natural scene differences caused by the camera's point of view. The bottom pair results from intentional processing and editing manipulations. For a content-based copy detection system to work in real-world conditions, it must account for both.
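As a rough illustration of the duplicate versus near-duplicate distinction, here is a minimal Python sketch. It is not this project's method. It treats a pair as a duplicate only when every pixel matches, and it uses a simple average hash (aHash), a swapped-in perceptual-hashing technique, compared by Hamming distance as a stand-in for near-duplicate matching. The function names, the 8x8 hash size, and the 10-bit threshold are hypothetical choices made for illustration. A global hash like this tolerates mild processing such as a brightness change, but it would not match the viewpoint differences described above, which call for more robust features.

```python
from typing import List

# Grayscale image as rows of 0-255 pixel values (an assumption for this sketch).
Image = List[List[int]]


def average_hash(img: Image, size: int = 8) -> int:
    """Hypothetical helper: average hash (aHash).

    Shrinks the image to size x size cells by block averaging, then emits
    one bit per cell: 1 if the cell is brighter than the mean of all cells.
    """
    h, w = len(img), len(img[0])
    if h < size or w < size:
        raise ValueError("image must be at least size x size pixels")
    cells = []
    for i in range(size):
        r0, r1 = i * h // size, (i + 1) * h // size
        for j in range(size):
            c0, c1 = j * w // size, (j + 1) * w // size
            block = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def classify_pair(a: Image, b: Image, max_distance: int = 10) -> str:
    """Hypothetical classifier; max_distance is an illustrative threshold."""
    if a == b:  # every pixel identical: an exact duplicate
        return "duplicate"
    if hamming(average_hash(a), average_hash(b)) <= max_distance:
        return "near-duplicate"
    return "distinct"
```

For example, a 16x16 horizontal gradient and a copy brightened by 20 levels produce identical hashes and are classified as near-duplicates, while an inverted copy differs in all 64 hash bits and is classified as distinct.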