
Video - Content Delivery and Consumption
Even with today's standards and open-source platforms, it is difficult to guarantee content delivery on all platforms. There are three main challenges for content delivery: codecs, bandwidth, and security.
One way to speed up consumption and reduce the time spent watching irrelevant content is to identify the interesting or relevant parts of a video and automatically recompose them into shorter segments, that is, to summarize it. In prior work, we summarized "rushes content" to find material that interests users and to present videos reconstituted to include only that material. Rushes content is most commonly a byproduct of shooting a movie or television series: before distribution to the public, movies and TV programs undergo extensive editing, in which directors and producers select the version of each scene they want from many takes.

For two years, TRECVID, an evaluation event sponsored by NIST, provided rushes content from the BBC that was used in a series of summarization evaluation tasks. As illustrated on the left, rushes content contains multiple shots (short video segments 7-30 seconds in length) that belong to the same scene but are filmed multiple times, perhaps for different timing, actor cues, or camera viewpoints. There are two objectives for the summarization task: to minimize the amount of redundant content (e.g., the same actor dialog or the same viewpoint) and to emphasize highly unique, or interesting, content (e.g., a different facial expression or a different location of a person in a scene). Both of these cues can be leveraged to help editors and directors more quickly select the content that they want in the final version of the movie or television program.
While it may be easy for a person to say what is interesting in a photo or movie, it is much more challenging for computers. Algorithms that model human interest are generally constructed to emulate the biological processes at work in the human vision system, a property often referred to as salience. Several methods exist to identify high-salience locations in an image and, over time, in videos. Our parallel efforts in content-based copy detection harness local feature points: points that look like sharp edges and corners, which have been found to be among the first points in an image that humans identify. In this work, we focused on methods that identify regions of high difference in terms of color, intensity, and edge structure. After salience images are computed for each modality at different scales, an average image is composed from all three. With these final salience images, different parts of a video can be compared to each other to select the most salient (or most important) video segment.
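The combination step described above can be illustrated with a minimal sketch. It is not the implementation used in this work. It assumes that "difference" means a center-surround contrast (each pixel compared with the mean of its neighborhood), that "scale" means the neighborhood radius, that a red-minus-green signal stands in for the color modality, and that edge structure is approximated by a forward-difference gradient. All function names are hypothetical.

```python
import math

def local_mean(channel, radius):
    """Mean over each pixel's square neighbourhood, clipped at the border."""
    h, w = len(channel), len(channel[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        ys = range(max(0, y - radius), min(h, y + radius + 1))
        for x in range(w):
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            total = sum(channel[j][i] for j in ys for i in xs)
            out[y][x] = total / (len(ys) * len(xs))
    return out

def normalize(channel):
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    lo = min(min(row) for row in channel)
    hi = max(max(row) for row in channel)
    if hi == lo:
        return [[0.0 for _ in row] for row in channel]
    return [[(v - lo) / (hi - lo) for v in row] for row in channel]

def contrast_map(channel, radii):
    """Assumed salience cue: |pixel - local mean|, averaged over all scales."""
    h, w = len(channel), len(channel[0])
    acc = [[0.0] * w for _ in range(h)]
    for radius in radii:
        mean = local_mean(channel, radius)
        diff = normalize([[abs(channel[y][x] - mean[y][x]) for x in range(w)]
                          for y in range(h)])
        for y in range(h):
            for x in range(w):
                acc[y][x] += diff[y][x] / len(radii)
    return acc

def edge_strength(intensity):
    """Gradient magnitude from forward differences (zero past the border)."""
    h, w = len(intensity), len(intensity[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = intensity[y][x + 1] - intensity[y][x] if x + 1 < w else 0.0
            gy = intensity[y + 1][x] - intensity[y][x] if y + 1 < h else 0.0
            out[y][x] = math.hypot(gx, gy)
    return out

def salience_map(image, radii=(1, 2)):
    """Average the color, intensity, and edge contrast maps into one image."""
    intensity = [[(r + g + b) / 3.0 for (r, g, b) in row] for row in image]
    color = [[float(r - g) for (r, g, b) in row] for row in image]  # assumed color cue
    edges = edge_strength(intensity)
    maps = [contrast_map(m, radii) for m in (color, intensity, edges)]
    h, w = len(image), len(image[0])
    return [[sum(m[y][x] for m in maps) / len(maps) for x in range(w)]
            for y in range(h)]

def frame_score(image, radii=(1, 2)):
    """Mean salience of a frame, usable for comparing video segments."""
    sal = salience_map(image, radii)
    return sum(map(sum, sal)) / (len(sal) * len(sal[0]))

# A dark 5x5 frame with a single bright red pixel in the middle.
image = [[(0, 0, 0)] * 5 for _ in range(5)]
image[2][2] = (255, 0, 0)
sal = salience_map(image)
```

In this toy frame the isolated red pixel receives the highest salience value, and a completely uniform frame scores zero. A segment-level score could be obtained by averaging `frame_score` over the frames of a shot, although the aggregation actually used in the evaluation is not specified here.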

After the shots of a video have been scored with a salience algorithm, a number of interesting applications can be built from the summarized content. Three applications are given below, along with two summary renderings that were evaluated as part of the TRECVID 2007 BBC Rushes evaluation. In this work, we evaluated several permutations both programmatically and subjectively, through user ratings.
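As a rough illustration of how scored shots could be turned into a summary that balances the two objectives (low redundancy, high interest), the sketch below uses a greedy selection in the style of maximal marginal relevance. This is a swapped-in technique chosen for illustration, not the method evaluated in this work. The `Shot` fields, the cosine similarity over a generic feature vector, the time budget, and the `redundancy_weight` parameter are all assumptions.

```python
import math
from dataclasses import dataclass

@dataclass
class Shot:
    shot_id: str      # hypothetical identifier
    start: float      # start time in seconds
    duration: float   # length in seconds
    salience: float   # score from a salience algorithm, assumed in [0, 1]
    features: list    # any fixed-length descriptor, e.g. a color histogram

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def select_shots(shots, budget_seconds, redundancy_weight=0.5):
    """Greedily pick shots that are salient and unlike those already picked."""
    selected = []
    remaining = list(shots)
    time_left = budget_seconds
    while remaining:
        best, best_gain = None, -math.inf
        for shot in remaining:
            if shot.duration > time_left:
                continue  # does not fit in the remaining summary time
            # Redundancy: similarity to the closest shot already in the summary.
            redundancy = max((cosine(shot.features, s.features) for s in selected),
                             default=0.0)
            gain = shot.salience - redundancy_weight * redundancy
            if gain > best_gain:
                best, best_gain = shot, gain
        if best is None:
            break  # nothing left that fits the budget
        selected.append(best)
        remaining.remove(best)
        time_left -= best.duration
    # Present the summary in original temporal order.
    return sorted(selected, key=lambda s: s.start)

# Two takes of the same scene (A, B) and one shot from a different scene (C).
shots = [
    Shot("A", 0.0, 10.0, 0.9, [1.0, 0.0]),
    Shot("B", 10.0, 10.0, 0.8, [1.0, 0.05]),
    Shot("C", 20.0, 10.0, 0.5, [0.0, 1.0]),
]
summary = select_shots(shots, budget_seconds=20.0)
```

With these made-up numbers, the second take B is passed over in favor of the less salient but visually distinct shot C, because B is nearly identical to the already selected take A. Setting `redundancy_weight` to zero reduces the procedure to picking the highest-scoring shots that fit the time budget.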

Sub-Projects
Enhanced Indexing and Representation with Vision-Based Biometrics