Video - Content Delivery and Consumption

Content Delivery to Any Platform

Even with today's standards and open-source platforms, it is difficult to guarantee content delivery on all platforms. There are three main challenges for content delivery: codecs, bandwidth, and security.

  • Codecs - Codecs define how content is compressed from raw images and decompressed during playback. Over time, many different codecs have gained popularity, which often implies a strong standard and wide support, from Audio Video Interleave (AVI) container files, to MPEG-1 and MPEG-2 video, and currently MPEG-4 AVC (H.264) video. As processing speeds continue to improve, codecs can employ increasingly complex compression algorithms.
  • Bandwidth - Bandwidth availability determines how the content is delivered to the rendering client.
    The most straightforward method is a point-to-point delivery system, or unicast, which conveys bytes directly from the source to the destination. Within unicast techniques, there are options to deliver a streamed file or a chunked file. In streaming, the source and target keep one network port open during the entire transfer, and bytes are ideally delivered at a constant rate. With chunking, a video file is parsed into smaller chunks (in memory) and the source first sends a "playlist" to the target describing how the file was chunked. This way, the target can quickly seek to different parts of the file by requesting a single chunk, and the source can immediately drop or restart the network port to process this request. One popular protocol using this technique is Apple's HTTP Live Streaming (HLS), which carries all requests over plain HTTP.
    A second delivery method, called multicast, broadcasts bytes to an entire network and allows anyone on that network to capture the stream and decode its content. Multicast is generally favored when many different users consume one piece of content simultaneously, such as a live sports broadcast or a presidential address. Within AT&T, multicast streams are utilized in the U-Verse service to allow customers to switch channels and see content instantly instead of waiting several seconds for the service to re-tune and begin decoding.
  • Security - Security for content delivery most commonly involves distribution rights and how those rights are used by a client (streaming, transcoding, burning a copy). One standard created to seamlessly exchange content between devices is DLNA. More information about our work involving content delivery standards can be found on our metadata information page.
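The chunked-delivery idea above can be sketched in a few lines: the source splits a file into fixed-size chunks and publishes a playlist describing how the file was divided, so a client can seek by requesting a single chunk. This is a simplified illustration; the playlist format and segment names here are invented for the sketch, not the real HLS specification.

```python
# Minimal sketch of chunked delivery: split a file into chunks and
# emit a playlist describing them. (Hypothetical playlist format.)

def chunk_file(data: bytes, chunk_size: int):
    """Split raw bytes into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def make_playlist(chunks, base_name="segment"):
    """Describe each chunk so a client can fetch any one independently."""
    lines = ["#PLAYLIST-VERSION:1"]
    for idx, chunk in enumerate(chunks):
        lines.append(f"{base_name}{idx}.ts  bytes={len(chunk)}")
    return "\n".join(lines)

video = bytes(1000)               # stand-in for an encoded video file
chunks = chunk_file(video, 256)   # 4 chunks: 256 + 256 + 256 + 232 bytes
playlist = make_playlist(chunks)
```

Because each chunk is addressable on its own, a seek becomes a single request for one named segment rather than a renegotiation of a long-lived streaming connection.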

Minimizing Resource Use with Summarization

One way to reduce the time spent watching irrelevant content is to automatically identify the interesting or relevant parts of a piece of content and recompose them into smaller segments, i.e., to summarize it. In prior work, we summarized "rushes content" to find the things that interest users and present videos that were reconstituted to include only this material. Rushes content is most commonly a byproduct of shooting a movie or television series: before distribution to the public, movies and TV programs undergo extensive editing by their directors and producers, who select the final scenes from many takes.

BBC Rushes Diagram

For two years, TRECVID, an evaluation event sponsored by NIST, provided rushes content from the BBC that was used in a series of summarization tasks. As illustrated on the left, rushes content contains multiple shots (short video segments of 7-30 seconds in length) in which the same scene is filmed multiple times, perhaps with different timing, actor cues, or camera viewpoints. There are two objectives for the summarization task: to minimize the amount of redundant content (i.e., the same actor dialog or the same viewpoint) and to emphasize highly unique, or interesting, content (i.e., a different facial expression or a different location of a person in a scene). Both of these cues can be leveraged to help editors and directors more quickly select the content that they want in the final version of the movie or television program.

Finding Things that are "Interesting"

While it may be easy for a person to say what is interesting in a photo or movie, it is much more challenging for computers. Algorithms that model human interest are generally constructed to emulate the biological processes at work in the human visual system, often referred to as salience. Several methods exist to identify high-salience locations in an image and over time in videos. Our parallel efforts in content-based copy detection algorithms harness local feature points (points that look like sharp edges and corners), which have been found to be among the first points in an image that humans notice. In this work, we focused on methods that identified regions of high difference in terms of color, intensity, and edge structure. After computing salience images for each modality at different scales, an average image is composed from all three. With these final salience images, different parts of a video can be compared to each other to select the most salient (or most important) video segments.
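The multi-modality computation described above can be sketched as follows: per-modality difference maps (intensity, color, and edges) are computed at two scales, normalized, and averaged into one salience image. The specific filters, scales, and the red-green color opponency used here are illustrative choices for the sketch, not the published method.

```python
# Hedged sketch of a multi-scale, multi-modality salience map.
# (Filter sizes and the color-opponency channel are assumptions.)
import numpy as np

def box_blur(img, k):
    """Crude box blur with an odd window size k, using edge padding."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def normalize(m):
    """Rescale a map to [0, 1]; flat maps become all zeros."""
    rng = m.max() - m.min()
    return (m - m.min()) / rng if rng > 0 else np.zeros(m.shape)

def salience(rgb):
    """rgb: float array (H, W, 3) in [0, 1] -> salience map (H, W) in [0, 1]."""
    intensity = rgb.mean(axis=2)
    # Center-surround difference: fine scale minus coarse scale.
    intensity_sal = np.abs(box_blur(intensity, 3) - box_blur(intensity, 9))
    # Color contrast via a simple red-green opponency channel.
    opponency = rgb[..., 0] - rgb[..., 1]
    color_sal = np.abs(box_blur(opponency, 3) - box_blur(opponency, 9))
    # Edge structure via gradient magnitude of the intensity image.
    gy, gx = np.gradient(intensity)
    edge_sal = np.hypot(gx, gy)
    # Average the normalized per-modality maps into one salience image.
    return (normalize(intensity_sal) + normalize(color_sal)
            + normalize(edge_sal)) / 3.0
```

Averaging the normalized maps keeps any single modality from dominating, so a region scores highly only when it stands out in intensity, color, or edge structure relative to the rest of the frame.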

Salience Computations

Example Applications

After the shots of a video have been scored with a salience algorithm, a number of interesting applications can be created from the summarized content. Three applications are given below, along with two summary renderings that were evaluated as part of the TRECVID 2007 BBC Rushes evaluation. In this work, we evaluated several permutations both programmatically and subjectively through user ratings.

  • unique content discovery - Salience information, used in concert with content-based copy detection, can compare the most salient regions of two near-duplicate content segments. Content renderings like the example on the left can be used to quickly identify and emphasize frame uniqueness.
  • fixed-duration summary - With salience information, a content summary can be made to fit a limited amount of time. This new content composition can include information about original content positions and other metadata.
  • variable-speed playback - In addition to fixed-duration compositions, the playback speed of content can be varied in accordance with the salience of its video content. Renderings like the example on the right use a timeline on the bottom to indicate similar and active content regions, in addition to a variable-rate playback indicator on the bottom right.
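The fixed-duration summary above can be sketched as a simple selection problem: given shots scored by salience, keep the highest-scoring shots that fit the time budget, then replay them in their original order. The greedy policy here is an illustrative choice, not necessarily what the evaluated system used.

```python
# Hedged sketch of a fixed-duration summary: greedily pick the most
# salient shots that fit the budget, then restore temporal order.

def fixed_duration_summary(shots, budget):
    """shots: list of (start_sec, duration_sec, salience); budget: seconds.

    Returns the selected shots in their original temporal order.
    """
    chosen = []
    remaining = budget
    # Consider the most salient shots first.
    for shot in sorted(shots, key=lambda s: s[2], reverse=True):
        if shot[1] <= remaining:
            chosen.append(shot)
            remaining -= shot[1]
    # Preserve the original content positions in the summary.
    return sorted(chosen, key=lambda s: s[0])

shots = [(0, 10, 0.2), (10, 8, 0.9), (18, 12, 0.5), (30, 6, 0.7)]
summary = fixed_duration_summary(shots, budget=20)
# keeps the two most salient shots that fit: 8 s + 6 s = 14 s
```

Sorting the result by start time matters: a summary that preserves original content positions is easier for an editor to map back onto the source footage.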


Innovations in Standards and Protocol Definitions


The Alliance for Telecommunications Industry Solutions (ATIS) develops standards for a broad range of communications applications. The ATIS IPTV Interoperability Forum (IIF) is a subgroup focused on advanced television services delivered over managed networks to connected TVs, set-top boxes, and mobile devices.

The scope of the work includes delivery of HD and 3D live TV programming over multicast IP transport, targeted advertising, video and other content on demand, and DVR capabilities. Rigorous content security protocols and detailed quality-of-service metrics are defined, and the services support broadcast requirements for accessibility and emergency alerting.

Data models are defined for content description, program guides, user preferences, etc., and are represented in XML schemas to ensure interoperability. These schemas are harmonized with existing industry standards such as OMA BCAST and MPEG-7. All forms of the Content Analysis Engine support MPEG-7 representation of extracted metadata, enabling advanced video services in a standards-compliant manner.