Video - Indexing and Representation (Metadata)

Metadata as a Proxy for Representation and Indexing

Metadata is textual or numerical information that describes high-level properties of a piece of content. Examples include a title, creation time, content duration, author, and detected faces. To be efficient and effective, a piece of metadata should generally consume fewer resources than the original data. For example, one could create metadata for a movie that describes each frame with ten words, but the result would be an astonishing 1,620,000 words in total (10 words/frame x 30 frames/second x 60 seconds/minute x 90 minutes). A more effective description might contain information about the actors, the length of the movie, or the locations of scenes in the movie.
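The arithmetic above can be checked with a short sketch. The frame rate, movie length, and words-per-frame figures come from the text; the compact record, its field names, and its values are hypothetical and exist only to show the difference in scale.

```python
# Cost of describing every frame, using the figures from the text.
WORDS_PER_FRAME = 10
FRAMES_PER_SECOND = 30
SECONDS_PER_MINUTE = 60
MOVIE_MINUTES = 90

total_frames = FRAMES_PER_SECOND * SECONDS_PER_MINUTE * MOVIE_MINUTES
per_frame_words = WORDS_PER_FRAME * total_frames

# Hypothetical compact record: field names and values are illustrative only.
compact_record = {
    "title": "Example Movie",
    "duration_minutes": MOVIE_MINUTES,
    "actors": ["Actor One", "Actor Two"],
    "locations": ["Paris", "New York"],
}


def count_words(value):
    """Count whitespace-separated words in a scalar or a list of strings."""
    if isinstance(value, list):
        return sum(count_words(item) for item in value)
    return len(str(value).split())


compact_words = sum(count_words(v) for v in compact_record.values())

print(f"{total_frames:,} frames -> {per_frame_words:,} words")
print(f"compact record: {compact_words} words")
```

The per-frame description grows with the length of the content, while the compact record stays roughly constant in size, which is the property that makes it cheap to store and index.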

In the context of multimedia and video content, metadata can have a wide variety of representations. Each representation creates another way that the content can be indexed (quickly accessed) by information retrieval systems, such as databases. The list and illustration below provide a sample of the metadata representations that are created in the MIRACLE platform and are available for use in subsequent indexing, retrieval, and content consumption tasks.

  • Simple metadata provided with the video (title, date, description, air date, actor information, and hypertext links to related materials).
  • Textual content captured from subtitles, transcripts, and closed captions. These forms of textual content are often the most reliable because they have been manually created by editors and content providers.
  • Textual content automatically derived from speech (dialog and narration). Speech recognition is performed by the AT&T WATSON system with a large-vocabulary speech recognition model (or grammar). With the assistance of other textual sources, transcripts from speech recognition can help the CAE automatically learn new words, such as unusual locations around the world or the latest buzzword in new technology.
  • Visual information computed with video analysis techniques that detect changes in the scene (a fade, cut, dissolve, etc.) and perform face clustering to find recurring characters or actors in a video.
  • Speaker segmentation information allowing differentiation among speakers. Speaker segmentation helps to identify the dialog of different people, like the president and reporters in a press conference. Segmentation also facilitates other automatic processes such as summarization and speaker recognition (the automatic association of a face with a voice).

[Illustration: MIRACLE engine]
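To make the idea of indexing by metadata concrete, the following is a minimal sketch of an inverted index built over time-stamped text from several of the sources listed above. It is not the MIRACLE implementation: the class names, the source labels ("caption", "asr"), and the sample records are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A time-stamped piece of text from one metadata source."""
    start_s: float
    source: str  # illustrative labels such as "caption" or "asr"
    text: str


@dataclass
class VideoRecord:
    """Hypothetical container for one video's textual metadata."""
    video_id: str
    title: str
    segments: list[Segment] = field(default_factory=list)


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(records):
    """Map each term to the (video_id, start_s, source) postings containing it."""
    index = defaultdict(list)
    for rec in records:
        for seg in rec.segments:
            for term in set(tokenize(seg.text)):
                index[term].append((rec.video_id, seg.start_s, seg.source))
    return index


def search(index, query):
    """Return postings whose segment contains every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set(index.get(terms[0], []))
    for term in terms[1:]:
        hits &= set(index.get(term, []))
    return sorted(hits)


records = [
    VideoRecord("v1", "Press briefing", [
        Segment(0.0, "caption", "Good afternoon, everyone."),
        Segment(12.5, "asr", "The president answered questions from reporters."),
    ]),
    VideoRecord("v2", "Tech news", [
        Segment(3.0, "asr", "Reporters covered the latest buzzword in technology."),
    ]),
]

index = build_index(records)
print(search(index, "president reporters"))
```

Because each posting carries a start time and a source label, a query can jump to the matching moment in a video rather than only to the video as a whole, and a retrieval system could choose to weight manually created captions above automatically recognized speech.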

Real-time Multimedia Analysis

More information coming soon, thanks for your patience!

Applications

More information coming soon, thanks for your patience!

Unsupervised Segmentation and Classification

More information coming soon, thanks for your patience!

Applications

More information coming soon, thanks for your patience!

Innovations in Standards and Protocol Definitions

[Logos: ATIS, DLNA, MPEG-7, RSS]

The Alliance for Telecommunications Industry Solutions (ATIS) develops standards for a broad range of communications applications. The ATIS IPTV Interoperability Forum (IIF) is a subgroup focused on advanced television services delivered over managed networks to connected TVs, set-top boxes, and mobile devices.

The scope of the work includes delivery of HD and 3D live TV programming over multicast IP transport, targeted advertising, video and other content on demand, and DVR capabilities. Rigorous content security protocols and detailed quality-of-service metrics are defined, and the services support broadcast requirements for accessibility and emergency alerting.

Data models are defined for content description, program guides, user preferences, etc., and are represented in XML schemas to ensure interoperability. These schemas are harmonized with existing industry standards such as OMA BCAST and MPEG-7, and since the AT&T CIS also supports MPEG-7 representation of extracted metadata, there is a clear path to enabling advanced video services in a standards-compliant manner.
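As a rough illustration of what representing extracted metadata in XML can look like, the sketch below serializes a title and a list of detected shots with Python's standard library. The element and attribute names (VideoDescription, Shots, Shot, transition) are invented for this example and are not taken from the MPEG-7, OMA BCAST, or ATIS IIF schemas; a real deployment would follow the element definitions in those specifications.

```python
import xml.etree.ElementTree as ET


def metadata_to_xml(video_id, title, shots):
    """Serialize shot boundaries to XML using hypothetical element names.

    `shots` is a list of (start_s, end_s, transition) tuples, where
    `transition` names the scene change that ends the shot.
    """
    root = ET.Element("VideoDescription", {"id": video_id})
    ET.SubElement(root, "Title").text = title
    shots_el = ET.SubElement(root, "Shots")
    for start_s, end_s, transition in shots:
        ET.SubElement(
            shots_el,
            "Shot",
            {
                "start": f"{start_s:.2f}",
                "end": f"{end_s:.2f}",
                "transition": transition,
            },
        )
    return ET.tostring(root, encoding="unicode")


xml_text = metadata_to_xml(
    "v1",
    "Press briefing",
    [(0.0, 12.5, "cut"), (12.5, 40.0, "dissolve")],
)
print(xml_text)

# Any consumer that agrees on the element names can read the record back.
parsed = ET.fromstring(xml_text)
shot_list = [(s.get("start"), s.get("transition")) for s in parsed.iter("Shot")]
```

The point of a shared schema is the second half of the example: a consumer that knows only the agreed element names can recover the shot list without any knowledge of the system that produced it.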