Metadata is textual or numerical information that describes high-level properties of a piece of content. A few examples of metadata are a title, creation time, content duration, author, and detected faces. To be efficient and effective, a piece of metadata should generally consume fewer resources than the original data. For example, one could create metadata for a movie by describing each frame of that movie with ten words, but this would result in an astonishing 1,620,000 words in total (10 words/frame x 30 frames/second x 60 seconds/minute x 90 minutes)! A more effective description might contain information about the actors, the length of the movie, or the locations of scenes in the movie.
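The back-of-the-envelope figure above can be checked directly; a quick sketch of the arithmetic:

```python
# Sanity check of the frame-by-frame description example from the text:
# 10 words per frame, 30 frames/second, 90-minute movie.
words_per_frame = 10
frames_per_second = 30
seconds_per_minute = 60
minutes = 90

total_words = words_per_frame * frames_per_second * seconds_per_minute * minutes
print(total_words)  # 1620000
```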
In the context of multimedia and video content, metadata can have a wide variety of representations. Each representation creates another way that the content can be indexed (quickly accessed) by information retrieval systems, like databases. The list and illustration below provide a sample of some of the metadata representations that are created in the MIRACLE platform and are available for use in subsequent indexing, retrieval, and content consumption tasks.
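To illustrate how a metadata representation enables indexing, here is a minimal sketch using keywords as the representation; the record fields and sample clips are illustrative assumptions, not the actual MIRACLE schema:

```python
from collections import defaultdict

# Hypothetical metadata records; field names are assumptions for
# illustration, not the MIRACLE platform's actual schema.
clips = [
    {"id": "clip-1", "title": "Evening News", "keywords": ["news", "weather"]},
    {"id": "clip-2", "title": "Morning Show", "keywords": ["news", "cooking"]},
]

# One metadata representation (keywords) becomes one way to index the
# content: an inverted index mapping each keyword to matching clip IDs.
index = defaultdict(list)
for clip in clips:
    for keyword in clip["keywords"]:
        index[keyword].append(clip["id"])

print(index["news"])  # ['clip-1', 'clip-2']
```

A retrieval system would maintain one such index per representation (faces, concepts, speech transcripts, etc.), so each new metadata type adds a new access path to the same content.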
RTMM, or Real-time MultiMedia Analysis, is the application of several components of the Content Analysis Engine in a real-time fashion. That means metadata for video segments, speech recognition, detected faces, summarized keywords, and even visual concepts can be produced on-the-fly for just about any stream. By allowing any technology to stream and capture content for a processing instance, the RTMM system was intended for live or near-live analysis of multimedia content.
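The on-the-fly pattern can be sketched as a pipeline that consumes frames from a stream and emits a metadata event per frame; this is a minimal sketch with hypothetical names, with a single placeholder analyzer standing in for the several chained Content Analysis Engine components:

```python
from typing import Iterator

# Hypothetical frame source standing in for a live stream capture.
def frame_source(n_frames: int) -> Iterator[dict]:
    for i in range(n_frames):
        yield {"frame_no": i, "timestamp": i / 30.0}

# Placeholder analyzer: a real RTMM component would run face detection,
# speech recognition, visual concept classification, etc.
def analyze(frame: dict) -> dict:
    return {"frame_no": frame["frame_no"],
            "timestamp": frame["timestamp"],
            "labels": []}

# Metadata is produced as frames arrive, rather than after the fact.
metadata_stream = [analyze(frame) for frame in frame_source(5)]
print(len(metadata_stream))  # 5
```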
When used as part of a larger framework, the RTMM system produces metadata that can be used in a number of powerful systems. For example, if a user wanted to receive alerts with the relevant video clip about a specific headline, the RTMM could be used to create the appropriate content playlist and trigger eClips. In another scenario, if the RTMM is incorporated in a content creation stage at a service provider, it could create metadata streams for several content channels and send those to all users for their own personalized alerts. A prototype of this system was created as a service in the Content Augmenting Media (CAM) project, which not only offers an "alerting service" for current TV content, but also creates an improved EPG (electronic program guide) by providing information from the live content itself. Other projects tailored to mobile devices, summarization engines, and content recommendation could also utilize the real-time metadata streams generated by the RTMM.
As the amount and diversity of content continues to grow, intelligent segmentation of video is required to understand the content and semantics of a content segment. Additionally, with the wide adoption of social content sharing sites like YouTube, Vine, and Vimeo, short-form or "snackable" content segments are popular for remixing, fast sharing, and expressing ideas quickly.
Harnessing state-of-the-art methods developed in the MIRACLE and CAE platforms, content can be partitioned into small "shots," as illustrated to the right. These shots are generally consistent in content (the same scene, often a fixed camera, etc.), so they are logically ideal for subsequent semantic classification, object detection, and image search and copy detection. Finally, by comparing the structure and repeatability of the pieces themselves, it may be possible to achieve resource savings with content summarization, which can benefit the end user by reducing non-relevant content and saving bandwidth in transmitting the content.
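A common baseline for this kind of shot partitioning is to threshold the difference between consecutive frame signatures; the sketch below uses toy per-frame brightness values and an assumed threshold, whereas production systems (and presumably the CAE methods) compare richer features such as color histograms:

```python
# Minimal shot-boundary detection sketch: a new shot begins wherever
# consecutive frame signatures differ by more than a threshold.
# Signatures and threshold here are toy values for illustration.
def shot_boundaries(signatures, threshold=0.5):
    boundaries = [0]  # the first frame always starts a shot
    for i in range(1, len(signatures)):
        if abs(signatures[i] - signatures[i - 1]) > threshold:
            boundaries.append(i)
    return boundaries

# Toy per-frame brightness values: the two jumps mark likely cuts.
frames = [0.10, 0.12, 0.11, 0.80, 0.82, 0.81, 0.20, 0.22]
print(shot_boundaries(frames))  # [0, 3, 6]
```

Each detected boundary splits the stream into a shot that is internally consistent, which is what makes the pieces suitable for the per-shot classification and search tasks described above.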
Content segmentation is an important part of almost any content-related application, from making home videos more interesting (or linking similar family videos together) to making a succinct video to share via a mobile device. Looking at large collections of personal content, complete photo and video sharing applications like VidCat use content segmentation to create a more streamlined and enjoyable user experience.