MIRACLE and the Content Analysis Engine (CAE)

What is MIRACLE?

The Multimedia Information Retrieval by Content (MIRACLE) project encompasses the technologies and interfaces required for multiple types of video search, including not only text content searches (using transcripts, subtitles, closed captions, and speech recognition) but also searches based on visual information and speaker segmentation.

Visual information includes both face clustering and scene change detection, allowing users to search video by selecting a face or scene from a list. Speaker segmentation, which differentiates among all speakers, allows users to find each instance where a particular person is speaking.

Miracle Architecture

What does the CAE do?

Indexing is performed by the MIRACLE content analysis engine (CAE), which integrates multiple technologies and research projects to index most multimedia content (YouTube, TV broadcasts, full-length movies, home videos, and even podcasts) and then stores the content and index information in a database. The CAE indexes video and multimedia using many different types of content metadata, described in more detail on the indexing and representation techniques page.

Example Application

Using our 24-hour content acquisition system (URSA), the CAE automatically processes a large number of television channels. Through the MIRACLE pipeline above, this content can be searched semantically, checked for visual duplicates, and analyzed with several aggregation methods that reveal interesting patterns. Using MIRACLE and URSA, the commercial start times and commercial durations of two television talk shows on NBC (Late Night with Jimmy Fallon and The Tonight Show with Jay Leno) were analyzed over about six months.
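The aggregation step in an analysis like this boils down to comparing the spread of commercial-break start times across episodes. The sketch below illustrates the idea with made-up numbers; the per-episode offsets and the dictionary layout are assumptions for illustration, not output of the actual CAE.

```python
from statistics import mean, stdev

# Hypothetical per-episode commercial-break start offsets (minutes into
# the program), standing in for what the CAE's aggregation might report.
breaks = {
    "Late Night":   [12.5, 14.0, 11.2, 15.8, 13.1],
    "Tonight Show": [12.0, 12.1, 11.9, 12.0, 12.2],
}

for show, offsets in breaks.items():
    # A high standard deviation suggests editors delay breaks to fit
    # around skits or interviews; a low one indicates regular placement.
    print(f"{show}: mean={mean(offsets):.1f} min, stdev={stdev(offsets):.2f}")
```

With data like this, the Late Night offsets show a much larger standard deviation than the Tonight Show offsets, matching the pattern described above.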

Analysis of Commercials by Retrieval

As the illustration indicates, commercials air at various times throughout the program on Late Night. This variance indicates that the editors of Late Night often chose to delay a commercial break until a skit or interview segment was completed. This behavior contrasts with that of The Tonight Show, which places commercials at very regular times. This example application could help content producers verify the regularity of their programs, and it could help users better plan their viewing habits. For more information about other metadata generated by the CAE, please view our indexing and representation techniques page.

Can MIRACLE or the CAE be licensed?

Yes! A new collaboration with our product development and enterprise business units has produced an API suitable for independent developers as well as enterprise customers. The Visual API, part of AT&T's Developer Program, provides much of the same functionality as the CAE while adding provisions for user authentication, content hosting, and the ability to generate different amounts of content metadata according to customized profiles. Learn more about the Visual API or start your evaluation today by applying to the Alpha API Program.

What types of content can be processed?

The Content Analysis Engine handles a wide variety of video formats, and the indexed video can be played back on almost any device. Numerous device types are supported for searching and viewing MIRACLE-indexed video: web browsers on any desktop platform, game consoles, and smartphones, with the iPhone having a separate speech-enabled mobile application called iMIRACLE. Investigations into playback and deployment on new platforms can be found on our delivery and consumption page.

MIRACLE also provides web-based search interfaces and a straightforward search API. The metadata that MIRACLE generates is stored in an XML format, so developers can write their own interfaces to access the content. With these simple interfaces, the services MIRACLE provides, which continue to grow, can easily be added to internet-based mash-ups, allowing developers to spend time building fun and interesting new applications rather than re-implementing MIRACLE services.
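Because the metadata is plain XML, a custom interface can be built with any standard XML parser. The sketch below finds transcript segments matching a query term; the element and attribute names (`video`, `segment`, `start`, `transcript`) are illustrative assumptions, not the actual MIRACLE schema.

```python
import xml.etree.ElementTree as ET

# A hypothetical fragment of MIRACLE-style metadata. The element and
# attribute names here are assumptions for illustration only.
xml_doc = """
<video id="example-talk">
  <segment start="12.4" end="15.9">
    <transcript>integer sequences</transcript>
  </segment>
  <segment start="88.0" end="91.2">
    <transcript>the sequence database</transcript>
  </segment>
</video>
"""

root = ET.fromstring(xml_doc)
# Collect the playback offset of every segment whose transcript mentions
# the query term; a client interface could jump straight to these times.
query = "sequence"
hits = [float(seg.get("start"))
        for seg in root.iter("segment")
        if query in seg.findtext("transcript", "")]
print(hits)  # → [12.4, 88.0]
```

This is essentially what a clickable transcript does: map each matched segment back to its start time and seek the player there.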

For a demonstration of searching within a MIRACLE-indexed video for a spoken phrase, click this link for a search of the word "sequences" in the Neil Sloane talk. The video starts from the first instance of the word sequences, with other instances marked in the clickable interactive timeline and highlighted in the transcript. Additionally, all of the automatically segmented phrases in the transcript are clickable, allowing an immediate playback jump to that time in the video.

What are the CAE Services?

The CAE (Content Analysis Engine) Services expose a cutting-edge suite of processing and metadata-generation routines over simple HTTP interfaces, so they can be deployed "in the cloud" and accommodate large-scale applications with ease. The illustration below provides a high-level example of the benefits and applications that the CAE Services provide.

CAE Services Overview

Many of the CAE Services operate at a very granular level (usually on single images or other low-bandwidth resources) so that requests can be distributed efficiently and answered with minimal latency. Other services, like CIS, have been created to accommodate general video files, which may require lengthy processing times and large data transfers. Some functions of the CAE Services are described below.

  • Data Representations: visual biometrics, semantic concepts, various low-level features, etc.
  • Retrieval: textual search, semantic concepts, and even content-based copy detection are used to discover a set of relevant multimedia clips or events (referred to as documents in the information retrieval community)
  • Intelligent Textual Mapping: common textual operations using natural language processing (NLP) tools to aid in the retrieval of content
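Since the granular services accept simple HTTP requests, a client needs little more than a URL builder to use them. The sketch below constructs (but does not send) a request scoring an image against semantic concepts; the host, path, and parameter names are assumptions for illustration, so consult the Visual API documentation for the real endpoints.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URL for a CAE-style service; not the real endpoint.
BASE = "https://example.com/cae"

def build_concept_request(image_url, concepts):
    """Build (but do not send) an HTTP request asking the service to
    score a single image against a list of semantic concepts."""
    query = urlencode({"image": image_url, "concepts": ",".join(concepts)})
    return Request(f"{BASE}/semantic?{query}", method="GET")

req = build_concept_request("https://example.com/frame001.jpg",
                            ["face", "outdoor", "vehicle"])
print(req.full_url)
```

Keeping each request to a single image or other low-bandwidth resource is what lets a load balancer spread the work across many servers with minimal latency, as described above.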

Technical Documents

AT&T Research at TRECVID 2011
Eric Zavesky, Zhu Liu, Behzad Shahraray, Ning Zhou
TRECVID Workshop,  2011.  [PDF]  [BIB]

NIST Copyright

LipActs: Efficient Representations For Visual Speakers
Eric Zavesky
IEEE ICME,  2011.  [PDF]  [BIB]

IEEE Copyright

Eric Zavesky, Behzad Shahraray, Zhu Liu, Neela Sawant
TRECVID 2010 Workshop,  2010.  [PDF]  [BIB]

NIST Copyright

Project Members

Lee Begeja

David Gibbon

Raghuraman Gopalan

Zhu Liu

Bernard Renger

Yadong Mu

Behzad Shahraray

Eric Zavesky

Related Projects

Project Space

AT&T Application Resource Optimizer (ARO) - For energy-efficient apps

Assistive Technology

CHI Scan (Computer Human Interaction Scan)

CoCITe – Coordinating Changes in Text

Connecting Your World



E4SS - ECharts for SIP Servlets

Scalable Ad Hoc Wireless Geocast

AT&T 3D Lab

Graphviz System for Network Visualization

Information Visualization Research - Prototypes and Systems

Swift - Visualization of Communication Services at Scale

Smart Grid

Speech Mashup

Omni Channel Analytics

Speech translation

StratoSIP: SIP at a Very High Level


Content Augmenting Media (CAM)

Content-Based Copy Detection

Content Acquisition Processing, Monitoring, and Forensics for AT&T Services (CONSENT)

Content Analytics - distill content into visual and statistical representations

Social TV - View and Contribute to Public Opinions about Your Content Live

Visual API - Visual Intelligence for your Applications

Enhanced Indexing and Representation with Vision-Based Biometrics

Visual Semantics for Intuitive Mid-Level Representations

eClips - Personalized Content Clip Retrieval and Delivery

iMIRACLE - Content Retrieval on Mobile Devices with Speech

AT&T WATSON (SM) Speech Technologies

Wireless Demand Forecasting, Network Capacity Analysis, and Performance Optimization