
200 S Laurel Ave - Bldg A
Middletown, NJ
Connecting Your World,
The need to be connected is greater than ever, and AT&T researchers are creating new ways for people to connect with one another and with their environments, whether at home, in the office, or in the car.
Content Acquisition Processing, Monitoring, and Forensics for AT&T Services (CONSENT),
CONSENT provides a platform for detection and analysis of content for security and quality assurance.
Content Augmenting Media (CAM),
Leverage multimedia metadata to provide live alerts and intelligent content consumption.
Content-Based Copy Detection,
Content-based Copy Detection is an enabling technology to discover repeated content and events in a large-scale content database.
eClips - Personalized Content Clip Retrieval and Delivery,
The eClips project delivers customized video content based on user profiles and is built on the MIRACLE platform.
Enhanced Indexing and Representation with Vision-Based Biometrics,
Leveraging visual biometrics to index and represent content for retrieval and verification.
iMIRACLE - Content Retrieval on Mobile Devices with Speech,
iMIRACLE uses large vocabulary speech recognition for content retrieval with metadata words (titles, genre, channels, etc.) and content words that occur in recorded programs.
MIRACLE and the Content Analysis Engine (CAE),
The Multimedia Information Retrieval by Content (MIRACLE) project encompasses the technologies for video indexing, analysis, and retrieval with audio, textual, and visual content information.
Social TV - View and Contribute to Public Opinions about Your Content Live,
VidCat - Simplified Personal Photo and Video Management,
VidCat permits simplified personal photo and video management (i.e., a Video Catalog) from a web page or your favorite mobile device.
Video - Content Delivery and Consumption,
Background on the delivery and consumption of video and multimedia, with references to projects within the AT&T Video and Multimedia Technologies and Services Research Department.
Video - Indexing and Representation (Metadata),
Video and multimedia indexing and representation (i.e., metadata), its production, and its use. Links to projects within the AT&T Video and Multimedia Technologies and Services Research Department.
Video and Multimedia Technologies Research,
The AT&T Video and Multimedia Technologies Research Department strives to acquire multimedia and video for indexing, retrieval, and consumption with textual, semantic, and visual modalities.
Visual Semantics for Intuitive Mid-Level Representations,
Represent content with mid-level visual semantics for retrieval, filtering, and tagging.

Large-Scale Analysis for Interactive Media Consumption
David Gibbon, Andrea Basso, Lee Begeja, Zhu Liu, Bernard Renger, Behzad Shahraray, Eric Zavesky
TV Content Analysis,
CRC Press,
2012.
CRC Press, Taylor & Francis LLC Copyright
The definitive version was published in TV Content Analysis (CRC Press/Taylor & Francis), 2012-02-22, http://mklab.iti.gr/tvca/
Over the years, the fidelity and quantity of TV content have steadily increased, but consumers still have considerable difficulty finding content that matches their personal interests. New mobile and IP consumption environments have emerged with the promise of ubiquitous delivery of desired content. In many cases, however, the available content descriptions (electronic program guides) lack sufficient detail, and cumbersome human interfaces yield a less than positive user experience. Creating metadata through detailed manual annotation of TV content is costly, and in many cases this metadata may be lost in the content life cycle as assets are repurposed for multiple distribution channels. Content organization can be daunting given the range of domains, from breaking news contributions, local or government channels, live sports, and music videos to documentaries, dramatic series, and feature films. As the line between TV content and Internet content continues to blur, more and more long-tail content will appear on TV, and the ability to generate metadata for it automatically becomes paramount. Research results from several disciplines must be brought together to address the complex challenge of cost-effectively augmenting existing content descriptions to facilitate content personalization and adaptation for users, given today's range of content consumption contexts. This chapter presents system architectures for processing large volumes of video efficiently, practical state-of-the-art solutions for TV content analysis and metadata generation, and potential applications that use this metadata in effective and enabling ways.

Combining Content Analysis of Television Programs with Audience Measurement
David Gibbon, Zhu Liu, Eric Zavesky, DeDe Paul, Deborah Swayne, Rittwik Jana, Behzad Shahraray
IEEE Consumer Communications and Networking Conference (CCNC),
2012.
IEEE Copyright
This version of the work is reprinted here with permission of IEEE for your personal use. Not for redistribution. The definitive version was published in IEEE Consumer Communications and Networking Conference, 2012-01-15
The definitive version was published in Advertising Research Foundation, 2012-01-15
Combining content analysis of television programs with quantitative audience measurement can provide insights into customer reactions to advertisements and program content. This work introduces a system architecture that incorporates anonymous audience metrics from an operational IPTV environment with metadata from a content-based analysis of recorded programs. Evaluated on a collection of news programs, the system verifies that events derived from the audience metrics data stream correspond to media segmentation boundaries such as commercial breaks and topic changes. An automated system for executing multimodal media segmentation algorithms for commercial break and topic change detection is also discussed. Better understanding of audience reaction can help IPTV service providers plan infrastructure investments and help in managing multimedia content delivery networks.
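The central check above asks whether events in the audience-metrics stream line up with segmentation boundaries. A small sketch can illustrate it. It rests on two assumptions of ours, not the paper's stated method. First, an audience "event" is a drop of at least `min_drop` viewers between consecutive samples. Second, an event "corresponds" to a boundary if it falls within `tolerance` samples of it. Both function names are hypothetical.

```python
from typing import List, Sequence


def detect_audience_events(counts: Sequence[int], min_drop: int) -> List[int]:
    """Return sample indices where the viewer count falls by at least
    min_drop relative to the previous sample (hypothetical event rule)."""
    return [t for t in range(1, len(counts)) if counts[t - 1] - counts[t] >= min_drop]


def match_events_to_boundaries(events: Sequence[int], boundaries: Sequence[int],
                               tolerance: int) -> List[int]:
    """Keep only the events lying within `tolerance` samples of some
    segmentation boundary (e.g. a commercial break or topic change)."""
    return [e for e in events if any(abs(e - b) <= tolerance for b in boundaries)]


if __name__ == "__main__":
    counts = [100, 101, 99, 80, 78, 79, 60, 61, 95]   # illustrative per-second viewer counts
    events = detect_audience_events(counts, min_drop=10)
    print(events, match_events_to_boundaries(events, boundaries=[2, 10], tolerance=1))
```

With one sample per second, the ratio of matched events to all detected events gives a rough measure of how often audience drops coincide with detected breaks.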

Automated Content Metadata Extraction Services based on MPEG Standards
David Gibbon, Zhu Liu, Andrea Basso, Behzad Shahraray
The Computer Journal, Special Issue on MPEG Applications and Services,
2012.
Oxford University Press Copyright
The definitive version was published in The Computer Journal, 2012-12-06, http://www.oxfordjournals.org
This paper is concerned with the generation, acquisition, standardized representation, and transport of video metadata. The use of MPEG standards in the design and development of interoperable media architectures and Web services is discussed. A high-level discussion of several algorithms for metadata extraction is presented. Some architectural and algorithmic issues encountered when designing services for real-time processing of video streams, as opposed to traditional offline media processing, are addressed. A prototype real-time video analysis system for generating MPEG-7 Audiovisual Description Profile (AVDP) metadata from video encapsulated in MPEG-2 transport streams is presented. Such a capability can enable a range of new services, such as content-based personalization of live broadcasts, given that MPEG-7-based data models fit well with specifications for advanced television services such as TV-Anytime and the Alliance for Telecommunications Industry Solutions (ATIS) IPTV Interoperability Forum.

AT&T Research at TRECVID 2011
Eric Zavesky, Zhu Liu, Behzad Shahraray, Ning Zhou
TRECVID Workshop,
2011.
NIST Copyright
The definitive version was published in TRECVID Workshop, 2011-12-07, http://www-nlpir.nist.gov/projects/tv2011/notebookpapers.html
AT&T participated in two tasks at TRECVID 2011: content-based copy detection (CCD) and instance-based search (INS). The CCD system developed for TRECVID 2010 was enhanced for speed and augmented with an additional picture-in-picture detector and alternative audio features. As a pilot task, participation in INS evaluated object-level content-based copy detection and created a basis for integer-score result reranking. This paper reports the enhancements to the CCD system and briefly describes its application to INS for object-level copy detection.

AT&T Research at TRECVID 2010
Eric Zavesky, Behzad Shahraray, Zhu Liu, Neela Sawant
TRECVID 2010 Workshop,
2010.
NIST Copyright
The definitive version was published in TRECVID 2010 Workshop, 2010-11-15
AT&T participated in two tasks at TRECVID 2010: content-based copy detection (CCD) and instance-based search (INS). The CCD system developed for TRECVID 2009 was enhanced for efficiency and scale and was augmented with audio features. As a pilot task, participation in INS was meant to evaluate, in a fully automated setting, a number of algorithms traditionally used for search. In this paper, we report the enhancements to our CCD system and propose a system for INS that attempts to leverage retrieval techniques drawing on audio, video, and textual cues.
System And Method For Categorizing Long Documents,
Tue Aug 28 16:11:36 EDT 2012
A system, a method, an apparatus, and a computer-readable medium are provided. Each of a group of documents is segmented. Categories are assigned to each segment of the group of documents. A categorization series is formed for each document in the group, based at least in part on the categories assigned to that document's segments. A pattern is found based at least in part on the categorization series of the documents. Each document in the group is categorized based at least in part on the pattern.
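The segment, categorize, series, pattern pipeline described above can be sketched in a few lines. The sketch makes several assumptions that are ours, not the patent's. Segments are fixed-size word chunks. Each segment is categorized by a hypothetical keyword table (`KEYWORDS`). The "pattern" is each document's series with consecutive repeats collapsed. Documents that share the same collapsed series are grouped into one category. All names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical keyword-to-category table, purely for illustration.
KEYWORDS = {"goal": "sports", "score": "sports", "rain": "weather",
            "forecast": "weather", "election": "politics", "vote": "politics"}


def segment(text: str, size: int) -> List[List[str]]:
    """Split a document into consecutive chunks of `size` words."""
    words = text.lower().split()
    return [words[i:i + size] for i in range(0, len(words), size)]


def categorize_segment(words: List[str]) -> str:
    """Assign the category whose keywords occur most often; 'other' if none."""
    counts: Dict[str, int] = defaultdict(int)
    for w in words:
        if w in KEYWORDS:
            counts[KEYWORDS[w]] += 1
    return max(counts, key=counts.get) if counts else "other"


def collapse(series: List[str]) -> Tuple[str, ...]:
    """Merge runs of identical categories, e.g. [a, a, b] -> (a, b)."""
    out: List[str] = []
    for c in series:
        if not out or out[-1] != c:
            out.append(c)
    return tuple(out)


def categorize_documents(docs: Dict[str, str], size: int) -> Dict[Tuple[str, ...], List[str]]:
    """Group document ids by the collapsed categorization series they share."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for doc_id, text in docs.items():
        series = [categorize_segment(s) for s in segment(text, size)]
        groups[collapse(series)].append(doc_id)
    return dict(groups)
```

A real system would replace the keyword lookup with a trained segment classifier. The grouping step stays the same.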
System And Method For Adaptive Media Playback Based On Destination,
Tue Aug 07 16:11:22 EDT 2012
Disclosed herein are systems, methods, and computer-readable media for adaptive media playback based on destination. The method for adaptive media playback comprises determining one or more destinations, collecting media content that is relevant to or describes the one or more destinations, assembling the media content into a program, and outputting the program. In various embodiments, media content may be advertising, consumer-generated, based on real-time events, based on a schedule, or assembled to fit within an estimated available time. Media content may be assembled using an adaptation engine that selects a plurality of media segments that fit in the estimated available time, orders the plurality of media segments, alters at least one of the plurality of media segments to fit the estimated available time, if necessary, and creates a playlist of selected media content containing the plurality of media segments.
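The adaptation engine's select, order, alter, and playlist steps can be illustrated with a greedy sketch. It is not the patented engine. It assumes each segment carries a hypothetical `relevance` score for the destination. Segments are ordered by that score and added while they fit. The first segment that does not fit is trimmed to the leftover time, but only if at least `min_keep` of it would survive; this threshold is also an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    title: str
    duration: float   # seconds
    relevance: float  # hypothetical relevance to the destination


def build_playlist(segments: List[Segment], available: float,
                   min_keep: float = 0.5) -> List[Segment]:
    """Greedily fill `available` seconds with the most relevant segments,
    trimming one segment to use leftover time when enough of it survives."""
    playlist: List[Segment] = []
    remaining = available
    for seg in sorted(segments, key=lambda s: s.relevance, reverse=True):
        if seg.duration <= remaining:
            playlist.append(seg)
            remaining -= seg.duration
        elif remaining >= min_keep * seg.duration:
            # Alter the segment (here: truncate it) so it fits exactly.
            playlist.append(Segment(seg.title, remaining, seg.relevance))
            remaining = 0.0
        if remaining <= 0:
            break
    return playlist
```

Greedy selection by relevance is simple and predictable. A knapsack-style optimizer could pack the time budget more tightly at higher cost.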
Brief And High-Interest Video Summary Generation,
Tue Jun 05 16:10:38 EDT 2012
A video is summarized by determining whether the video contains one or more junk frames, modifying one or more boundaries of shots of the video based at least in part on that determination, sampling a plurality of the shots of the video into a plurality of subshots, clustering the plurality of subshots with a multiple-step k-means clustering, and creating a video summary based at least in part on the clustered plurality of subshots. The video is segmented into a plurality of shots and a keyframe from each of the plurality of shots is extracted. A video summary is created based on a determined importance of the subshots in a clustered plurality of subshots and a time budget. The created video summary is rendered by displaying playback rate information for the rendered video summary, displaying a currently playing subshot marker with the rendered video summary, and displaying an indication of similar content in the rendered video summary.
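The clustering and time-budget steps above can be sketched as follows. The sketch uses a single plain k-means pass, not the patent's multiple-step variant. It initializes centroids deterministically from the first `k` subshots, which assumes those fall in distinct clusters. It also assumes each subshot already has a feature vector and an importance score. The sketch keeps the most important subshot per cluster and adds those in order of importance while the budget allows. The result is returned in playback order.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Subshot = Tuple[float, float, Vector, float]  # (start, duration, feature, importance)


def kmeans(points: Sequence[Vector], k: int, iters: int = 20) -> List[int]:
    """Plain k-means; returns a cluster label for each point."""
    if len(points) < k:
        raise ValueError("need at least k points")
    centroids = [list(p) for p in points[:k]]  # deterministic init: first k points
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def summarize(subshots: Sequence[Subshot], k: int, budget: float) -> List[Subshot]:
    """Pick the most important subshot per cluster, fill the time budget
    greedily by importance, and return the picks in playback order."""
    labels = kmeans([s[2] for s in subshots], k)
    best: Dict[int, Subshot] = {}
    for s, lab in zip(subshots, labels):
        if lab not in best or s[3] > best[lab][3]:
            best[lab] = s
    chosen: List[Subshot] = []
    used = 0.0
    for s in sorted(best.values(), key=lambda s: s[3], reverse=True):
        if used + s[1] <= budget:
            chosen.append(s)
            used += s[1]
    return sorted(chosen, key=lambda s: s[0])
```

Taking one representative per cluster keeps near-duplicate subshots out of the summary. The importance ordering decides which clusters survive a tight budget.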
Environment Delivery Network,
Tue Apr 17 16:10:07 EDT 2012
A method for an environment delivery network prioritizes groups of data for transmission based on various factors such as synchronization requirements, endpoint configuration, and the fidelity of sensory stimuli reproduction. A device detects data missing from a group of data received from a server and replaces the missing data with replacement data based on a predetermined value. The predetermined value may be based on a default value specific to the sensory stimulus whose data is missing, data received prior to the missing data, or data received prior to and after the missing data.
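The three replacement strategies named above can be sketched for a single stream of sensory samples. The strategies are a stimulus-specific default, the last value received, and a value derived from data before and after the gap. Missing samples are represented as `None`. The "before and after" case is interpreted here as the mean of the nearest received neighbours; that choice is an assumption. The function name and mode strings are hypothetical.

```python
from typing import List, Optional


def fill_missing(samples: List[Optional[float]], default: float,
                 mode: str = "interpolate") -> List[float]:
    """Replace None entries using a default, the previous received value,
    or the mean of the nearest received values before and after the gap."""
    out: List[float] = []
    for i, v in enumerate(samples):
        if v is not None:
            out.append(v)
            continue
        # Nearest values actually received (not previously filled ones).
        prev = next((samples[j] for j in range(i - 1, -1, -1) if samples[j] is not None), None)
        nxt = next((samples[j] for j in range(i + 1, len(samples)) if samples[j] is not None), None)
        if mode == "default":
            out.append(default)
        elif mode == "previous":
            out.append(prev if prev is not None else default)
        elif mode == "interpolate":
            if prev is not None and nxt is not None:
                out.append((prev + nxt) / 2)
            else:
                out.append(prev if prev is not None else (nxt if nxt is not None else default))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return out
```

For example, a temperature stream `[20.0, None, 24.0, None]` with a default of 18.0 becomes `[20.0, 22.0, 24.0, 24.0]` under interpolation. The final gap falls back to the previous value because nothing follows it.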
Method And System For Embedding Information Into Streaming Media,
Tue Apr 03 16:09:52 EDT 2012
A method and system for embedding information into streaming media is disclosed. In order to avoid a prolonged waiting period between the time a video stream is selected for viewing and the time it actually begins to play, information relevant to the content of the video stream is independently obtained and locally stored. This information may be advertising, text, games or any other media which may be of interest to the user. The information is embedded into the video or other media stream to be presented to the viewer and played immediately, so that the user avoids any wait time in viewing the selected stream that may occur due to bandwidth shortages or other system considerations.
System And Method For Automated Multimedia Content Indexing And Retrieval,
Tue Mar 06 16:09:28 EST 2012
The invention provides a system and method for automatically indexing and retrieving multimedia content. The method may include separating a multimedia data stream into audio, visual and text components, segmenting the audio, visual and text components based on semantic differences, identifying at least one target speaker using the audio and visual components, identifying a topic of the multimedia event using the segmented text and topic category models, generating a summary of the multimedia event based on the audio, visual and text components, the identified topic and the identified target speaker, and generating a multimedia description of the multimedia event based on the identified target speaker, the identified topic, and the generated summary.
Customized Interface Based On Viewed Programming,
Tue Nov 08 16:06:26 EST 2011
In one embodiment, a system generates a customized interface based on viewed programming. The system stores a program that a user viewed through a media device; searches through a network for information related to the viewed program; and extracts data associated with the information related to the viewed program. A custom interface is generated based substantially on the data associated with the information related to the viewed program.
Browsing And Retrieval Of Full Broadcast-Quality Video,
Tue Jan 25 16:04:24 EST 2011
A method includes steps of indexing a media collection, searching an indexed library and browsing a set of candidate program segments. The step of indexing a media collection creates the indexed library based on a content of the media collection. The step of searching the indexed library identifies the set of candidate program segments based on a search criterion. The step of browsing the set of candidate program segments selects a segment for viewing.
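The index-then-search steps above can be illustrated with a minimal inverted index over per-segment transcript text. It assumes each program segment already has associated text, such as closed captions. Candidates are ranked by the number of query terms they contain. The browsing step, where a user picks one candidate to view, is left to the interface. Function names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_index(segments: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lower-cased word to the ids of the segments containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for seg_id, text in segments.items():
        for word in text.lower().split():
            index[word].add(seg_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return candidate segment ids, most matching query terms first
    (ties broken alphabetically for a stable order)."""
    scores: Dict[str, int] = defaultdict(int)
    for word in query.lower().split():
        for seg_id in index.get(word, ()):
            scores[seg_id] += 1
    return sorted(scores, key=lambda s: (-scores[s], s))
```

A production system would add stemming and term weighting such as TF-IDF. The index, search, browse flow stays the same.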
On-Demand Language Translation For Television Programs,
Tue Oct 05 15:04:54 EDT 2010
In an embodiment, a method of providing an on-demand translation service is provided. A subscriber may be charged a reduced fee or no fee for use of the on-demand translation service in exchange for displaying commercial messages to the subscriber, the commercial messages being selected based on subscriber information. A multimedia signal including information in a source language may be received. The information may be obtained as text in the source language from the multimedia signal. The text may be translated from the source language to a target language. Translated information, based on the translated text, may be transmitted to a processing device for presentation to the subscriber. The received multimedia signal may be sent to a multimedia device for viewing.
Systems And Methods For Monitoring Speech Data Labelers,
Tue May 04 15:03:49 EDT 2010
Systems and methods for using an annotation guide to label utterances and speech data with a call type. A method embodiment monitors labelers of speech data by presenting via a processor a test utterance to a labeler, receiving input from the labeler that selects a particular call type from a list of call types and determining via the processor if the labeler labeled the test utterance correctly. Based on the determining step, the method performs at least one of the following: revising the annotation guide, retraining the labeler or altering the test utterance.
On-Demand Language Translation For Television Programs,
Tue May 04 15:03:47 EDT 2010
A method, a system, and a machine-readable medium are provided for an on-demand translation service. A translation module including at least one language pair module for translating a source language to a target language may be made available for use by a subscriber. The subscriber may be charged a fee for use of the requested on-demand translation service or may be provided use of the on-demand translation service for free in exchange for displaying commercial messages to the subscriber. A video signal may be received including information in the source language, which may be obtained as text from the video signal and may be translated from the source language to the target language by use of the translation module. Translated information, based on the translated text, may be added into the received video signal. The video signal including the translated information in the target language may be sent to a display device.
Method For Providing A Compressed Rendition Of A Video Program In A Format Suitable For Electronic Searching And Retrieval,
Tue Feb 02 15:03:24 EST 2010
A compressed rendition of a video program is provided in a format suitable for electronic searching and retrieval. An electronic pictorial transcript representation of the video program is initially received. The video program has a video component and a second information-bearing media component associated therewith. The pictorial transcript representation includes a representative frame from each segment of the video component of the video program and a portion of the second media component associated with the segment. The electronic pictorial transcript is transformed into a hypertext format to form a hypertext pictorial transcript. The hypertext pictorial transcript is subsequently recorded in an electronic medium.
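The transformation of a pictorial transcript into hypertext, described above, can be sketched as a small HTML generator. It assumes the transcript is already available as a list of (representative frame path, associated caption text) pairs, one per segment. The page layout, the `seg{i}` anchors, and the function name are illustrative choices. They are not the format defined in the patent.

```python
import html
from typing import List, Tuple


def pictorial_transcript_html(title: str, entries: List[Tuple[str, str]]) -> str:
    """Render (frame_path, caption) pairs as a simple hypertext pictorial
    transcript: one anchored block per segment with its frame and text."""
    parts = [f"<html><head><title>{html.escape(title)}</title></head><body>",
             f"<h1>{html.escape(title)}</h1>"]
    for i, (frame_path, caption) in enumerate(entries):
        parts.append(
            f'<div id="seg{i}"><a href="#seg{i}">'
            f'<img src="{html.escape(frame_path)}" alt="segment {i}"></a>'
            f"<p>{html.escape(caption)}</p></div>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

Escaping the caption text matters because closed captions can contain characters such as `&` or `<`. Writing the returned string to a file corresponds to recording the transcript in an electronic medium.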
Systems and methods for monitoring speech data labelers,
Tue Oct 09 18:12:17 EDT 2007
Systems and methods for monitoring labelers of speech data. To test or train labelers, a labeler is presented with utterances that have already been identified as belonging to a particular class or call type. The labeler is asked to assign a call type to the utterances. The performance of the labeler is measured by comparing the call types assigned by the labeler with the existing call types of the utterances. The performance of a labeler can also be monitored as the labeler labels speech data by occasionally having the labeler label an utterance that is already labeled and by storing the results.
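The monitoring loop described above can be sketched with two measures. The first is accuracy on utterances whose call type is already known. The second is a tally of which reference call types the labeler confuses with which. Utterances without a reference label are ignored, since they are ordinary labeling work. The 0.9 retraining threshold and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict


def labeler_accuracy(reference: Dict[str, str], assigned: Dict[str, str]) -> float:
    """Fraction of pre-labeled utterances the labeler tagged with the reference call type."""
    scored = [u for u in assigned if u in reference]
    if not scored:
        raise ValueError("no pre-labeled utterances were labeled")
    return sum(assigned[u] == reference[u] for u in scored) / len(scored)


def confusions(reference: Dict[str, str], assigned: Dict[str, str]) -> Counter:
    """Count (reference, assigned) call-type pairs where the labeler disagreed."""
    return Counter((reference[u], assigned[u]) for u in assigned
                   if u in reference and assigned[u] != reference[u])


def needs_retraining(accuracy: float, threshold: float = 0.9) -> bool:
    """Hypothetical policy: flag labelers whose accuracy falls below threshold."""
    return accuracy < threshold
```

The confusion tally helps decide among the remedies the abstract mentions. One labeler repeatedly swapping two call types suggests retraining. Many labelers making the same swap suggests the annotation guide itself needs revision.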
Systems and methods for generating an annotation guide,
Tue May 15 18:12:01 EDT 2007
Systems and methods for generating an annotation guide. Speech data is organized and presented to a user. After the user selects some of the utterances in the speech data, the selected utterances are included in a class and/or call type. Additional utterances that belong to the class and/or call type can be found in the speech data using relevance feedback, data mining, data clustering, support vector machines, and the like. After a call type is complete, it is committed to the annotation guide. After all call types are completed, the annotation guide is generated.
System and method for automated multimedia content indexing and retrieval,
Tue Feb 27 18:11:55 EST 2007
The invention provides a system and method for automatically indexing and retrieving multimedia content. The method may include separating a multimedia data stream into audio, visual and text components, segmenting the audio, visual and text components based on semantic differences, identifying at least one target speaker using the audio and visual components, identifying a topic of the multimedia event using the segmented text and topic category models, generating a summary of the multimedia event based on the audio, visual and text components, the identified topic and the identified target speaker, and generating a multimedia description of the multimedia event based on the identified target speaker, the identified topic, and the generated summary.
System and method for automated multimedia content indexing and retrieval,
Tue Mar 30 18:09:42 EST 2004
The invention provides a system and method for automatically indexing and retrieving multimedia content. The method may include separating a multimedia data stream into audio, visual and text components, segmenting the audio, visual and text components based on semantic differences, identifying at least one target speaker using the audio and visual components, identifying a topic of the multimedia event using the segmented text and topic category models, generating a summary of the multimedia event based on the audio, visual and text components, the identified topic and the identified target speaker, and generating a multimedia description of the multimedia event based on the identified target speaker, the identified topic, and the generated summary.
Method For Providing A Compressed Rendition Of A Video Program In A Format Suitable For Electronic Searching And Retrieval,
Tue Jun 17 18:08:46 EDT 2003
A compressed rendition of a video program is provided in a format suitable for electronic searching and retrieval. An electronic pictorial transcript representation of the video program is initially received. The video program has a video component and a second information-bearing media component associated therewith. The pictorial transcript representation includes a representative frame from each segment of the video component of the video program and a portion of the second media component associated with the segment. The electronic pictorial transcript is transformed into a hypertext format to form a hypertext pictorial transcript. The hypertext pictorial transcript is subsequently recorded in an electronic medium.
Method For Analyzing Video,
Tue Apr 01 18:08:39 EST 2003
A method and system for recognizing scene changes in digitized video uses one-dimensional projections of the recorded video. A wavelet transformation is applied to each projection to determine its high-frequency components. These components are then autocorrelated, and a time-based curve of the autocorrelation coefficients is generated. A scene change is declared when the autocorrelation coefficient curves exceed a predetermined value.
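The projection, wavelet, and correlation chain above can be sketched for grayscale frames held as lists of pixel rows. Several choices here are our interpretation and are labeled as such. The one-dimensional projection is the mean of each row. The wavelet is a single-level Haar transform, keeping only the detail (high-frequency) coefficients. Frames are compared by the Pearson correlation of consecutive frames' detail vectors. A change is flagged when `1 - correlation` exceeds `threshold`, which is an assumed orientation of the "greater than a predetermined value" test. The threshold value of 0.5 is also illustrative.

```python
import math
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # grayscale frame as rows of pixel values


def row_projection(frame: Frame) -> List[float]:
    """One-dimensional projection: mean intensity of each row."""
    return [sum(row) / len(row) for row in frame]


def haar_detail(signal: Sequence[float]) -> List[float]:
    """Single-level Haar detail coefficients (the high-frequency part)."""
    return [(signal[i] - signal[i + 1]) / math.sqrt(2) for i in range(0, len(signal) - 1, 2)]


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; constant vectors count as identical or unrelated."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 1.0 if list(a) == list(b) else 0.0
    return cov / math.sqrt(va * vb)


def scene_changes(frames: Sequence[Frame], threshold: float = 0.5) -> List[int]:
    """Indices t where frame t is dissimilar enough from frame t-1."""
    details = [haar_detail(row_projection(f)) for f in frames]
    return [t for t in range(1, len(details))
            if 1.0 - correlation(details[t - 1], details[t]) > threshold]
```

Working on projections rather than full frames keeps the per-frame cost linear in image height. Using only detail coefficients makes the test largely insensitive to uniform brightness shifts within a shot.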
Method and apparatus for compressing a sequence of information-bearing frames having at least two media,
Tue Aug 07 18:07:11 EDT 2001
An apparatus and method for compressing a sequence of frames having at least first and second information-bearing media components selects a plurality of representative frames from among the sequence of frames. The representative frames represent information contained in the first information-bearing media component. A correspondence is then formed between each of the representative frames and one of a plurality of segments of the second information-bearing media component. The representative frames, the plurality of segments of the second information-bearing media component and the correspondence between them are recorded for subsequent retrieval. If the first information-bearing media component is a video component composed of a plurality of scenes, a representative frame may be selected from each scene. Additionally, if the second information-bearing media component is a closed-caption component, a printed rendition of the representative frames and the closed-caption component may be provided. The printed rendition constitutes a pictorial transcript in which each representative frame is printed with a caption containing the closed-caption text associated therewith.
Method for automatically providing a compressed rendition of a video program in a format suitable for electronic searching and retrieval,
Tue Aug 01 18:05:35 EDT 2000
A compressed rendition of a video program is provided in a format suitable for electronic searching and retrieval. An electronic pictorial transcript representation of the video program is initially received. The video program has a video component and a second information-bearing media component associated therewith. The pictorial transcript representation includes a representative frame from each segment of the video component of the video program and a portion of the second media component associated with the segment. The electronic pictorial transcript is transformed into a hypertext format to form a hypertext pictorial transcript. The hypertext pictorial transcript is subsequently recorded in an electronic medium.
Method and means for detecting people in image sequences,
Tue Nov 16 18:05:25 EST 1999
The head in a series of video images is identified by digitizing sequential images, subtracting a previous image from an input image to determine moving objects, calculating boundary curvature extremes of regions in the subtracted image, comparing the extremes with a stored model of a human head to find regions shaped like a human head, and identifying the head with a surrounding shape.
Method For Communicating Audiovisual Programs Over A Communication Network,
Tue Feb 23 01:05:20 EST 1999
This patent relates to on-demand streaming of media over IP networks. The patent discloses a method that supports visual browsing of media streams and is primarily intended for video and illustrated audio stream types. There is an increasing amount of rich media on the web and broadband access is making it available to larger numbers of people. New methods for searching and browsing of rich media over IP networks are required in order to fully exploit these trends. In streaming media systems, a prefetch buffer is maintained on the client to compensate for network jitter. Navigating around a stored media clip is difficult due to the time required to refill the buffer after seek operations. With the current invention, the network bandwidth is managed to not only send the data streams for basic buffering, but also to transmit additional information needed for stream navigation. This additional information is loaded non-sequentially using either UDP or TCP protocols for data transport. In addition to this concept, the patent further discloses an optimal buffer control algorithm that selects the best representative image set for any given time during the streaming session. In comparison with previously existing methods, the new method offers a much more interactive environment for simultaneous streaming and browsing of visual media.
Method and apparatus for recording and indexing an audio and multimedia conference,
Tue Jan 20 18:05:04 EST 1998
A method and apparatus for recording and indexing audio information exchanged during an audio conference call, or video, audio and data information exchanged during a multimedia conference. For a multimedia conference, the method and apparatus utilize the voice activated switching functionality of a multipoint control unit (MCU) to provide a video signal, which is input to the MCU from a workstation from which an audio signal is detected, to each of the other workstations participating in the conference. A workstation and/or participant-identifying signal generated by the multipoint control unit is stored, together or in correspondence with the audio signal and video information, for subsequent ready retrieval of the stored multimedia information. For an audio conference, a computer is connected to an audio bridge for recording the audio information along with an identification signal for correlating each conference participant with that participant's statements.
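The indexing idea above can be sketched from the MCU's voice-activated switching decisions. Each decision is stored with the identity of the newly active participant. The stored events then yield speaking intervals per participant for later retrieval. The sketch assumes the switch events arrive as time-sorted `(time, participant)` pairs and that the conference end time is known. The function name is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def speaker_intervals(events: Sequence[Tuple[float, str]],
                      end_time: float) -> Dict[str, List[Tuple[float, float]]]:
    """Turn time-sorted (time, participant) switch events into per-participant
    (start, end) intervals; each interval ends at the next switch."""
    index: Dict[str, List[Tuple[float, float]]] = {}
    following = list(events[1:]) + [(end_time, "")]
    for (start, who), (stop, _) in zip(events, following):
        index.setdefault(who, []).append((start, stop))
    return index
```

Given these intervals, retrieving "everything Alice said" reduces to seeking the recorded audio and video to each of her `(start, end)` pairs.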
Science & Technology Medal, 2005.
Honored for technical leadership in content-based indexing, searching and browsing, and display of multimedia data.