
200 S Laurel Ave - Bldg A
Middletown, NJ
Content Acquisition Processing, Monitoring, and Forensics for AT&T Services (CONSENT),
CONSENT provides a platform for detection and analysis of content for security and quality assurance.
Content Augmenting Media (CAM),
Leverage multimedia metadata to provide live alerts and intelligent content consumption.
iMIRACLE - Content Retrieval on Mobile Devices with Speech,
iMIRACLE uses large vocabulary speech recognition to recognize metadata words (titles, genre, channels, etc.) and content words that occur in recorded programs.
MIRACLE and the Content Analysis Engine (CAE),
The MIRACLE project develops media processing technologies that enable the content-based retrieval and presentation of multimedia data over a range of devices, and a wide range of available bandwidth.
Video - Content Delivery and Consumption,
A background on the delivery and consumption of video and multimedia and references to projects within the AT&T Video and Multimedia Technologies and Services Research Department.
Video - Indexing and Representation (Metadata),
Video and multimedia indexing and representations (i.e. metadata), their production, and use. Links to projects within the AT&T Video and Multimedia Technologies and Services Research Department.
Video and Multimedia Technologies and Services Research,
The AT&T Video and Multimedia Technologies and Services Research Department strives to acquire multimedia and video for indexing, retrieval, and consumption with textual, semantic, and visual modalities.

Large-Scale Analysis for Interactive Media Consumption
David Gibbon, Andrea Basso, Lee Begeja, Zhu Liu, Bernard Renger, Behzad Shahraray, Eric Zavesky
TV Content Analysis,
CRC Press,
2012.
[PDF]
[BIB]
CRC Press, Taylor & Francis LLC Copyright
The definitive version was published in TV Content Analysis (CRC Press/Taylor & Francis), 2012-02-22, http://mklab.iti.gr/tvca/
Over the years the fidelity and quantity of TV content has steadily increased, but consumers
are still experiencing considerable difficulties in finding the content matching their personal
interests. New mobile and IP consumption environments have emerged with the promise
of ubiquitous delivery of desired content, but in many cases, available content descriptions
in the form of electronic program guides lack sufficient detail and cumbersome human interfaces yield a less than positive user experience. Creating metadata through a detailed
manual annotation of TV content is costly and, in many cases, this metadata may be lost
in the content life-cycle as assets are repurposed for multiple distribution channels. Content
organization can be daunting when considering domains from breaking news contributions,
local or government channels, live sports, music videos, documentaries up through dramatic
series and feature films. As the line between TV content and Internet content continues
to blur, more and more long-tail content will appear on TV, and the ability to
automatically generate metadata for it becomes paramount. Research results from several
disciplines must be brought together to address the complex challenge of cost-effectively augmenting existing content descriptions to facilitate content personalization and adaptation for
users given today's range of content consumption contexts. This chapter presents system architectures for processing large volumes of video efficiently; practical, state-of-the-art solutions for TV content analysis and metadata generation;
and potential applications that utilize this metadata in effective and enabling ways.

Q-score: Proactive Service Quality Assessment in a Large IPTV System
Jia Wang, Zihui Ge, Jennifer Yates, Ajay Mahimkar, Andrea Basso, Min Chen, Han Hee Song, Yin Zhang
ACM Internet Measurement Conference,
2011.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Internet Measurement Conference, 2011-11-02.
In large-scale IPTV systems, it is essential to
maintain high service quality while providing a wider variety
of service features than typical traditional TV. Thus,
service quality assessment systems are of paramount importance,
as they monitor the user-perceived service quality and
raise alerts when issues occur. For IPTV systems, however, there
is no simple metric to represent user-perceived service quality
and Quality of Experience (QoE). Moreover, there is only
limited user feedback, often in the form of noisy and delayed
customer complaints. Therefore, we aim to approximate the
QoE through a selected set of performance indicators in a
proactive (i.e., detecting issues before customers complain) and
scalable fashion.
In this paper, we present a service quality assessment framework,
Q-score, which accurately learns a small set of performance
indicators most relevant to user-perceived service
quality, and proactively infers service quality as a single score.
We evaluate Q-score using network data collected from a
commercial IPTV service provider and show that Q-score is
able to predict 60% of the service problems reported
by customers with 0.1% false positives. Through Q-score,
we have (i) gained insight into various types of service problems
causing user dissatisfaction, including why users tend to
react promptly to audio issues but more slowly to video issues; (ii)
identified and quantified the opportunity to proactively detect
the service quality degradation of individual customers
before severe performance impact occurs; and (iii) observed
the possibility of adaptively allocating customer care workforce to
potentially troubled service areas.
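The core Q-score idea, learning which performance indicators best track user-perceived quality and collapsing them into one score, can be illustrated with a toy sketch. This is not the paper's actual method (which operates on large-scale provider data); the learning step is simplified here to correlation-based indicator selection, and all KPI values are hypothetical.

```python
# Toy sketch of a Q-score-like model: rank key performance indicators
# (KPIs) by correlation with customer complaints, keep the top few, and
# combine them into a single quality score. Illustrative only.

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def fit_model(kpi_rows, complaints, k=2):
    """Select the k KPIs most correlated with complaints.
    Returns a list of (kpi_index, weight) pairs; weights are correlations."""
    n_kpis = len(kpi_rows[0])
    ranked = sorted(
        ((abs(pearson([r[i] for r in kpi_rows], complaints)), i)
         for i in range(n_kpis)),
        reverse=True)
    return [(i, pearson([r[i] for r in kpi_rows], complaints))
            for _, i in ranked[:k]]

def q_score(model, kpi_row):
    """Single score per customer: weighted sum of the selected indicators."""
    return sum(w * kpi_row[i] for i, w in model)
```

A higher score would flag a customer for proactive attention before a complaint arrives; the real system additionally handles the noise and delay of complaint data.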

Video email for the digital set-top box
Allen Milewski, Thomas Smith, David Weimer, Baldine Paul, Glenn Cash, Andrea Basso
HCI International Conference Proceedings, 2001,
2001.
[BIB]
We describe a prototype, network-based, video email service targeted to run on a variety of thin clients from PCs to digital cable set-top boxes. Email is an attractive starting place for video in the home: (i) it is a familiar means of communicating for many, (ii) its privacy concerns are smaller than for interactive video, and (iii) its "critical mass" requirements may be small since only senders need complete video systems. Nonetheless, most video email systems are still cumbersome in terms of hardware, storage, and installation processes. In contrast, our system architecture requires little client storage since the video is streamed to a network-based media server while it is being recorded. The email recipient receives a message that contains a reference to the network-stored video file, and it is streamed down for viewing. We describe the user experience design challenges associated with implementing the system for digital cable set-top boxes. Publisher: Lawrence Erlbaum and Associates.
Interaction Modalities For Multimedia Delivery And Presentation,
Tue Sep 11 12:53:37 EDT 2012
A method and apparatus for displaying received data, analyzing the quality of the displayed data, formulating a media-parameter suggestion for the encoder to alter the characteristics of data to be sent to the receiver, and sending the formulated suggestion from the receiver.
System And Method For Sharing Information Between A Concierge And Guest,
Tue Aug 21 12:53:20 EDT 2012
A novel mechanism is disclosed by which a sender can direct information such as an audiovisual signal to a particular recipient's audiovisual display device, such as a cable television set and, thereby, share information between the sender and the recipient. In one embodiment of the invention, a calling party originates a telephone call and associates that telephone call with audio-visual information that exists on the caller's personal computer or on an Internet server. The called party answers the call, and can tune an associated cable television to the appropriate channel in order to view the audio-visual information. In another embodiment, the caller is a hotel guest and the called party is a hotel concierge and vice versa. The concierge provides information to the hotel guest such that the hotel guest can tune in to a channel on their hotel television set and access the information.
System And Method For Adaptive Media Playback Based On Destination,
Tue Aug 07 12:53:09 EDT 2012
Disclosed herein are systems, methods, and computer readable-media for adaptive media playback based on destination. The method for adaptive media playback comprises determining one or more destinations, collecting media content that is relevant to or describes the one or more destinations, assembling the media content into a program, and outputting the program. In various embodiments, media content may be advertising, consumer-generated, based on real-time events, based on a schedule, or assembled to fit within an estimated available time. Media content may be assembled using an adaptation engine that selects a plurality of media segments that fit in the estimated available time, orders the plurality of media segments, alters at least one of the plurality of media segments to fit the estimated available time, if necessary, and creates a playlist of selected media content containing the plurality of media segments.
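The assembly step described above, selecting and trimming media segments to fit an estimated available time, can be sketched as a simple greedy procedure. This is an illustration of the general idea, not the patent's adaptation engine; segment names and the trimming rule are hypothetical.

```python
# Sketch of fitting a media program into an estimated available time:
# take segments in priority order, and trim the last one that does not
# fully fit. Illustrative only.

def assemble_program(segments, available_secs):
    """segments: list of (name, duration_secs), highest priority first.
    Returns a playlist of (name, play_secs) fitting available_secs."""
    playlist, remaining = [], available_secs
    for name, dur in segments:
        if dur <= remaining:
            playlist.append((name, dur))       # segment fits whole
            remaining -= dur
        elif remaining > 0:
            playlist.append((name, remaining)) # alter segment to fit
            remaining = 0
    return playlist
```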
Brief And High-Interest Video Summary Generation,
Tue Jun 05 12:52:23 EDT 2012
A video is summarized by determining whether the video contains one or more junk frames, modifying one or more shot boundaries of the video based at least in part on that determination, sampling a plurality of the shots of the video into a plurality of subshots, clustering the plurality of subshots with a multiple-step k-means clustering, and creating a video summary based at least in part on the clustered plurality of subshots. The video is segmented into a plurality of shots and a keyframe from each of the plurality of shots is extracted. A video summary is created based on a determined importance of the subshots in a clustered plurality of subshots and a time budget. The created video summary is rendered by displaying playback rate information for the rendered video summary, displaying a currently playing subshot marker with the rendered video summary, and displaying an indication of similar content in the rendered video summary.
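The clustering-and-budget idea above can be sketched in a few lines: cluster subshot feature vectors with k-means, then take one representative per cluster until the time budget is spent. This is a toy illustration; the patent's junk-frame handling, multiple-step refinement, and importance scoring are omitted, and the features here are hypothetical.

```python
# Toy subshot summarization: plain k-means over feature vectors, then one
# representative subshot per cluster, subject to a time budget.
import random

def kmeans(points, k, iters=20, seed=0):
    """Basic Lloyd's k-means over tuples; returns (centers, clusters)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        centers = [tuple(sum(d) / len(cl) for d in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers, clusters

def summarize(subshots, k, budget_secs):
    """subshots: list of (feature_tuple, duration_secs).
    Returns (chosen feature tuples, total duration used)."""
    feats = [f for f, _ in subshots]
    _, clusters = kmeans(feats, k)
    summary, used = [], 0.0
    for cl in clusters:
        if not cl:
            continue
        rep = cl[0]  # representative: first member (a stand-in for importance)
        dur = next(d for f, d in subshots if f == rep)
        if used + dur <= budget_secs:
            summary.append(rep)
            used += dur
    return summary, used
```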
Active Intelligent Content,
Tue May 15 12:52:05 EDT 2012
Active intelligent content is aware of its own timeline, lifecycle, capabilities, limitations, and related information. The active intelligent content is aware of its surroundings and can convert automatically into a format or file type more conducive to the device or environment it is stored in. If the active intelligent content does not have the required tools to make such a transformation, it is self-aware enough to seek out the tools and/or information to make that transformation. Such active intelligent content can be used for enhanced file portability, target advertising, personalization of media, and selective encryption, enhancement, and restriction. The content can also be used to collaborate with other content and provide users with enhanced information based on user preferences, ratings, costs, genres, file types, and the like.
System And Method Of Organizing Data To Facilitate Access And Streaming,
Tue Oct 25 16:06:19 EDT 2011
File format systems and methods are disclosed that provide a framework that integrates concepts such as object-based audio-visual representation, metadata, and object-oriented programming to achieve a flexible and generic representation of the audiovisual information and the associated methods to operate on the audiovisual information. A system and method are disclosed for storing data processed from presentation data. The data is stored according to a method comprising coding input presentation data by identifying objects from within the presentation data, coding each object individually, and organizing the coded data into access layer data units. The access layer data units are stored throughout a plurality of segments, each segment comprising a segment table in a header portion thereof and those access layer data units that are members of the respective segment, there being one entry in the segment table for each access layer data unit therein. A plurality of extended segments are also stored, each of the extended segments further comprising one or more of the access layer data units that include protocol-specific data, the extended segments each represented by an extended segment header. The data of an accessible object is also stored, including an accessible object header and identifiers of the plurality of extended segments, each of the extended segments being a member of the same object.
System And Method For Generating Coded Video Sequences From Still Media,
Tue Aug 09 16:05:51 EDT 2011
The invention provides a system and method that transforms a set of still/motion media (i.e., a series of related or unrelated still frames, web pages rendered as images, or video clips) or other multimedia into a video stream that is suitable for delivery over a display medium, such as TV, cable TV, computer displays, wireless display devices, etc. The video data stream may be presented and displayed in real time or stored and later presented through a set-top box, for example. Because these media are transformed into coded video streams (e.g., MPEG-2, MPEG-4, etc.), a user can watch them on a display screen without the need to connect to the Internet through a service provider. The user may request and interact with the desired media through a simple telephone interface, for example. Moreover, several wireless and cable-based services can be developed on top of this system. In one possible embodiment, the system for generating a coded video sequence may include an input unit that receives the multimedia input, extracts image data, and derives the virtual camera scripts and coding hints from the image data; a video sequence generator that generates a video sequence based on the extracted image data and the derived virtual camera scripts and coding hints; and a video encoder that encodes the generated video sequence using the coding hints and outputs the coded video sequence to an output device. The system may also provide customized video sequence generation services to subscribers.
Browsing And Retrieval Of Full Broadcast-Quality Video,
Tue Jan 25 16:04:24 EST 2011
A method includes steps of indexing a media collection, searching an indexed library and browsing a set of candidate program segments. The step of indexing a media collection creates the indexed library based on a content of the media collection. The step of searching the indexed library identifies the set of candidate program segments based on a search criteria. The step of browsing the set of candidate program segments selects a segment for viewing.
Method And System For Aligning Natural And Synthetic Video To Speech Synthesis,
Tue Nov 30 15:05:08 EST 2010
According to MPEG-4's TTS architecture, facial animation can be driven by two streams simultaneously--text and Facial Animation Parameters. A Text-To-Speech converter drives the mouth shapes of the face. An encoder sends Facial Animation Parameters to the face. The text input can include codes, or bookmarks, transmitted to the Text-to-Speech converter, which are placed between and inside words. The bookmarks carry an encoder time stamp. Due to the nature of text-to-speech conversion, the encoder time stamp does not relate to real-world time, and should be interpreted as a counter. The Facial Animation Parameter stream carries the same encoder time stamp found in the bookmark of the text. The system reads the bookmark and provides the encoder time stamp and a real-time time stamp. The facial animation system associates the correct facial animation parameter with the real-time time stamp using the encoder time stamp of the bookmark as a reference.
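The alignment mechanism above can be sketched briefly: bookmarks carry encoder time stamps that are counters rather than wall-clock times, so the system records the real time at which each bookmark is actually spoken and uses the shared counter to schedule the matching Facial Animation Parameters. The sketch below is a simplified illustration under assumed data shapes, not the patented implementation.

```python
# Sketch of bookmark-based alignment: encoder time stamps (counters) are
# mapped to real time when the TTS engine passes each bookmark; FAPs
# tagged with the same counter are scheduled at that real time.

def align_faps(bookmark_events, fap_stream):
    """bookmark_events: list of (encoder_ts, real_time_secs) observed as
    the TTS engine speaks past each bookmark.
    fap_stream: list of (encoder_ts, fap).
    Returns (real_time_secs, fap) pairs for FAPs with a known bookmark."""
    ts_to_real = dict(bookmark_events)
    return [(ts_to_real[ts], fap)
            for ts, fap in fap_stream if ts in ts_to_real]
```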
Digitally-Generated Lighting For Video Conferencing Applications,
Tue Sep 28 15:04:49 EDT 2010
A method of improving the lighting conditions of a real scene or video sequence. Digitally generated light is added to a scene for video conferencing over telecommunication networks. A virtual illumination equation takes into account light attenuation, Lambertian reflection, and specular reflection. An image of an object is captured, and a virtual light source illuminates the object within the image. In addition, the object can be the head of the user. The position of the head of the user is dynamically tracked so that a three-dimensional model is generated which is representative of the head of the user. Synthetic light is applied to a position on the model to form an illuminated model.
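An illumination equation combining attenuation with Lambertian and specular terms is commonly written in Phong-style form; a minimal sketch follows. The coefficients, attenuation constants, and function shape here are generic graphics conventions, not the specific equation of the patent.

```python
# Phong-style sketch of a virtual illumination equation with attenuation,
# a Lambertian (diffuse) term, and a specular term. Coefficients are
# illustrative, not taken from the patent.

def illuminate(n_dot_l, r_dot_v, dist,
               kd=0.7, ks=0.3, shininess=16, ambient=0.1):
    """n_dot_l: dot of surface normal and light direction.
    r_dot_v: dot of reflected-light and view directions.
    dist: distance from the virtual light source."""
    att = 1.0 / (1.0 + 0.1 * dist + 0.01 * dist * dist)  # light attenuation
    diffuse = kd * max(n_dot_l, 0.0)                     # Lambertian term
    specular = ks * max(r_dot_v, 0.0) ** shininess       # specular highlight
    return ambient + att * (diffuse + specular)
```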
System And Method For Adaptive Content Rendition,
Tue Sep 14 15:04:37 EDT 2010
Disclosed herein are systems, methods, and computer readable-media for adaptive content rendition, the method comprising receiving media content for playback to a user, adapting the media content for playback on a first device in the user's first location, receiving a notification when the user changes to a second location, adapting the media content for playback on a second device in the second location, and transitioning media content playback from the first device to second device. One aspect conserves energy by optionally turning off the first device after transitioning to the second device. Another aspect includes playback devices that are "dumb devices" which receive media content already prepared for playback, "smart devices" which receive media content in a less than ready form and prepare the media content for playback, or hybrid smart and dumb devices. A single device may be substituted by a plurality of devices. Adapting the media content for playback is based on a user profile storing user preferences and/or usage history in one aspect.
Method And System For Aligning Natural And Synthetic Video To Speech Synthesis,
Tue Sep 01 16:07:55 EDT 2009
According to MPEG-4's TTS architecture, facial animation can be driven by two streams simultaneously--text, and Facial Animation Parameters. In this architecture, text input is sent to a Text-To-Speech converter at a decoder that drives the mouth shapes of the face. Facial Animation Parameters are sent from an encoder to the face over the communication channel. The present invention includes codes (known as bookmarks) in the text string transmitted to the Text-to-Speech converter, which bookmarks are placed between words as well as inside them. According to the present invention, the bookmarks carry an encoder time stamp. Due to the nature of text-to-speech conversion, the encoder time stamp does not relate to real-world time, and should be interpreted as a counter. In addition, the Facial Animation Parameter stream carries the same encoder time stamp found in the bookmark of the text. The system of the present invention reads the bookmark and provides the encoder time stamp as well as a real-time time stamp to the facial animation system. Finally, the facial animation system associates the correct facial animation parameter with the real-time time stamp using the encoder time stamp of the bookmark as a reference.
System and method of organizing data to facilitate access and streaming,
Tue Sep 23 18:13:01 EDT 2008
File format systems and methods are disclosed that provide a framework that integrates concepts such as object-based audio-visual representation, metadata, and object-oriented programming to achieve a flexible and generic representation of the audiovisual information and the associated methods to operate on the audiovisual information. A system and method are disclosed for storing data processed from presentation data. The data is stored according to a method comprising coding input presentation data by identifying objects from within the presentation data, coding each object individually, and organizing the coded data into access layer data units. The access layer data units are stored throughout a plurality of segments, each segment comprising a segment table in a header portion thereof and those access layer data units that are members of the respective segment, there being one entry in the segment table for each access layer data unit therein. A plurality of extended segments are also stored, each of the extended segments further comprising one or more of the access layer data units that include protocol-specific data, the extended segments each represented by an extended segment header. The data of an accessible object is also stored, including an accessible object header and identifiers of the plurality of extended segments, each of the extended segments being a member of the same object.
Method and system for aligning natural and synthetic video to speech synthesis,
Tue Apr 29 18:12:46 EDT 2008
Facial animation in MPEG-4 can be driven by a text stream and a Facial Animation Parameters (FAP) stream. Text input is sent to a TTS converter that drives the mouth shapes of the face. FAPs are sent from an encoder to the face over the communication channel. Disclosed are codes bookmarks in the text string transmitted to the TTS converter. Bookmarks are placed between and inside words and carry an encoder time stamp. The encoder time stamp does not relate to real-world time. The FAP stream carries the same encoder time stamp found in the bookmark of the text. The system reads the bookmark and provides the encoder time stamp as well as a real-time time stamp to the facial animation system. The facial animation system associates the correct facial animation parameter with the real-time time stamp using the encoder time stamp of the bookmark as a reference.
Interaction modalities for multimedia delivery and presentation,
Tue Dec 18 18:12:32 EST 2007
A method and apparatus for displaying received data, analyzing the quality of the displayed data, formulating a media-parameter suggestion for the encoder to alter the characteristics of data to be sent to the receiver, and sending the formulated suggestion from the receiver.
Digitally-generated lighting for video conferencing applications,
Tue Jun 12 18:12:06 EDT 2007
A method of improving the lighting conditions of a real scene or video sequence. Digitally generated light is added to a scene for video conferencing over telecommunication networks. A virtual illumination equation takes into account light attenuation, Lambertian reflection, and specular reflection. An image of an object is captured, and a virtual light source illuminates the object within the image. In addition, the object can be the head of the user. The position of the head of the user is dynamically tracked so that a three-dimensional model is generated which is representative of the head of the user. Synthetic light is applied to a position on the model to form an illuminated model.
Method and system for aligning natural and synthetic video to speech synthesis,
Tue Sep 19 18:11:34 EDT 2006
According to MPEG-4's TTS architecture, facial animation can be driven by two streams simultaneously--text, and Facial Animation Parameters. In this architecture, text input is sent to a Text-To-Speech converter at a decoder that drives the mouth shapes of the face. Facial Animation Parameters are sent from an encoder to the face over the communication channel. The present invention includes codes (known as bookmarks) in the text string transmitted to the Text-to-Speech converter, which bookmarks are placed between words as well as inside them. According to the present invention, the bookmarks carry an encoder time stamp. Due to the nature of text-to-speech conversion, the encoder time stamp does not relate to real-world time, and should be interpreted as a counter. In addition, the Facial Animation Parameter stream carries the same encoder time stamp found in the bookmark of the text. The system of the present invention reads the bookmark and provides the encoder time stamp as well as a real-time time stamp to the facial animation system. Finally, the facial animation system associates the correct facial animation parameter with the real-time time stamp using the encoder time stamp of the bookmark as a reference.
Digitally-generated lighting for video conferencing applications,
Tue Dec 27 18:10:43 EST 2005
A method of improving the lighting conditions of a real scene or video sequence. Digitally generated light is added to a scene for video conferencing over telecommunication networks. A virtual illumination equation takes into account light attenuation, Lambertian reflection, and specular reflection. An image of an object is captured, and a virtual light source illuminates the object within the image. In addition, the object can be the head of the user. The position of the head of the user is dynamically tracked so that a three-dimensional model is generated which is representative of the head of the user. Synthetic light is applied to a position on the model to form an illuminated model.
Scalable Video Encoder/Decoder With Drift Control,
Tue Nov 01 18:10:37 EST 2005
Scalable video coders have traditionally avoided using enhancement layer information to predict the base layer, so as to avoid so-called drift. As a result, they are less efficient than a one-layer coder. The present invention is directed to a scalable video coder that allows drift by predicting the base layer from the enhancement layer information. Through careful management of the amount of drift introduced, the overall compression efficiency can be improved while only slightly degrading resilience at lower bit rates.
Flexible interchange of coded multimedia facilitating access and streaming,
Tue Jun 15 18:09:52 EDT 2004
A fundamental limitation in the exchange of audiovisual information today is that its representation is extremely low level. It is composed of coded video or audio samples (often as blocks) arranged in a commercial format. In contrast, new-generation multimedia requires flexible formats to allow quick adaptation to requirements in terms of access, bandwidth scalability, streaming, and general data reorganization. The Flexible-Integrated Intermedia Format (Flexible-IIF or F-IIF) is an advanced extension to the Integrated Intermedia Format (IIF). The Flexible-Integrated Intermedia Format (Flexible-IIF) data structures, file formats, systems, and methods provide a framework that integrates advanced concepts, such as object-based audio-visual representation, metadata, and object-oriented programming, to achieve a flexible and generic representation of the audiovisual information and the associated methods to operate on the audiovisual information.
Flexible synchronization framework for multimedia streams,
Tue Aug 05 18:08:48 EDT 2003
A flexible framework for synchronization of multimedia streams synchronizes the incoming streams on the basis of the collaboration of a transmitter-driven and a local inter-media synchronization module. Whenever the first one is not enough to ensure reliable synchronization, or cannot assure synchronization because the encoder does not know the exact timing of the decoder, the second one comes into play. Normally, the transmitter-driven module uses the stream time stamps if their drift is acceptable. If the drift is too high, the system activates an internal inter-media synchronization mode while the transmitter-driven module extracts the coarsest inter-media synchronization and/or the structural information present in the streams. The internal clock of the receiver is used as the absolute time reference. Whenever the drift value stabilizes to acceptable values, the system switches back smoothly to the external synchronization mode. The switch has a given hysteresis in order to avoid oscillations between internal and external synchronization modes.
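The hysteresis behavior described above can be sketched as a small state machine: external (transmitter-driven) synchronization is used while drift is low, the system switches to internal synchronization when drift exceeds a high threshold, and it switches back only once drift falls below a lower threshold. The threshold values below are illustrative, not from the patent.

```python
# Sketch of the hysteresis switch between external (transmitter-driven)
# and internal inter-media synchronization. Thresholds are hypothetical.

class SyncController:
    HIGH_MS = 100.0  # switch to internal sync above this drift
    LOW_MS = 40.0    # switch back to external sync below this drift

    def __init__(self):
        self.mode = "external"

    def update(self, drift_ms):
        """Feed the latest measured time-stamp drift; returns current mode.
        The gap between HIGH_MS and LOW_MS prevents mode oscillation."""
        if self.mode == "external" and drift_ms > self.HIGH_MS:
            self.mode = "internal"
        elif self.mode == "internal" and drift_ms < self.LOW_MS:
            self.mode = "external"
        return self.mode
```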
Method And System For Aligning Natural And Synthetic Video To Speech Synthesis,
Tue May 20 18:08:42 EDT 2003
According to MPEG-4's TTS architecture, facial animation can be driven by two streams simultaneously--text, and Facial Animation Parameters. In this architecture, text input is sent to a Text-To-Speech converter at a decoder that drives the mouth shapes of the face. Facial Animation Parameters are sent from an encoder to the face over the communication channel. The present invention includes codes (known as bookmarks) in the text string transmitted to the Text-to-Speech converter, which bookmarks are placed between words as well as inside them. According to the present invention, the bookmarks carry an encoder time stamp. Due to the nature of text-to-speech conversion, the encoder time stamp does not relate to real-world time, and should be interpreted as a counter. In addition, the Facial Animation Parameter stream carries the same encoder time stamp found in the bookmark of the text. The system of the present invention reads the bookmark and provides the encoder time stamp as well as a real-time time stamp to the facial animation system. Finally, the facial animation system associates the correct facial animation parameter with the real-time time stamp using the encoder time stamp of the bookmark as a reference.
Interaction modalities for multimedia delivery and presentation using nodes,
Tue Nov 13 18:07:15 EST 2001
A system and method for reproducing a multimedia data signal on a terminal. A terminal capability node is instantiated and the terminal capability is evaluated. The value of a capability is set, and the value is then altered based upon the capability of the evaluated terminal.
System and method for processing object-based audiovisual information,
Tue Sep 18 18:07:13 EDT 2001
Audiovisual data storage is enhanced using an expanded physical object table utilizing an ordered list of unique identifiers for a particular object for every object instance of an object contained in segments of a data file. Two object instances of the same object in the same segment have different object identifiers. Therefore, different instances of the same object use different identification and the different object instances may be differentiated from one another for access, editing and transmission. The necessary memory required for randomly accessing data contained in files using the expanded physical object table may be reduced by distributing necessary information within a header of a file to simplify the structure of the physical object table. In this way, a given object may be randomly accessed by means of an improved physical object table/segment object table mechanism.
Flexible synchronization framework for multimedia streams having inserted time stamp,
Tue Jan 23 18:06:57 EST 2001
A flexible framework for synchronization of multimedia streams synchronizes the incoming streams on the basis of the collaboration of a transmitter-driven and a local inter-media synchronization module. Whenever the first one is not enough to ensure reliable synchronization, or cannot assure synchronization because the encoder does not know the exact timing of the decoder, the second one comes into play. Normally, the transmitter-driven module uses the stream time stamps if their drift is acceptable. If the drift is too high, the system activates an internal inter-media synchronization mode while the transmitter-driven module extracts the coarsest inter-media synchronization and/or the structural information present in the streams. The internal clock of the receiver is used as the absolute time reference. Whenever the drift value stabilizes to acceptable values, the system switches back smoothly to the external synchronization mode. The switch has a given hysteresis in order to avoid oscillations between internal and external synchronization modes.