
200 S Laurel Ave - Bldg A
Middletown, NJ
MIRACLE,
The MIRACLE project develops media processing technologies that enable the content-based retrieval and presentation of multimedia data over a range of devices, and a wide range of available bandwidth.
Method And System For Aligning Natural And Synthetic Video To Speech Synthesis,
September 1, 2009
According to MPEG-4's TTS architecture, facial animation can be driven by two streams simultaneously--text, and Facial Animation Parameters. In this architecture, text input is sent to a Text-To-Speech converter at a decoder that drives the mouth shapes of the face. Facial Animation Parameters are sent from an encoder to the face over the communication channel. The present invention includes codes (known as bookmarks) in the text string transmitted to the Text-to-Speech converter, which bookmarks are placed between words as well as inside them. According to the present invention, the bookmarks carry an encoder time stamp. Due to the nature of text-to-speech conversion, the encoder time stamp does not relate to real-world time, and should be interpreted as a counter. In addition, the Facial Animation Parameter stream carries the same encoder time stamp found in the bookmark of the text. The system of the present invention reads the bookmark and provides the encoder time stamp as well as a real-time time stamp to the facial animation system. Finally, the facial animation system associates the correct facial animation parameter with the real-time time stamp using the encoder time stamp of the bookmark as a reference.
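The alignment step described in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `<bm N>` bookmark syntax, the `Fap` record, and the stand-in `synthesize` function (which pretends every character takes a fixed number of milliseconds to speak) are all assumptions made for the example. A real decoder would get bookmark times from its Text-to-Speech engine.

```python
import re
from dataclasses import dataclass

# Hypothetical bookmark syntax, for illustration only:
# "<bm 7>" carries encoder time stamp 7.
BOOKMARK = re.compile(r"<bm (\d+)>")


@dataclass
class Fap:
    encoder_ts: int  # a counter shared with a bookmark, not wall-clock time
    params: str      # placeholder for the actual facial animation parameters


def synthesize(text, ms_per_char=50):
    """Stand-in for a TTS engine (assumed fixed speaking rate).

    Returns {encoder_ts: real_time_ms}, where real_time_ms is the playback
    time at which speech reaches the bookmark's position in the text.
    """
    timeline = {}
    clock_ms = 0
    pos = 0
    for match in BOOKMARK.finditer(text):
        spoken = text[pos:match.start()]       # text spoken since last bookmark
        clock_ms += len(spoken) * ms_per_char
        timeline[int(match.group(1))] = clock_ms
        pos = match.end()                      # bookmarks themselves are not spoken
    return timeline


def schedule(faps, timeline):
    """Attach a real-time stamp to each FAP via its encoder time stamp."""
    return [(timeline[f.encoder_ts], f.params)
            for f in faps if f.encoder_ts in timeline]


# One bookmark inside a word and one between words.
timeline = synthesize("Hel<bm 1>lo <bm 2>world")
plan = schedule([Fap(2, "smile"), Fap(1, "blink")], timeline)
```

Because the encoder time stamp is only a counter, the FAPs can arrive in any order; the lookup through the bookmark timeline is what ties each one to a playback time.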
System and method of organizing data to facilitate access and streaming,
September 23, 2008
File formats, systems and methods are disclosed that provide a framework that integrates concepts, such as object-based audio-visual representation, meta-data and object-oriented programming, to achieve a flexible and generic representation of the audiovisual information and the associated methods to operate on the audiovisual information. A system and method are disclosed for storing data processed from presentation data. The data is stored according to a method comprising coding input presentation data by identifying objects from within the presentation data, coding each object individually and organizing the coded data into access layer data units. The access layer data units are stored throughout a plurality of segments, each segment comprising a segment table in a header portion thereof and those access layer data units that are members of the respective segment, there being one entry in the segment table for each access layer data unit therein. A plurality of extended segments are also stored, each of the extended segments further comprising one or more of the access layer data units that include protocol specific data, the extended segments each represented by an extended segment header. The data of an accessible object is also stored, including an accessible object header and identifiers of the plurality of extended segments, each of the extended segments being a member of the same object.
Method and system for aligning natural and synthetic video to speech synthesis,
April 29, 2008
Facial animation in MPEG-4 can be driven by a text stream and a Facial Animation Parameters (FAP) stream. Text input is sent to a TTS converter that drives the mouth shapes of the face. FAPs are sent from an encoder to the face over the communication channel. Disclosed are codes, known as bookmarks, in the text string transmitted to the TTS converter. Bookmarks are placed between and inside words and carry an encoder time stamp. The encoder time stamp does not relate to real-world time. The FAP stream carries the same encoder time stamp found in the bookmark of the text. The system reads the bookmark and provides the encoder time stamp as well as a real-time time stamp to the facial animation system. The facial animation system associates the correct facial animation parameter with the real-time time stamp using the encoder time stamp of the bookmark as a reference.
Interaction modalities for multimedia delivery and presentation,
December 18, 2007
A method and apparatus for displaying received data, analyzing the quality of the displayed data, formulating a media-parameter suggestion for the encoder to alter the characteristics of data to be sent to the receiver, and sending the formulated suggestion from the receiver.
Digitally-generated lighting for video conferencing applications,
June 12, 2007
A method of improving the lighting conditions of a real scene or video sequence. Digitally generated light is added to a scene for video conferencing over telecommunication networks. A virtual illumination equation takes into account light attenuation and Lambertian and specular reflection. An image of an object is captured, and a virtual light source illuminates the object within the image. The object can be the head of the user. The position of the head of the user is dynamically tracked so that a three-dimensional model is generated which is representative of the head of the user. Synthetic light is applied to a position on the model to form an illuminated model.
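The abstract names the three ingredients of the virtual illumination equation but does not give the equation itself. The sketch below combines them in a conventional Phong-style form, which is an assumption for illustration and not necessarily the form used in the patent. The function name, the coefficients `kd`, `ks`, `shininess`, and the quadratic `falloff` constants are all hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _unit(a):
    length = math.sqrt(_dot(a, a))  # assumes a non-zero vector
    return tuple(x / length for x in a)


def virtual_light(point, normal, light_pos, eye_pos,
                  kd=0.7, ks=0.3, shininess=16, falloff=(1.0, 0.0, 0.1)):
    """Intensity added at a surface point by one virtual point light.

    Assumed Phong-style model:
    attenuation * (Lambertian term + specular term).
    """
    to_light = _sub(light_pos, point)
    dist = math.sqrt(_dot(to_light, to_light))
    l = _unit(to_light)
    n = _unit(normal)
    v = _unit(_sub(eye_pos, point))

    # Light attenuation with distance (constant, linear, quadratic terms).
    c0, c1, c2 = falloff
    attenuation = 1.0 / (c0 + c1 * dist + c2 * dist * dist)

    # Lambertian (diffuse) reflection: proportional to cos(angle to light).
    n_dot_l = max(0.0, _dot(n, l))
    diffuse = kd * n_dot_l

    # Specular reflection: mirror l about n, compare with the view direction.
    specular = 0.0
    if n_dot_l > 0.0:
        r = tuple(2.0 * n_dot_l * nc - lc for nc, lc in zip(n, l))
        specular = ks * max(0.0, _dot(r, v)) ** shininess

    return attenuation * (diffuse + specular)


# A point on the head model facing a light two units away.
lit = virtual_light((0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 5))
# The same point with the light placed behind the surface.
unlit = virtual_light((0, 0, 0), (0, 0, 1), (0, 0, -2), (0, 0, 5))
```

In a tracked-head setting, `point` and `normal` would come from the three-dimensional head model, and the returned intensity would be added to the captured pixel value at that position.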
Method and system for aligning natural and synthetic video to speech synthesis,
September 19, 2006
According to MPEG-4's TTS architecture, facial animation can be driven by two streams simultaneously--text, and Facial Animation Parameters. In this architecture, text input is sent to a Text-To-Speech converter at a decoder that drives the mouth shapes of the face. Facial Animation Parameters are sent from an encoder to the face over the communication channel. The present invention includes codes (known as bookmarks) in the text string transmitted to the Text-to-Speech converter, which bookmarks are placed between words as well as inside them. According to the present invention, the bookmarks carry an encoder time stamp. Due to the nature of text-to-speech conversion, the encoder time stamp does not relate to real-world time, and should be interpreted as a counter. In addition, the Facial Animation Parameter stream carries the same encoder time stamp found in the bookmark of the text. The system of the present invention reads the bookmark and provides the encoder time stamp as well as a real-time time stamp to the facial animation system. Finally, the facial animation system associates the correct facial animation parameter with the real-time time stamp using the encoder time stamp of the bookmark as a reference.
Digitally-generated lighting for video conferencing applications,
December 27, 2005
A method of improving the lighting conditions of a real scene or video sequence. Digitally generated light is added to a scene for video conferencing over telecommunication networks. A virtual illumination equation takes into account light attenuation and Lambertian and specular reflection. An image of an object is captured, and a virtual light source illuminates the object within the image. The object can be the head of the user. The position of the head of the user is dynamically tracked so that a three-dimensional model is generated which is representative of the head of the user. Synthetic light is applied to a position on the model to form an illuminated model.
Scalable Video Encoder/Decoder With Drift Control,
November 1, 2005
Scalable video coders have traditionally avoided using enhancement layer information to predict the base layer, so as to avoid so-called drift. As a result, they are less efficient than a one-layer coder. The present invention is directed to a scalable video coder that allows drift, by predicting the base layer from the enhancement layer information. Through careful management of the amount of drift introduced, the overall compression efficiency can be improved while only slightly degrading resilience for lower bit-rates.
Flexible interchange of coded multimedia facilitating access and streaming,
June 15, 2004
A fundamental limitation in the exchange of audiovisual information today is that its representation is extremely low level. It is composed of coded video or audio samples (often as blocks) arranged in a commercial format. In contrast, the new generation multimedia requires flexible formats to allow a quick adaptation to requirements in terms of access, bandwidth scalability, streaming as well as general data reorganization. The Flexible-Integrated Intermedia Format (Flexible-IIF or F-IIF) is an advanced extension to the Integrated Intermedia Format (IIF). The Flexible-IIF data structures, file formats, systems and methods provide a framework that integrates advanced concepts, such as object-based audio-visual representation, meta-data and object-oriented programming, to achieve a flexible and generic representation of the audiovisual information and the associated methods to operate on the audiovisual information.
Flexible synchronization framework for multimedia streams,
August 5, 2003
A flexible framework for synchronization of multimedia streams synchronizes the incoming streams on the basis of the collaboration of a transmitter-driven and a local inter-media synchronization module. Whenever the first one is not enough to ensure reliable synchronization, or cannot assure synchronization because the encoder does not know the exact timing of the decoder, the second one comes into play. Normally, the transmitter-driven module uses the stream time stamps if their drift is acceptable. If the drift is too high, the system activates an internal inter-media synchronization mode while the transmitter-driven module extracts the coarsest inter-media synchronization and/or the structural information present in the streams. The internal clock of the receiver is used as absolute time reference. Whenever the drift value stabilizes to acceptable values, the system switches back smoothly to the external synchronization mode. The switch has a given hysteresis in order to avoid oscillations between internal and external synchronization modes.
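The mode switch with hysteresis described in this abstract can be sketched as a two-threshold state machine. This is a simplified illustration under stated assumptions: the class name, the millisecond thresholds, and the use of a single instantaneous drift sample (rather than a test that the drift has stabilized) are hypothetical and not taken from the patent.

```python
class SyncModeSelector:
    """Chooses between external (transmitter-driven) and internal sync.

    Two thresholds give the switch hysteresis: the drift must exceed
    enter_internal_ms to leave external mode, but must fall below the
    lower exit_internal_ms before external mode is resumed.
    """

    def __init__(self, enter_internal_ms=80, exit_internal_ms=40):
        # Hypothetical threshold values, chosen only for the example.
        if exit_internal_ms >= enter_internal_ms:
            raise ValueError("exit threshold must be below enter threshold")
        self.enter_internal_ms = enter_internal_ms
        self.exit_internal_ms = exit_internal_ms
        self.mode = "external"

    def update(self, drift_ms):
        """Feed one drift measurement and return the mode now in effect."""
        drift = abs(drift_ms)
        if self.mode == "external" and drift > self.enter_internal_ms:
            self.mode = "internal"   # time stamps too unreliable: use local clock
        elif self.mode == "internal" and drift < self.exit_internal_ms:
            self.mode = "external"   # drift acceptable again: trust time stamps
        return self.mode


selector = SyncModeSelector()
modes = [selector.update(d) for d in (10, 90, 60, 50, 30, 60)]
```

With a single threshold at 60 ms, a drift hovering around that value would flip the mode on almost every sample; the gap between the two thresholds is what prevents that oscillation.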
Method And System For Aligning Natural And Synthetic Video To Speech Synthesis,
May 20, 2003
According to MPEG-4's TTS architecture, facial animation can be driven by two streams simultaneously--text, and Facial Animation Parameters. In this architecture, text input is sent to a Text-To-Speech converter at a decoder that drives the mouth shapes of the face. Facial Animation Parameters are sent from an encoder to the face over the communication channel. The present invention includes codes (known as bookmarks) in the text string transmitted to the Text-to-Speech converter, which bookmarks are placed between words as well as inside them. According to the present invention, the bookmarks carry an encoder time stamp. Due to the nature of text-to-speech conversion, the encoder time stamp does not relate to real-world time, and should be interpreted as a counter. In addition, the Facial Animation Parameter stream carries the same encoder time stamp found in the bookmark of the text. The system of the present invention reads the bookmark and provides the encoder time stamp as well as a real-time time stamp to the facial animation system. Finally, the facial animation system associates the correct facial animation parameter with the real-time time stamp using the encoder time stamp of the bookmark as a reference.
Interaction modalities for multimedia delivery and presentation using nodes,
November 13, 2001
A system and method for reproducing a multimedia data signal on a terminal. A terminal capability node is instantiated and the terminal capability is evaluated. The value of a capability is set, and the value is then altered based upon the capability of the evaluated terminal.
System and method for processing object-based audiovisual information,
September 18, 2001
Audiovisual data storage is enhanced using an expanded physical object table utilizing an ordered list of unique identifiers for a particular object for every object instance of an object contained in segments of a data file. Two object instances of the same object in the same segment have different object identifiers. Therefore, different instances of the same object use different identification and the different object instances may be differentiated from one another for access, editing and transmission. The necessary memory required for randomly accessing data contained in files using the expanded physical object table may be reduced by distributing necessary information within a header of a file to simplify the structure of the physical object table. In this way, a given object may be randomly accessed by means of an improved physical object table/segment object table mechanism.
Flexible synchronization framework for multimedia streams having inserted time stamp,
January 23, 2001
A flexible framework for synchronization of multimedia streams synchronizes the incoming streams on the basis of the collaboration of a transmitter-driven and a local inter-media synchronization module. Whenever the first one is not enough to ensure reliable synchronization, or cannot assure synchronization because the encoder does not know the exact timing of the decoder, the second one comes into play. Normally, the transmitter-driven module uses the stream time stamps if their drift is acceptable. If the drift is too high, the system activates an internal inter-media synchronization mode while the transmitter-driven module extracts the coarsest inter-media synchronization and/or the structural information present in the streams. The internal clock of the receiver is used as absolute time reference. Whenever the drift value stabilizes to acceptable values, the system switches back smoothly to the external synchronization mode. The switch has a given hysteresis in order to avoid oscillations between internal and external synchronization modes.