
180 Park Ave - Building 103
Florham Park, NJ
Subject matter expert in Natural Language Processing, Speech Processing, Language Modeling, Machine Translation, Question Answering, Parsing, Formal Grammars
Dr. Srinivas Bangalore is currently a Principal Member of Technical Specialist in the Voice and IP Services Laboratory at AT&T Labs-Research. He received his PhD in Computer Science from the University of Pennsylvania in 1997. His dissertation on Supertagging was awarded the Morris and Dorothy Rubinoff award for outstanding dissertation that has resulted in or could lead to innovative applications of computer technology. He has been at AT&T Labs-Research since 1997 and has worked on many areas of natural language processing including Spoken Language Translation, Multimodal Understanding, Language Generation and Question-Answering. He has co-edited a book on Supertagging, authored over 100 research publications and holds over 45 patents in these areas. Dr. Bangalore has been an adjunct associate professor at Columbia University and a visiting lecturer at Princeton University. He has been awarded the AT&T Outstanding Mentor Award, in recognition of his support and dedication to the AT&T Labs Mentoring Program, and the AT&T Science & Technology Medal for technical leadership and innovative contributions in Spoken Language Technology and Services. He has served as an editorial board member of the Computational Linguistics Journal and a program committee member for a number of ACL and IEEE Speech conferences.
Connecting Your World,
The need to be connected is greater than ever, and AT&T Researchers are creating new ways for people to connect with one another and with their environments, whether it's their home, office, or car.
Speech translation,
AT&T Research is developing a real-time speech-to-speech translation technology so the translation starts as soon as speech is detected.
SPECTRA: A SPEECH-TO-SPEECH TRANSLATION SYSTEM IN THE CLOUD
Vivek Rangarajan Sridhar, Srinivas Bangalore, Aura Jimenez, Ladan Golipour, Prakash Kolan
IEEE International Conference on Emerging Signal Processing Applications,
2012.
[PDF]
[BIB]
IEEE Copyright
This version of the work is reprinted here with permission of IEEE for your personal use. Not for redistribution. The definitive version was published in IEEE International Conference on Emerging Signal Processing Applications, 2012-01-12
In this demonstration, we will present Spectra, a speech-to-speech (S2S) translation system in the cloud. Spectra comprises an HMM-based large vocabulary continuous speech recognizer (AT&T Watson Speech Recognizer), a phrase-based translation system, and a unit selection text-to-speech synthesis system (AT&T Natural Voices TTS). Spectra currently runs on any iOS device and can be downloaded as an application from the Apple application store (http://itunes.apple.com/us/app/spectra/id432494549?mt=8). Spectra is endowed with automatic language identification capabilities and can currently translate from/to English and six other languages (Chinese, French, German, Italian, Japanese and Spanish).
Real-time Incremental Speech-to-Speech Translation of Dialogs
Srinivas Bangalore, Vivek Rangarajan Sridhar, Prakash Kolan, Ladan Golipour, Aura Jimenez
Proceedings of NAACL-HLT,
NAACL-HLT 2012,
2012.
[PDF]
[BIB]
Harvesting parallel text in multiple languages with limited supervision
Luciano Barbosa, Vivek Rangarajan Sridhar, Mahsa Yarmohammadi, Srinivas Bangalore
Proceedings of COLING,
COLING,
2012.
[PDF]
[BIB]
COLING Copyright
The definitive version was published in 2012. , 2012-12-08
The Web is an ever increasing, dynamically changing, multilingual repository of text. There have been several approaches to harvest this repository for bootstrapping, supplementing and adapting data needed for training models in speech and language applications. In this paper, we present semi-supervised and unsupervised approaches to harvesting multilingual text that rely on a key observation of link collocation. We demonstrate the effectiveness of our approach in the context of statistical machine translation by harvesting parallel texts and training translation models in 20 different languages. Furthermore, by exploiting the DOM trees of parallel webpages, we extend our harvesting technique to create parallel data for resource limited languages in an unsupervised manner. We also present some interesting observations concerning the socio-economic factors that the multilingual Web reflects.

A Dataset Search Engine for the Research Document Corpus
Graham Cormode, Divesh Srivastava, Srinivas Bangalore, Marios Hadjieleftheriou
ICDE 2012,
2012.
[PDF]
[BIB]
IEEE Copyright
This version of the work is reprinted here with permission of IEEE for your personal use. Not for redistribution. The definitive version was published in ICDE 2012, 2012-04-01
A key step in validating a proposed idea or system is to evaluate over a suitable data set. However, to this date there have been no useful tools for researchers to understand which datasets have been used for what purpose, or in what prior work. Instead, they have to manually browse through papers to find suitable datasets and their URLs, which is laborious and inefficient. To better aid the data discovery process, and provide a better understanding of how and where datasets have been used, we propose a framework to effectively identify datasets within the scientific corpus. The key technical challenges are identification of datasets, and discovery of the association between a dataset and the URLs where it can be accessed. Based on this, we have built a user-friendly web-based search interface for users to conveniently explore the dataset-paper relationships, and find relevant datasets and their properties.

Predicting Relative Prominence in Noun-Noun Compounds
Taniya Mishra, Srinivas Bangalore
Proceedings of ACL-HLT 2011,
2011.
[PDF]
[BIB]
ISCA Copyright
The definitive version was published in Proceedings of ACL-HLT 2011, 2011-06-19
There are several theories regarding what influences prominence assignment in English noun-noun compounds. We developed corpus-driven models for automatically predicting prominence assignment in noun-noun compounds using feature sets based on two such theories: the informativeness theory and the semantic composition theory. The evaluation of the prediction models indicates that though both of these theories are relevant, they account for different types of variability in prominence assignment.
NON-LINEAR TAGGING MODELS WITH LOCALIST AND DISTRIBUTED WORD REPRESENTATIONS
Sumit Chopra, Srinivas Bangalore
The 36th International Conference on Acoustics, Speech and Signal Processing,
2011.
[BIB]
Distributed representations of words are attractive since they provide a means for measuring word similarity. However, most approaches to learning distributed representations are divorced from the task context. In this paper, we describe a model that learns distributed representations of words in order to optimize task performance. We investigate this model for part-of-speech tagging and supertagging tasks and demonstrate its superior accuracy over localist models, especially for rare words. We also show that adding non-linearity to the model improves accuracy for complex tasks such as supertagging.
Focusing on Novelty: A Crawling Strategy to Build Diverse Language Models
Luciano Barbosa, Srinivas Bangalore
20th ACM International Conference on Information and Knowledge Management,
2011.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in 20th ACM International Conference on Information and Knowledge Management, 2011-10-30.
Enriching text-to-speech synthesis using automatic dialog act tags
Vivek Rangarajan Sridhar, Ann Syrdal, Alistair Conkie, Srinivas Bangalore
Proceedings of Interspeech,
Interspeech,
2011.
[BIB]
We present an approach for enriching dialog based text-to-speech (TTS) synthesis systems by explicitly controlling the expressiveness through the use of dialog act tags. The dialog act tags in our framework are automatically obtained by training a maximum entropy classifier on the Switchboard-DAMSL data set, unrelated to the TTS database. We compare the voice quality produced by exploiting automatic dialog act tags with that using human annotations of dialog acts, and with two forms of reference databases. Even though the inventory of tags is different for the automatic tagger and human annotation, exploiting either form of dialog markup generates better voice quality in comparison with the reference voices in subjective evaluation.
Crawling Back and Forth: Using Back and Out Links to Locate Bilingual Sites
Luciano Barbosa, Srinivas Bangalore, Vivek Kumar Rangarajan Sridhar
Proceedings of IJCNLP,
IJCNLP,
2011.
[PDF]
[BIB]
AFNLP Copyright
The definitive version was published in IJCNLP, 2011-11-15
The definitive version was published in Very Large Databases, 2011. , 2011-11-15
Recently, there has been increased interest in Web parallel text for tasks such as machine translation and cross-language information retrieval. Although previous works have addressed many aspects of it, including document pair selection, and sentence and word alignment, the problem of discovering bilingual data sources at a large scale has been overlooked to a great extent. In this paper, we propose a novel crawling strategy to locate bilingual sites which aims to achieve a balance between the two conflicting requirements of this problem: the need to perform a broad search while at the same time avoiding the need to crawl unproductive Web regions. Our solution does so by focusing on the graph neighborhood of bilingual sites and exploring the patterns of the links in this region to guide its visitation policy. To detect such sites, we introduce a two-step strategy that first relies on common patterns found in the internal links of these sites to compose a classifier that identifies candidate pages as entry points to parallel data in these sites, and then verifies whether these pages are in fact in the languages of interest. Our experimental evaluation shows that our crawler outperforms previous crawling approaches for this task and produces a high-quality collection of bilingual sites.

A Scalable Approach to Building a Parallel Corpus from the Web
Vivek Kumar Rangarajan Sridhar, Luciano Barbosa, Srinivas Bangalore
Proceedings of Interspeech,
INTERSPEECH,
2011.
[PDF]
[BIB]
ACL Copyright
The definitive version was published in EMNLP, 2011-08-27
Parallel text acquisition from the Web is an attractive way of augmenting statistical models (e.g., machine translation, cross-lingual document retrieval) with domain representative data. The basis for obtaining such data is a collection of pairs of bilingual Web sites or pages. In this work, we propose a crawling strategy that locates bilingual Web sites by constraining the visitation policy of the crawler to the graph neighborhood of bilingual sites on the Web. Subsequently, we use a novel recursive mining technique that recursively extracts text and links from the collection of bilingual Web sites obtained from the crawling. Our method does not suffer from the computationally prohibitive combinatorial matching typically used in previous work that uses document retrieval techniques to match a collection of bilingual webpages. We demonstrate the efficacy of our approach in the context of machine translation in the tourism and hospitality domain. The parallel text obtained using our novel crawling strategy results in a relative improvement of 21% in BLEU score (English-to-Spanish) over an out-of-domain seed translation model trained on the European parliamentary proceedings.

Finite-state models for Speech-based Search on Mobile Devices
Taniya Mishra, Srinivas Bangalore
Journal of Natural Language Engineering,
2010.
[PDF]
[BIB]
Cambridge University Press Copyright
The definitive version was published in Journal of Natural Language Engineering, 2010-11-01, http://journals.cambridge.org/action/displayMoreInfo?jid=NLE&type=tcr
In this paper, we present techniques that exploit finite-state models for voice search applications. In particular, we illustrate the use of finite-state models for encoding the search index in order to tightly integrate the speech recognition and the search components of a voice search system. We show that the tight integration mutually benefits Automatic Speech Recognition and improves the search. In the second part of the paper, we discuss the use of finite-state techniques for spoken language understanding, in particular, to segment an input query into its component semantic fields so as to improve search as well as to extend the functionality of the system and be able to execute the user's request against a backend database.
FEATURE-RICH CONTINUOUS LANGUAGE MODELS FOR SPEECH RECOGNITION
Sumit Chopra, Piotr Mirowski, Suhrid Balakrishnan, Srinivas Bangalore
IEEE Workshop on Spoken Language Technology,
2010.
[BIB]
State-of-the-art probabilistic models of text such as n-grams require an exponential number of examples as the size of the context grows, a problem that is due to the discrete word representation. We propose to solve this problem by learning a continuous-valued and low-dimensional mapping of words, and base our predictions for the probabilities of the target word on non-linear dynamics of the latent space representation of the words in the context window. We build on neural network-based language models; by expressing them as energy-based models, we can further enrich the models with additional inputs such as part-of-speech tags, topic information and graphs of word similarity. We demonstrate a significantly lower perplexity on different text corpora, as well as improved word accuracy rate on speech recognition tasks, as compared to Kneser-Ney back-off n-gram-based language models.

Comparing the Impact of Different Accounts of Dialog Structure on Coreference
Amanda Stent, Srinivas Bangalore
SLT 2010,
2010.
[PDF]
[BIB]
IEEE Copyright
The definitive version was published in Proceedings of ACL 2010, 2010-12-12
Determining the coreference of entity mentions in a discourse is a key part of the interpretation process for advanced spoken dialog applications. In this paper, we present the most comprehensive system for statistical coreference resolution in dialog to date. We also compare the impact of two contrasting theories of dialog structure (the stack model and the cache model) on the performance of statistical coreference resolution, and show that the stack model outperforms the cache model.
Method And Apparatus For Building Sales Tools By Mining Data From Websites,
Tue Jan 22 14:43:57 EST 2013
A website mining tool is disclosed that extracts information from, for example, a company's website and presents the extracted information in a graphical user interface (GUI). In one embodiment, web pages from a website are stored in, for example, computer memory and a structure of the web pages is identified. A plurality of blocks of information is then extracted as a function of this structure and a category is assigned to each block of information. The elements in the blocks of information are then displayed, for example to a salesperson, as a function of these categories. In another embodiment, Document Object Modeling parsing is used to identify the structure of the web pages. In yet another embodiment, a support vector machine is used to categorize each block of information.
Systems And Methods For Extracting Meaning From Multimodal Inputs Using Finite-State Devices,
Tue Jan 15 14:43:51 EST 2013
Multimodal utterances contain a number of different modes. These modes can include speech, gestures, and pen, haptic, and gaze inputs, and the like. This invention uses recognition results from one or more of these modes to provide compensation to the recognition process of one or more other ones of these modes. In various exemplary embodiments, a multimodal recognition system inputs one or more recognition lattices from one or more of these modes, and generates one or more models to be used by one or more mode recognizers to recognize the one or more other modes. In one exemplary embodiment, a gesture recognizer inputs a gesture input and outputs a gesture recognition lattice to a multimodal parser. The multimodal parser generates a language model and outputs it to an automatic speech recognition system, which uses the received language model to recognize the speech input that corresponds to the recognized gesture input.
System And Method Of Spoken Language Understanding In Human Computer Dialogs,
Tue May 29 16:10:31 EDT 2012
A system and method are disclosed that improve automatic speech recognition in a spoken dialog system. The method comprises partitioning speech recognizer output into self-contained clauses, identifying a dialog act in each of the self-contained clauses, qualifying dialog acts by identifying a current domain object and/or a current domain action, and determining whether further qualification is possible for the current domain object and/or current domain action. If further qualification is possible, then the method comprises identifying another domain action and/or another domain object associated with the current domain object and/or current domain action, reassigning the another domain action and/or another domain object as the current domain action and/or current domain object and then recursively qualifying the new current domain action and/or current object. This process continues until nothing is left to qualify.
Method And Apparatus For Automatically Building Conversational Systems,
Tue May 08 16:10:21 EDT 2012
A system and method provides a natural language interface to world-wide web content. Either in advance or dynamically, webpage content is parsed using a parsing algorithm. A person using a telephone interface can provide speech information, which is converted to text and used to automatically fill in input fields on a webpage form. The form is then submitted to a database search and a response is generated. Information contained on the responsive webpage is extracted and converted to speech via a text-to-speech engine and communicated to the person.
Text Edit Tracker That Categorizes Communications, Determines Distances Between Templates, Codes Templates In Color, And Uses A Morphing Score Based On Edits,
Tue May 01 16:10:15 EDT 2012
A method for monitoring edits to a template for responding to an incoming communication includes categorizing the incoming communication into a category associated with the template for a response to the incoming communication. The method also includes determining distances between the template and each of a set of responses based on the template, at a predetermined level of granularity. The method also includes coding the template in accordance with the determined distances and displaying the coded template. A method for extracting a new template based on responses to an existing template includes selecting factors that affect quantitative measures for preparing a response to the incoming communication. The method includes using a mathematical model of the factors to cluster a set of responses created based on the existing template into two clusters. The method further includes restricting a first cluster centroid to be the existing template and searching for a second cluster centroid for a second cluster.
Method And Apparatus For Detecting And Extracting Information From Dynamically Generated Web Pages,
Tue Apr 17 16:10:07 EDT 2012
A method and apparatus for automatically detecting and extracting information from dynamically generated web pages are disclosed. For example, the present method stores user provided information that is entered into a form interface of a web page for a first query. Responsive to the first query, a first response web page is received and stored. The present method then automatically generates a second query to acquire a second response web page that is responsive to the second query. Finally, the present method compares the first response web page and the second response web page. In one embodiment, the present invention extracts information that is dissimilar between the first response web page and the second response web page. This extracted information is deemed to be the pertinent information requested by the user.
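The comparison step described in this abstract can be illustrated with a minimal sketch. It assumes that each response page is available as a plain string and that a line-level diff is an acceptable stand-in for the comparison; the function name `extract_dissimilar` and the sample pages are hypothetical and do not come from the patent.

```python
import difflib


def extract_dissimilar(first_page: str, second_page: str) -> list[str]:
    """Return the lines of first_page that differ from second_page.

    Shared lines (page template, navigation, headings) are treated as
    boilerplate; lines unique to the first response are kept as the
    candidate answer to the first query. This is an illustrative
    assumption, not the patented procedure.
    """
    first_lines = first_page.splitlines()
    second_lines = second_page.splitlines()
    matcher = difflib.SequenceMatcher(a=first_lines, b=second_lines, autojunk=False)
    dissimilar = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "replace" and "delete" mark spans present only in the first page.
        if tag in ("replace", "delete"):
            dissimilar.extend(first_lines[i1:i2])
    return dissimilar


# Hypothetical response pages for two different form queries.
first = "<html>\n<h1>Flights</h1>\n<td>NYC to SFO $320</td>\n</html>"
second = "<html>\n<h1>Flights</h1>\n<td>BOS to LAX $410</td>\n</html>"
print(extract_dissimilar(first, second))
```

In this toy case only the table cell differs between the two responses, so it is the only line returned; the surrounding markup is shared and is discarded.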
Systems And Methods For Extracting Meaning From Multimodal Inputs Using Finite-State Devices,
Tue Jan 24 16:09:06 EST 2012
Multimodal utterances contain a number of different modes. These modes can include speech, gestures, and pen, haptic, and gaze inputs, and the like. This invention uses recognition results from one or more of these modes to provide compensation to the recognition process of one or more other ones of these modes. In various exemplary embodiments, a multimodal recognition system inputs one or more recognition lattices from one or more of these modes, and generates one or more models to be used by one or more mode recognizers to recognize the one or more other modes. In one exemplary embodiment, a gesture recognizer inputs a gesture input and outputs a gesture recognition lattice to a multimodal parser. The multimodal parser generates a language model and outputs it to an automatic speech recognition system, which uses the received language model to recognize the speech input that corresponds to the recognized gesture input.
System And Method Of Generating Responses To Text-Based Messages,
Tue Dec 20 16:06:48 EST 2011
In accordance with one aspect of the present invention, an automated method of and system for generating a response to a text-based natural language message is disclosed. The method includes identifying a sentence in the text-based natural language message and identifying an input clause in the sentence. The input clause is then compared to a previously received clause, where the previously received clause is correlated with a previously generated response message, and an output response message is generated based on the previously generated response message. The system includes means for performing the method steps.
System And Method Of Exploiting Prosodic Features For Dialog Act Tagging In A Discriminative Modeling Framework,
Tue Aug 09 16:05:53 EDT 2011
Disclosed are a system and method for exploiting information in an utterance for dialog act tagging. An exemplary method includes receiving a user utterance, computing at periodic intervals at least one parameter in the user utterance, quantizing the at least one parameter at each periodic interval, approximating conditional probabilities using an n-gram over a sliding window over the periodic intervals and tagging the utterance as a dialog act based on the approximated conditional probabilities.
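The steps named in this abstract (sample a parameter at periodic intervals, quantize it, and score n-grams over a sliding window) can be sketched as follows. This is a simplified illustration under stated assumptions: the parameter is taken to be pitch in Hz, the bin edges are arbitrary, and add-one-smoothed n-gram counts per dialog act stand in for the conditional probabilities of the discriminative model. All names and values are hypothetical.

```python
import math
from collections import Counter


def quantize(values, edges):
    """Map each value to a bin index: the number of edges it meets or exceeds."""
    return [sum(v >= e for e in edges) for v in values]


def ngrams(symbols, n):
    """Slide a window of width n over the quantized symbols."""
    return [tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]


def train(examples, edges, n=2):
    """Count quantized n-grams separately for each dialog act tag."""
    models = {}
    for tag, values in examples:
        models.setdefault(tag, Counter()).update(ngrams(quantize(values, edges), n))
    return models


def tag_utterance(values, models, edges, n=2):
    """Pick the dialog act whose smoothed n-gram counts best explain the contour."""
    grams = ngrams(quantize(values, edges), n)
    vocab = (len(edges) + 1) ** n  # number of possible n-grams over the bins

    def log_score(counts):
        total = sum(counts.values())
        return sum(math.log((counts[g] + 1) / (total + vocab)) for g in grams)

    return max(models, key=lambda tag: log_score(models[tag]))


# Hypothetical pitch contours (Hz) sampled at fixed intervals.
EDGES = [150, 200]
TRAINING = [
    ("question", [120, 160, 210, 230]),   # rising contour
    ("statement", [220, 210, 160, 120]),  # falling contour
]
MODELS = train(TRAINING, EDGES)
print(tag_utterance([130, 170, 220], MODELS, EDGES))
print(tag_utterance([230, 180, 140], MODELS, EDGES))
```

A rising contour shares its bigrams with the "question" training example and a falling one with the "statement" example, which is what drives the two decisions in this toy setup.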
Automatic Clustering Of Tokens From A Corpus For Grammar Acquisition,
Tue Jun 21 16:05:32 EDT 2011
A system for recognizing patterns is disclosed. Grammar learning from a corpus includes, for the other non-context words, generating frequency vectors for each non-context token in a corpus based upon counted occurrences of a predetermined relationship of the non-context tokens to identified context tokens. Clusters are grown from the frequency vectors according to a lexical correlation or a cluster tree among the non-context tokens. The cluster tree is used for pattern recognition.
Automatic Learning For Mapping Spoken/Text Descriptions Of Products Onto Available Products,
Tue May 03 16:05:03 EDT 2011
A method, processing device, and machine-readable medium are provided. Costs of states of a state space are calculated. Each state represents one or more available product attributes having zero or more decided attribute values. The calculating is based, at least in part, on training data associated with previously requested and offered products. A next state is determined such that one or more products are available and a sum of values, including a cost of the next state and a cost of a perturbation of one of the one or more requested product attribute values to reach the next state, is a minimum value. A value for a product attribute is mapped according to the minimum sum of values and product attribute values of available products.
Context-Sensitive Interface Widgets For Multi-Modal Dialog Systems,
Tue Feb 15 16:04:29 EST 2011
A system and method of presenting widgets to a user during a multi-modal interactive dialog between a user and a computer is presented. The system controls the multi-modal dialog; and when user input would help to clarify or speed up the presentation of requested information, the system presents a temporary widget to the user to elicit the user input in this regard. The system presents the widget on a display screen at a position that will not interfere with the dialog. Various types of widgets are available, such as button widgets, sliders and confirmation widgets, depending on the type of information that the system requires.
Method And Apparatus For Detecting And Extracting Information From Dynamically Generated Web Pages,
Tue Jan 25 16:04:24 EST 2011
A method and apparatus for automatically detecting and extracting information from dynamically generated web pages are disclosed. For example, the present method stores user provided information that is entered into a form interface of a web page for a first query. Responsive to the first query, a first response web page is received and stored. The present method then automatically generates a second query to acquire a second response web page that is responsive to the second query. Finally, the present method compares the first response web page and the second response web page. In one embodiment, the present invention extracts information that is dissimilar between the first response web page and the second response web page. This extracted information is deemed to be the pertinent information requested by the user.
On-Demand Language Translation For Television Programs,
Tue Oct 05 15:04:54 EDT 2010
In an embodiment, a method of providing an on demand translation service is provided. A subscriber may be charged a reduced fee or no fee for use of the on demand translation service in exchange for displaying commercial messages to the subscriber, the commercial messages being selected based on subscriber information. A multimedia signal including information in a source language may be received. The information may be obtained as text in the source language from the multimedia signal. The text may be translated from the source language to a target language. Translated information, based on the translated text, may be transmitted to a processing device for presentation to the subscriber. The received multimedia signal may be sent to a multimedia device for viewing.
Systems And Methods For Classifying And Representing Gestural Inputs,
Tue Aug 24 15:04:31 EDT 2010
Gesture and handwriting recognition agents provide possible interpretations of electronic ink. Recognition is performed on both individual strokes and combinations of strokes in the input ink lattice. The interpretations of electronic ink are classified and encoded as symbol complexes where symbols convey specific attributes of the contents of the stroke. The use of symbol complexes to represent strokes in the input ink lattice facilitates reference to sets of entities of a specific type.
Sequence Classification For Machine Translation,
Tue Aug 24 15:04:30 EDT 2010
Classification of sequences, such as the translation of natural language sentences, is carried out using an independence assumption. The independence assumption is an assumption that the probability of a correct translation of a source sentence word into a particular target sentence word is independent of the translation of other words in the sentence. Although this assumption is not a correct one, a high level of word translation accuracy is nonetheless achieved. In particular, discriminative training is used to develop models for each target vocabulary word based on a set of features of the corresponding source word in training sentences, with at least one of those features relating to the context of the source word. Each model comprises a weight vector for the corresponding target vocabulary word. The weights comprising the vectors are associated with respective ones of the features; each weight is a measure of the extent to which the presence of that feature for the source word makes it more probable that the target word in question is the correct one.
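The scoring scheme described in this abstract can be sketched as follows. The sketch assumes binary features, one weight vector per target vocabulary word, and a decision made independently at each source position. The weights here are set by hand for illustration, whereas the abstract describes obtaining them through discriminative training; the feature templates and vocabulary are hypothetical.

```python
def features(source_words, i):
    """Binary features of the source word at position i, including one context feature."""
    prev_word = source_words[i - 1] if i > 0 else "<s>"
    return {f"word={source_words[i]}", f"prev={prev_word}"}


def translate(source_words, weights):
    """Choose a target word for each source word independently of the others."""
    output = []
    for i in range(len(source_words)):
        active = features(source_words, i)
        # Score of a target word: sum of its weights for the active features.
        scores = {
            target: sum(vector.get(f, 0.0) for f in active)
            for target, vector in weights.items()
        }
        output.append(max(scores, key=scores.get))
    return output


# Hand-set weight vectors, one per target word (illustrative values only).
WEIGHTS = {
    "el": {"word=the": 2.0},
    "río": {"word=river": 2.0},
    "banco": {"word=bank": 1.0, "prev=the": 0.2},
    "orilla": {"word=bank": 0.5, "prev=river": 1.5},
}
print(translate(["the", "river", "bank"], WEIGHTS))
print(translate(["the", "bank"], WEIGHTS))
```

The context feature is what lets the same source word "bank" map to different target words in the two inputs, even though each position is classified without reference to the other target choices.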
System And Method For Compiling Rules Created By Machine Learning Program,
Tue Aug 17 15:04:27 EDT 2010
A system, a method, and a machine-readable medium are provided. A group of linear rules and associated weights are provided as a result of machine learning. Each one of the group of linear rules is partitioned into a respective one of a group of types of rules. A respective transducer for each of the linear rules is compiled. A combined finite state transducer is created from a union of the respective transducers compiled from the linear rules.
Learning Edit Machines For Robust Multimodal Understanding,
Tue May 11 15:03:52 EDT 2010
A system and method are disclosed for processing received data associated with a grammar. The method comprises receiving input data having a characteristic that the input data cannot be assigned an interpretation by a grammar, translating the input data into translated input data and submitting the translated input data into the grammar. The transducer coerces the set of strings encoded in a lattice resulting from recognition (such as speech recognition) to the closest strings in the grammar that can be assigned an interpretation.
On-Demand Language Translation For Television Programs,
Tue May 04 15:03:47 EDT 2010
A method, a system and a machine-readable medium are provided for an on demand translation service. A translation module including at least one language pair module for translating a source language to a target language may be made available for use by a subscriber. The subscriber may be charged a fee for use of the requested on demand translation service or may be provided use of the on demand translation service for free in exchange for displaying commercial messages to the subscriber. A video signal may be received including information in the source language, which may be obtained as text from the video signal and may be translated from the source language to the target language by use of the translation module. Translated information, based on the translated text, may be added into the received video signal. The video signal including the translated information in the target language may be sent to a display device.
Systems And Methods For Generating Markup-Language-Based Expressions From Multi-Modal And Unimodal Inputs,
Tue Feb 09 15:03:31 EST 2010
When using finite-state devices to perform various functions, it is beneficial to use finite state devices representing regular grammars with terminals having markup-language-based semantics. By using markup-language-based symbols in the finite state devices, it is possible to generate valid markup-language expressions by concatenating the symbols representing the result of the performed function. The markup-language expression can be used by other applications and/or devices. Finite-state devices are used to convert strings of words and gestures into valid markup-language, for example, XML, expressions that can be used, for example, to provide an application program interface to underlying system applications.
Method And Apparatus For Automatically Building Conversational Systems,
Tue Feb 09 15:03:30 EST 2010
A system and method provides a natural language interface to world-wide web content. Either in advance or dynamically, webpage content is parsed using a parsing algorithm. A person using a telephone interface can provide speech information, which is converted to text and used to automatically fill in input fields on a webpage form. The form is then submitted to a database search and a response is generated. Information contained on the responsive webpage is extracted and converted to speech via a text-to-speech engine and communicated to the person.
System And Method For Natural Language Generation,
Tue Jul 14 16:07:36 EDT 2009
A system, method and computer-readable medium for generating natural language utilize a stochastic process to choose a derivation tree according to a predetermined grammar, such as tree-adjoined grammar (TAG). A word lattice is created from a single semi-specified derivation tree and the proper path (i.e., desired output string) is selected from the lattice using a least-cost or other appropriate algorithm.
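The path-selection step can be sketched as a least-cost search over a word lattice. The lattice below, its words, and its arc costs are invented for illustration, and Dijkstra's algorithm stands in for whichever least-cost algorithm an actual system would use.

```python
import heapq

# Hypothetical word lattice: node -> list of (next node, word, cost) arcs.
LATTICE = {
    0: [(1, "there", 0.2), (1, "it", 0.9)],
    1: [(2, "was", 0.3), (2, "is", 0.5)],
    2: [(3, "no", 0.4), (3, "not", 1.2)],
    3: [(4, "cost", 0.6), (4, "estimate", 0.7)],
}


def best_path(lattice, start, final):
    """Return (total cost, words) of the cheapest path; assumes non-negative costs."""
    heap = [(0.0, start, [])]
    settled = set()
    while heap:
        cost, node, words = heapq.heappop(heap)
        if node == final:
            return cost, words
        if node in settled:
            continue
        settled.add(node)
        for nxt, word, arc_cost in lattice.get(node, []):
            heapq.heappush(heap, (cost + arc_cost, nxt, words + [word]))
    raise ValueError("no path from start to final node")


total_cost, words = best_path(LATTICE, start=0, final=4)
sentence = " ".join(words)
```

In a real generator the arc costs would typically come from a language model, for example negative log probabilities, so the cheapest path is the most probable string.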
Systems And Methods For Classifying And Representing Gestural Inputs,
Tue Mar 17 16:07:22 EDT 2009
Gesture and handwriting recognition agents provide possible interpretations of electronic ink. Recognition is performed on both individual strokes and combinations of strokes in the input ink lattice. The interpretations of electronic ink are classified and encoded as symbol complexes where symbols convey specific attributes of the contents of the stroke. The use of symbol complexes to represent strokes in the input ink lattice facilitates reference to sets of entities of a specific type.
System and method for compiling rules created by machine learning program,
Tue Nov 11 18:13:13 EST 2008
A system, a method, and a machine-readable medium are provided. A group of linear rules and associated weights are provided as a result of machine learning. Each one of the group of linear rules is partitioned into a respective one of a group of types of rules. A respective transducer for each of the linear rules is compiled. A combined finite state transducer is created from a union of the respective transducers compiled from the linear rules.
System and method of providing a spoken dialog interface to a website,
Tue May 13 18:12:49 EDT 2008
Disclosed is a system and method for generating a spoken dialog service from website data. Spoken dialog components typically include an automatic speech recognition module, a language understanding module, a dialog management module, a language generation module and a text-to-speech module. These components are capable of being automatically trained from processed website data. A website analyzer converts a website into a structured text data set and a structured task knowledge base. The website analyzer further extracts linguistic items from the website data. The dialog components are automatically trained from the structured text data set, structured task knowledge base and linguistic items.
Automatic clustering of tokens from a corpus for grammar acquisition,
Tue Apr 08 18:12:43 EDT 2008
A method of grammar learning from a corpus comprises generating a frequency vector for each non-context token in the corpus, based upon counted occurrences of a predetermined relationship of the non-context tokens to identified context tokens. Clusters are grown from the frequency vectors according to a lexical correlation among the non-context tokens.
Systems and methods for extracting meaning from multimodal inputs using finite-state devices,
Tue Nov 13 18:12:25 EST 2007
Multimodal utterances contain a number of different modes. These modes can include speech, gestures, and pen, haptic, and gaze inputs, and the like. This invention uses recognition results from one or more of these modes to provide compensation to the recognition process of one or more other ones of these modes. In various exemplary embodiments, a multimodal recognition system inputs one or more recognition lattices from one or more of these modes, and generates one or more models to be used by one or more mode recognizers to recognize the one or more other modes. In one exemplary embodiment, a gesture recognizer inputs a gesture input and outputs a gesture recognition lattice to a multimodal parser. The multimodal parser generates a language model and outputs it to an automatic speech recognition system, which uses the received language model to recognize the speech input that corresponds to the recognized gesture input.
Systems and methods for generating markup-language based expressions from multi-modal and unimodal inputs,
Tue Aug 14 18:12:10 EDT 2007
When using finite-state devices to perform various functions, it is beneficial to use finite-state devices representing regular grammars with terminals having markup-language-based semantics. By using markup-language-based symbols in the finite-state devices, it is possible to generate valid markup-language expressions by concatenating the symbols representing the result of the performed function. The markup-language expression can be used by other applications and/or devices. Finite-state devices are used to convert strings of words and gestures into valid markup-language expressions (for example, XML) that can be used, for example, to provide an application program interface to underlying system applications.
System and method for natural language generation,
Tue Jun 12 18:12:06 EDT 2007
A system, method and computer-readable medium for generating natural language utilize a stochastic process to choose a derivation tree according to a predetermined grammar, such as tree-adjoined grammar (TAG). A word lattice is created from a single semi-specified derivation tree and the proper path (i.e., desired output string) is selected from the lattice using a least-cost or other appropriate algorithm.
System and method for accessing and annotating electronic medical records using multi-modal interface,
Tue May 29 18:12:04 EDT 2007
A system and method of exchanging medical information between a user and a computer device is disclosed. The computer device can receive user input in one of a plurality of types of user input comprising speech, pen, gesture and a combination of speech, pen and gesture. The method comprises receiving information from the user associated with a medical condition and a bodily location of the medical condition on a patient in one of a plurality of types of user input, presenting in one of a plurality of types of system output an indication of the received medical condition and the bodily location of the medical condition, and presenting to the user an indication that the computer device is ready to receive further information. The invention enables a more flexible multi-modal interactive environment for entering medical information into a computer device. The medical device also generates multi-modal output for presenting a patient's medical condition in an efficient manner.
Method and apparatus for providing stochastic finite-state machine translation,
Tue Sep 26 18:11:35 EDT 2006
A method and apparatus for stochastic finite-state machine translation is provided. The method may include receiving a speech input and translating the speech input in a source language into one or more symbols in a target language based on a stochastic language model. Subsequently, all possible sequences of the translated symbols may be generated. One of the generated sequences may be selected based on a monolingual target language model.
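The generate-then-select step can be sketched as follows. This is a simplified illustration, not the patented method: a plain word-for-word lookup table stands in for the stochastic translation model, and the lexicon, the bigram log probabilities, and the back-off value for unseen bigrams are all made up.

```python
from itertools import permutations

# Hypothetical word-for-word lexicon standing in for the stochastic translation model.
LEXICON = {"la": "the", "casa": "house", "blanca": "white"}

# Hypothetical bigram log probabilities for the monolingual target language model.
BIGRAM_LOGPROB = {
    ("<s>", "the"): -0.5,
    ("the", "white"): -1.0,
    ("white", "house"): -0.7,
    ("house", "</s>"): -0.6,
    ("the", "house"): -1.2,
    ("house", "white"): -3.0,
    ("white", "</s>"): -2.5,
}
UNSEEN_LOGPROB = -5.0  # assumed flat penalty for bigrams missing from the table


def lm_score(words):
    """Sum bigram log probabilities over the sentence, including boundary markers."""
    padded = ["<s>", *words, "</s>"]
    return sum(BIGRAM_LOGPROB.get(pair, UNSEEN_LOGPROB)
               for pair in zip(padded, padded[1:]))


def translate(source_words):
    """Translate word by word, then pick the ordering the target model scores highest."""
    symbols = [LEXICON[w] for w in source_words]
    return list(max(permutations(symbols), key=lm_score))


best = translate(["la", "casa", "blanca"])
```

Enumerating every permutation grows factorially with sentence length, so this brute-force form is only practical for short inputs; a finite-state implementation would represent the candidate orderings compactly and search them with the language model instead.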
Systems and methods for extracting meaning from multimodal inputs using finite-state devices,
Tue Jun 27 18:11:22 EDT 2006
Finite-state systems and methods allow multiple input streams to be parsed and integrated by a single finite-state device. These systems and methods not only address multimodal recognition, but are also able to encode semantics and syntax into a single finite-state device. The finite-state device provides models for recognizing multimodal inputs, such as speech and gesture, and composes the meaning content from the various input streams into a single semantic representation. Compared to conventional multimodal recognition systems, finite-state systems and methods allow for compensation among the various input streams. Finite-state systems and methods allow one input stream to dynamically alter a recognition model used for another input stream, and can reduce the computational complexity of multidimensional multimodal parsing. Finite-state devices provide a well-understood probabilistic framework for combining the probability distributions associated with the various input streams and for selecting among competing multimodal interpretations.
Probabilistic Model For Natural Language Generation,
Tue Sep 20 18:10:31 EDT 2005
A natural language generator utilizes a stochastic process to choose a derivation tree according to a predetermined reference grammar, such as a tree-adjoined grammar (TAG). A word lattice is created from a single semi-specified derivation tree and the proper path (i.e., desired output string) is selected from the lattice using a least-cost or other appropriate algorithm.
Systems and methods for extracting meaning from multimodal inputs using finite-state devices,
Tue Mar 15 18:10:19 EST 2005
Multimodal utterances contain a number of different modes. These modes can include speech, gestures, and pen, haptic, and gaze inputs, and the like. This invention uses recognition results from one or more of these modes to provide compensation to the recognition process of one or more other ones of these modes. In various exemplary embodiments, a multimodal recognition system inputs one or more recognition lattices from one or more of these modes, and generates one or more models to be used by one or more mode recognizers to recognize the one or more other modes. In one exemplary embodiment, a gesture recognizer inputs a gesture input and outputs a gesture recognition lattice to a multimodal parser. The multimodal parser generates a language model and outputs it to an automatic speech recognition system, which uses the received language model to recognize the speech input that corresponds to the recognized gesture input.
Automatic clustering of tokens from a corpus for grammar acquisition,
Tue Jun 15 18:09:50 EDT 2004
In a method of learning grammar from a corpus, context words are identified from a corpus. For the other non-context words, the method counts the occurrence of predetermined relationships with the context words, and maps the counted occurrences to a multidimensional frequency space. Clusters are grown from the frequency vectors. The clusters represent classes of words; words in the same cluster possess the same lexical significance and provide an indicator of grammatical structure.
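A minimal sketch of this pipeline is given below, under several assumptions that are not in the abstract: the context words are supplied by hand, the "predetermined relationship" is taken to be immediate left or right adjacency, similarity is measured with cosine, and clusters are grown greedily against a fixed threshold. The corpus and all names are hypothetical.

```python
import math
from collections import Counter, defaultdict

CONTEXT = {"to", "from", "on"}  # assumed to be identified beforehand
CORPUS = [
    "fly to boston on monday",
    "fly from denver on tuesday",
    "fly to denver on monday",
    "fly from boston on tuesday",
]


def frequency_vectors(sentences, context):
    """Count, per non-context token, which context words sit directly left or right."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok in context:
                continue
            if i > 0 and tokens[i - 1] in context:
                vectors[tok][(tokens[i - 1], "left")] += 1
            if i + 1 < len(tokens) and tokens[i + 1] in context:
                vectors[tok][(tokens[i + 1], "right")] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def grow_clusters(vectors, threshold=0.8):
    """Greedy clustering: join the first cluster whose seed token is similar enough."""
    clusters = []
    for tok in sorted(vectors):
        for cluster in clusters:
            if cosine(vectors[tok], vectors[cluster[0]]) >= threshold:
                cluster.append(tok)
                break
        else:
            clusters.append([tok])
    return clusters


clusters = grow_clusters(frequency_vectors(CORPUS, CONTEXT))
```

On this toy corpus the city names share one distribution over context words and the day names share another, so they fall into separate clusters, which is the sense in which a cluster indicates a word class.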
Method For Building Linguistic Models From A Corpus,
Tue Jul 02 18:08:17 EDT 2002
A method iteratively integrates clustering techniques with phrase acquisition techniques to build complex linguistic models from a corpus. A set of features is initialized by the corpus. Thereafter, the method determines, according to a predetermined cost function, to process the features by one of phrase clustering processing or phrase grammar learning processing. If phrase clustering processing is performed, the method processes an interstitial set of features comprising both the old features and newly established clusters by phrase grammar learning processing. The features obtained as an output of phrase grammar learning are re-indexed as a set of features for a subsequent iteration. The method may be repeated over several iterations to build a hierarchical linguistic model.
Automatic clustering of tokens from a corpus for grammar acquisition,
Tue Nov 13 18:07:15 EST 2001
In a method of learning grammar from a corpus, context words are identified from a corpus. For the other non-context words, the method counts the occurrence of predetermined relationships with the context words, and maps the counted occurrences to a multidimensional frequency space. Clusters are grown from the frequency vectors. The clusters represent classes of words; words in the same cluster possess the same lexical significance and provide an indicator of grammatical structure.
AT&T Science & Technology Medal, 2009.
For technical leadership and innovative contributions in Spoken Language Technology and Services.