
180 Park Ave - Building 103
Florham Park, NJ
AT&T WATSON (SM) Speech Technologies,
AT&T WATSON (SM) integrates several speech technologies, including speech recognition. Tools allow for tuning recognition, adapting language & acoustic models, and adding custom extensions.
Connecting Your World,
The need to be connected is greater than ever, and AT&T Researchers are creating new ways for people to connect with one another and with their environments, whether it's their home, office, or car.
Speech translation,
AT&T Research is developing a real-time speech-to-speech translation technology so the translation starts as soon as speech is detected.
EMOTION DETECTION IN EMAIL CUSTOMER CARE
Narendra Gupta, Mazin Gilbert, Giuseppe Di
Computational Intelligence, An International Journal,
2012.
Wiley-Blackwell Copyright
"The definitive version is available at onlinelibrary.wiley.com." , 2012-10-04, http://onlinelibrary.wiley.com/doi/10.1111/j.1467-8640.2012.00454.x/full
Prompt and knowledgeable responses to customers’ emails are critical in maximizing customer satisfaction. Such emails often contain complaints about unfair treatment due to negligence, incompetence, rigid protocols, unfriendly systems, and unresponsive personnel. In this paper, we refer to these emails as emotional emails. They provide valuable feedback to improve contact center processes and customer care, as well as to enhance customer retention. This paper describes a method for extracting salient features and identifying emotional emails in customer care. Salient features reflect customer frustration, dissatisfaction with the business, and threats to either leave, take legal action and/or report to authorities. Compared to a baseline system using word unigrams, our proposed approach with salient features resulted in an absolute F-measure improvement of greater than 20%.

Predicting Human Perceived Accuracy of ASR Systems
Taniya Mishra, Andrej Ljolje, Mazin Gilbert
Interspeech,
2011.
ISCA Copyright
The definitive version was published in Interspeech. , 2011-08-28
Word error rate (WER), the most commonly used measure of automatic speech recognition (ASR) accuracy, penalizes all ASR errors (insertions, deletions, substitutions) equally. However, humans weigh different types of ASR errors differently. They judge ASR errors that distort the meaning of the spoken message more harshly than those that do not. Following the central idea of differential weighting of different ASR errors, we developed a new metric, HPA (Human Perceived Accuracy), that aims to align more closely with human perception of ASR errors. Applied to the particular task of automatically recognizing voicemails, we found that the correlation between HPA and the human perception of ASR accuracy was significantly higher (r-value=0.91) than the correlation between WER and human judgement (r-value=0.65).
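For reference, the equal weighting that the abstract attributes to WER can be sketched as a word-level edit distance divided by the reference length. This is a minimal illustration of standard WER only; the function name is hypothetical, and the HPA weighting itself is not shown because the abstract does not specify it.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            # Every error type costs exactly 1, regardless of its effect on meaning.
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

With the reference "call me back tomorrow" and the hypothesis "call be back", one substitution and one deletion each count as a single error, giving a WER of 0.5.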
System And Method For Optimizing Response Handling Time And Customer Satisfaction Scores,
Tue Jan 22 14:43:58 EST 2013
A system and method are disclosed for using and updating a database of template responses for a live agent in response to user communications. The method includes computing an average string distance between each response from a live agent and a template used to generate the response, modifying the computed average string distance based on a customer satisfaction score associated with each response, and selecting a response that minimizes the computed average string distance and maximizes customer satisfaction. Upon receiving a further communication on a certain issue, the system presents a prototype response that has been added to the template database to the live agent for use in generating a response to the further communication that reduces handling time and increases customer satisfaction.
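The selection step can be illustrated with a small sketch. Everything specific here is an assumption rather than the patented method: the string distance is taken from Python's `difflib.SequenceMatcher`, each candidate is compared against the other agent responses on the same issue, satisfaction scores are scaled to (0, 1], and the two criteria are combined by dividing the average distance by the satisfaction score.

```python
from difflib import SequenceMatcher


def string_distance(a: str, b: str) -> float:
    """Distance in [0, 1]; 0 means the strings are identical."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def select_prototype(responses):
    """Pick a prototype from (text, satisfaction) pairs for one issue.

    Assumed cost: average distance to the other responses, divided by the
    satisfaction score, so close and well-rated responses are preferred.
    """
    best_text, best_cost = None, float("inf")
    for i, (text, satisfaction) in enumerate(responses):
        if satisfaction <= 0:
            raise ValueError("satisfaction scores must be positive")
        others = [t for j, (t, _) in enumerate(responses) if j != i]
        if others:
            average = sum(string_distance(text, o) for o in others) / len(others)
        else:
            average = 0.0
        cost = average / satisfaction
        if cost < best_cost:
            best_text, best_cost = text, cost
    return best_text
```

A response that sits close to most other responses and carries a high satisfaction score gets the lowest cost and would be the one added to the template database in this sketch.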
System And Method For Supplemental Speech Recognition By Identified Idle Resources,
Tue Jan 01 14:43:42 EST 2013
Disclosed herein are systems, methods, and computer-readable storage media for improving automatic speech recognition performance. A system practicing the method identifies idle speech recognition resources and establishes a supplemental speech recognizer on the idle resources based on overall speech recognition demand. The supplemental speech recognizer can differ from a main speech recognizer, and, along with the main speech recognizer, can be associated with a particular speaker. The system performs speech recognition on speech received from the particular speaker in parallel with the main speech recognizer and the supplemental speech recognizer and combines results from the main and supplemental speech recognizer. The system recognizes the received speech based on the combined results. The system can use beam adjustment in place of or in combination with a supplemental speech recognizer. A scheduling algorithm can tailor a particular combination of speech recognition resources and release the supplemental speech recognizer based on increased demand.
System And Method Of Dynamically Modifying A Spoken Dialog System To Reduce Hardware Requirements,
Tue Dec 11 16:12:28 EST 2012
A system and method for providing a scalable spoken dialog system are disclosed. The method comprises receiving information which may be internal to the system or external to the system and dynamically modifying at least one module within a spoken dialog system according to the received information. The modules may be one or more of an automatic speech recognition, natural language understanding, dialog management and text-to-speech module or engine. Dynamically modifying the module may improve hardware performance or improve a specific caller's speech processing accuracy, for example. The modification of the modules or hardware may also be based on an application or a task, or based on a current portion of a dialog.
System And Method Of Generating Responses To Text-Based Messages,
Tue Oct 23 16:12:04 EDT 2012
In accordance with one aspect of the present invention, an automated method of and system for generating a response to a text-based natural language message is disclosed. The method includes identifying a first selected input clause in a sentence in the text-based natural language message, assigning a semantic tag to the first selected input clause, and matching the semantic tag to a historical input tag. The historical input tag is associated with a first previously generated response clause. The method further includes generating an output response message based on the historical response clause, the output response message being derived from the historical input tag and a second previously generated response clause. The system includes means for performing the method steps.
Using Web Mining To Enrich Directory Service Databases And Soliciting Service Subscriptions,
Tue Aug 14 16:11:26 EDT 2012
A system and method are provided for augmenting information on business directory databases and communicating with businesses. Using the enriched business directory database and Web mining technology, customized email messages are sent inviting businesses to enter their enriched business information into the directory or even subscribe to other paid services provided by the directory service.
System And Method For An Enhanced Shopping Experience,
Tue Jul 24 16:11:13 EDT 2012
Disclosed herein are systems, methods, and computer-readable media for creating a virtual shopping area. The method includes receiving a query from a user and an automated input specific to the user from a computing device, generating a list of merchants based on the query and the automated input, generating a virtual shopping area from the list of merchants and based on one or more constraints, and displaying the virtual shopping area on the computing device. One optional step is presenting to the user an interface to purchase query-related items from merchants in the virtual shopping area. The method optionally includes receiving an indication of intent to purchase an item from the user, displaying an image of the item to the user, and dynamically updating the displayed image of the item as the user specifies item-specific details. The list of merchants can be restricted to merchants geographically close to the user.
System And Method For Optimizing Response Handling Time and Customer Satisfaction Scores,
Tue Jul 03 16:10:56 EDT 2012
A system and method are disclosed for using and updating a database of template responses for a live agent in response to user communications. The method includes computing an average string distance between each response from a live agent and a template used to generate the response, modifying the computed average string distance based on a customer satisfaction score associated with each response, and selecting a response that minimizes the computed average string distance and maximizes customer satisfaction. Upon receiving a further communication on a certain issue, the system presents a prototype response that has been added to the template database to the live agent for use in generating a response to the further communication that reduces handling time and increases customer satisfaction.
System And Method For Training A Critical E-mail Classifier Using A Plurality Of Base Classifiers And N-Grams,
Tue Jun 05 16:10:37 EDT 2012
Disclosed is a method and system for identifying critical emails. To identify critical emails, a critical email classifier is trained from training data comprising labeled emails. The classifier extracts N-grams from the training data and identifies N-gram features from the extracted N-grams. The classifier also extracts salient features from the training data. The classifier is trained based on the identified N-gram features and the salient features so that the classifier can classify unlabeled emails as critical emails or non-critical emails.
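The feature side of such a classifier can be sketched as follows. This is an illustrative assumption, not the disclosed implementation: the salient phrases, the document-frequency cutoff, and all function names are hypothetical, and the classifier training itself is left out.

```python
from collections import Counter

# Hypothetical salient phrases; the abstract does not enumerate them.
SALIENT_PHRASES = ("cancel my account", "legal action", "report you")


def extract_ngrams(text, max_n=2):
    """Return all word n-grams of length 1..max_n, lowercased."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)]


def select_ngram_features(emails, min_emails=2):
    """Keep n-grams that occur in at least `min_emails` distinct emails."""
    doc_freq = Counter()
    for email in emails:
        doc_freq.update(set(extract_ngrams(email)))
    return sorted(g for g, count in doc_freq.items() if count >= min_emails)


def featurize(email, ngram_features):
    """Binary vector: selected n-gram features followed by salient-phrase flags."""
    grams = set(extract_ngrams(email))
    text = email.lower()
    ngram_part = [int(g in grams) for g in ngram_features]
    salient_part = [int(p in text) for p in SALIENT_PHRASES]
    return ngram_part + salient_part
```

The resulting vectors could be passed to any supervised learner trained on emails labeled critical or non-critical.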
System And Method Of Spoken Language Understanding In Human Computer Dialogs,
Tue May 29 16:10:31 EDT 2012
A system and method are disclosed that improve automatic speech recognition in a spoken dialog system. The method comprises partitioning speech recognizer output into self-contained clauses, identifying a dialog act in each of the self-contained clauses, qualifying dialog acts by identifying a current domain object and/or a current domain action, and determining whether further qualification is possible for the current domain object and/or current domain action. If further qualification is possible, then the method comprises identifying another domain action and/or another domain object associated with the current domain object and/or current domain action, reassigning the another domain action and/or another domain object as the current domain action and/or current domain object and then recursively qualifying the new current domain action and/or current object. This process continues until nothing is left to qualify.
System And Method Of Providing An Automated Data-Collection In Spoken Dialog Systems,
Tue May 22 16:10:28 EDT 2012
The invention relates to a system and method for gathering data for use in a spoken dialog system. An aspect of the invention is generally referred to as an automated hidden human that performs data collection automatically at the beginning of a conversation with a user in a spoken dialog system. The method comprises presenting an initial prompt to a user, recognizing a received user utterance using an automatic speech recognition engine and classifying the recognized user utterance using a spoken language understanding module. If the recognized user utterance is not understood or classifiable to a predetermined acceptance threshold, then the method re-prompts the user. If the recognized user utterance is not classifiable to a predetermined rejection threshold, then the method transfers the user to a human as this may imply a task-specific utterance. The received and classified user utterance is then used for training the spoken dialog system.
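One way to read the two thresholds is as a three-way routing decision on the classifier's confidence. The sketch below follows that reading; the threshold values, the names, and the assumption that a single confidence score drives both checks are illustrative only.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"
    REPROMPT = "reprompt"
    TRANSFER = "transfer_to_human"


def route_utterance(confidence, accept_threshold=0.7, reject_threshold=0.3):
    """Route a classified utterance using hypothetical threshold values."""
    if not 0.0 <= reject_threshold <= accept_threshold <= 1.0:
        raise ValueError("expected 0 <= reject_threshold <= accept_threshold <= 1")
    if confidence >= accept_threshold:
        # Understood well enough to continue the dialog.
        return Action.ACCEPT
    if confidence >= reject_threshold:
        # Not confident enough to accept, so ask the user again.
        return Action.REPROMPT
    # Below the rejection threshold: possibly task-specific, hand off to a human.
    return Action.TRANSFER
```

In each case the received utterance and its classification could still be logged as training data, as the abstract describes.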
Method And Apparatus For Predicting Word Accuracy In Automatic Speech Recognition Systems,
Tue May 08 16:10:20 EDT 2012
The invention comprises a method and apparatus for predicting word accuracy. Specifically, the method comprises obtaining an utterance in speech data where the utterance comprises an actual word string, processing the utterance for generating an interpretation of the actual word string, processing the utterance to identify at least one utterance frame, and predicting a word accuracy associated with the interpretation according to at least one stationary signal-to-noise ratio and at least one non-stationary signal-to-noise ratio, wherein the at least one stationary signal-to-noise ratio and the at least one non-stationary signal-to-noise ratio are determined according to a frame energy associated with each of the at least one utterance frame.
Systems And Methods For Monitoring Speech Data Labelers,
Tue May 01 16:10:18 EDT 2012
Systems and methods herein use an annotation guide to label utterances and speech data with a call type. A system practicing the method embodiment monitors labelers of speech data by presenting via a processor a test utterance to a labeler, receiving input from the labeler that selects a particular call type from a list of call types and determining via the processor if the labeler labeled the test utterance correctly. Based on the determining step, the system revises the annotation guide, retrains the labeler, and/or alters the test utterance.
System And Method For Increasing Accuracy Of Searches Based On Communication Network,
Tue May 01 16:10:17 EDT 2012
Disclosed are systems, methods and computer-readable media for using a local communication network to generate a speech model. The method includes retrieving for an individual a list of numbers in a calling history, identifying a local neighborhood associated with each number in the calling history, truncating the local neighborhood associated with each number based on at least one parameter, retrieving a local communication network associated with each number in the calling history and each phone number in the local neighborhood, and creating a language model for the individual based on the retrieved local communication network. The generated language model may be used for improved automatic speech recognition for audible searches as well as other modules in a spoken dialog system.
System And Method For Improving Robustness Of Speech Recognition Using Vocal Tract Length Normalization Codebooks,
Tue Apr 17 16:10:10 EDT 2012
Disclosed are systems, methods, and computer readable media for performing speech recognition. The method embodiment comprises (1) selecting a codebook from a plurality of codebooks with a minimal acoustic distance to a received speech sample, the plurality of codebooks generated by a process of (a) computing a vocal tract length for each of a plurality of speakers, (b) for each of the plurality of speakers, clustering speech vectors, and (c) creating a codebook for each speaker, the codebook containing entries for the respective speaker's vocal tract length, speech vectors, and an optional vector weight for each speech vector, (2) applying the respective vocal tract length associated with the selected codebook to normalize the received speech sample for use in speech recognition, and (3) recognizing the received speech sample based on the respective vocal tract length associated with the selected codebook.
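The codebook selection step can be sketched as a nearest-codebook search. The distance used here (average Euclidean distance from each sample frame to its nearest codebook vector) is an assumption, as are the dictionary layout and the example values; the optional vector weights and the normalization step itself are omitted.

```python
import math


def quantization_distance(sample, codebook_vectors):
    """Average distance from each sample frame to its nearest codebook vector."""
    if not sample:
        raise ValueError("sample must contain at least one frame")
    total = 0.0
    for frame in sample:
        total += min(math.dist(frame, vector) for vector in codebook_vectors)
    return total / len(sample)


def select_codebook(sample, codebooks):
    """Return the codebook with minimal assumed acoustic distance to the sample."""
    return min(codebooks,
               key=lambda cb: quantization_distance(sample, cb["vectors"]))


# Hypothetical two-speaker example with 2-dimensional speech vectors.
codebooks = [
    {"speaker": "a", "vocal_tract_length": 14.5,
     "vectors": [(0.0, 0.0), (1.0, 1.0)]},
    {"speaker": "b", "vocal_tract_length": 17.0,
     "vectors": [(5.0, 5.0), (6.0, 6.0)]},
]
sample = [(0.9, 1.1), (0.1, -0.1)]
selected = select_codebook(sample, codebooks)
```

The `vocal_tract_length` stored with the selected codebook would then drive normalization of the received sample before recognition.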
System And Method Of Automatically Generating Building Dialog Services By Exploiting The Content And Structure Of Websites,
Tue Jan 03 16:08:53 EST 2012
A method and system are disclosed for providing a dialog interface for a website. The method comprises, at each node in a website, computing a summary, a document description and an alias. A dialog manager within a spoken dialog service utilizes the summary, document description and alias for each website node to generate prompts to a user, wherein nodes in the website are matched with user requests. In this manner, a spoken dialog interface to the website content and navigation may be generated automatically.
Automatic Detection, Summarization And Reporting Of Business Intelligence Highlights From Automated Dialog Systems,
Tue Dec 27 16:06:49 EST 2011
A method and system for reporting data from a spoken dialog service is disclosed. The method comprises extracting data regarding user dialogs using a dialog logging module in the spoken dialog service, analyzing the data to identify trends and reporting the trends. The data may be presented in a visual form for easier consumption. The method may also relate to identifying data within the control or outside the control of a service provider that is used to adjust the spoken dialog service to maximize customer retention.
System And Method Of Generating Responses To Text-Based Messages,
Tue Dec 20 16:06:48 EST 2011
In accordance with one aspect of the present invention, an automated method of and system for generating a response to a text-based natural language message is disclosed. The method includes identifying a sentence in the text-based natural language message. Also, identifying an input clause in the sentence. Further, comparing the input clause to a previously received clause, where the previously received clause is correlated with a previously generated response message. Additionally, generating an output response message based on the previously generated response message. The system includes means for performing the method steps.
Transparent Voice Registration And Verification Method And System,
Tue Dec 13 16:06:44 EST 2011
Transparent voice registration of a party is provided in order to provide voice verification for communications with a service center. Verbal communication spoken by a party during interaction between the party and an agent of the service center is captured. A voice model associated with the captured communication is created and stored in order to provide voice verification during a subsequent call to the service center. When a requester contacts the service center, a comparison of the voice of the requester and a voice model of the person that the requester claims to be is performed, in order to verify the identity of the requester. Additionally, a voice model associated with a party is automatically updated after a subsequent communication between the party and the service center.
System And Method Of Automatically Building Dialog Services By Exploiting The Content And Structure Of Websites,
Tue Nov 22 16:06:37 EST 2011
A method and system are disclosed for providing a dialog interface for a website. The method comprises, at each node in a website, computing a summary, a document description and an alias. A dialog manager within a spoken dialog service utilizes the summary, document description and alias for each website node to generate prompts to a user, wherein nodes in the website are matched with user requests. In this manner, a spoken dialog interface to the website content and navigation may be generated automatically.
Finding The Website Of A Business Using The Business Name,
Tue Nov 22 16:06:36 EST 2011
A system and method are provided for augmenting information on business directory databases. Using the business name contained in a business directory database and Web data mining technology, the website of a business is found and validated, prior to enriching the database entries.
Automated Call Router For Business Directory Using The World Wide Web,
Tue Jul 26 16:05:47 EDT 2011
The embodiments include a system, a computer readable medium, and a method for establishing a communication connection after searching the World Wide Web for relevant phone information. The system can include a first communication device for forming at least one communication connection between the first communication device and a second communication device, search means adapted to accept a query, and access means adapted to (i) search and identify relevant phone number information using the query, (ii) create at least one icon to link the first communication device to a relevant phone number included in the relevant phone number information identified by the query, and (iii) reformulate the query if no relevant phone numbers are identified during the search. The system also includes click-to-dial means adapted to establish at least one communication connection from the first communication device to the second communication device.
System And Method For Increasing Accuracy Of Searches Based On Communities Of Interest,
Tue Jun 21 16:05:31 EDT 2011
Disclosed are systems, methods and computer-readable media for using a local communication network to generate a speech model. The method includes retrieving for an individual a list of numbers in a calling history, identifying a local neighborhood associated with each number in the calling history, truncating the local neighborhood associated with each number based on at least one parameter, retrieving a local communication network associated with each number in the calling history and each phone number in the local neighborhood, and creating a language model for the individual based on the retrieved local communication network. The generated language model may be used for improved automatic speech recognition for audible searches as well as other modules in a spoken dialog system.
Active Labeling For Spoken Language Understanding,
Tue May 24 16:05:17 EDT 2011
A spoken language understanding method and system are provided. The method includes classifying a set of labeled candidate utterances based on a previously trained classifier, generating classification types for each candidate utterance, receiving confidence scores for the classification types from the trained classifier, sorting the classified utterances based on an analysis of the confidence score of each candidate utterance compared to a respective label of the candidate utterance, and rechecking candidate utterances according to the analysis. The system includes modules configured to control a processor in the system to perform the steps of the method.
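The sorting and rechecking steps can be sketched as below. The ranking rule is an assumption: utterances where the classifier disagrees with the human label are queued for rechecking, most confident disagreements first, on the reasoning that these are the likeliest labeling errors. The field and function names are hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    utterance: str
    human_label: str
    predicted_label: str
    confidence: float  # classifier confidence in predicted_label


def recheck_order(candidates):
    """Return disagreements between classifier and label, most confident first."""
    disagreements = [c for c in candidates
                     if c.predicted_label != c.human_label]
    return sorted(disagreements, key=lambda c: c.confidence, reverse=True)


candidates = [
    Candidate("I want to pay my bill", "billing", "billing", 0.95),
    Candidate("my phone is broken", "billing", "repair", 0.90),
    Candidate("talk to someone", "repair", "agent", 0.40),
]
queue = recheck_order(candidates)
```

Here the second utterance would be rechecked first, since the classifier confidently disagrees with its label, while the first is skipped because label and prediction agree.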
Method And Apparatus For Building Sales Tools By Mining Data From Websites,
Tue May 24 16:05:13 EDT 2011
A website mining tool is disclosed that extracts information from, for example, a company's website and presents the extracted information in a graphical user interface (GUI). In one embodiment, web pages from a website are stored in, for example, computer memory and a structure of the web pages is identified. A plurality of blocks of information is then extracted as a function of this structure and a category is assigned to each block of information. The elements in the blocks of information are then displayed, for example to a salesperson, as a function of these categories. In another embodiment, Document Object Modeling parsing is used to identify the structure of the web pages. In yet another embodiment, a support vector machine is used to categorize each block of information.
Method For Building A Natural Language Understanding Model For A Spoken Dialog System,
Tue Apr 26 16:05:02 EDT 2011
A method of generating a natural language model for use in a spoken dialog system is disclosed. The method comprises using sample utterances and creating a number of hand crafted rules for each call-type defined in a labeling guide. A first NLU model is generated and tested using the hand crafted rules and sample utterances. A second NLU model is built using the sample utterances as new training data and using the hand crafted rules. The second NLU model is tested for performance using a first batch of labeled data. A series of NLU models are built by adding a previous batch of labeled data to training data and using a new batch of labeling data as test data to generate the series of NLU models with training data that increases constantly. If not all the labeling data is received, the method comprises repeating the step of building a series of NLU models until all labeling data is received. After all the training data is received, at least once, the method comprises building a third NLU model using all the labeling data, wherein the third NLU model is used in generating the spoken dialog service.
System And Method For Providing A Natural Language Interface To A Database,
Tue Apr 05 16:04:48 EDT 2011
A system and method are disclosed for providing a natural language interface to a database or the Internet. The method provides a response from a database to a natural language query. The method comprises receiving a user query, extracting key data from the user query, submitting the extracted key data to a database search engine to retrieve the top n pages from the database, processing the top n pages through a natural language dialog engine, and providing a response based on processing the top n pages.
Method And Apparatus For Detecting And Extracting Information From Dynamically Generated Web Pages,
Tue Jan 25 16:04:24 EST 2011
A method and apparatus for automatically detecting and extracting information from dynamically generated web pages are disclosed. For example, the present method stores user provided information that is entered into a form interface of a web page for a first query. Responsive to the first query, a first response web page is received and stored. The present method then automatically generates a second query to acquire a second response web page that is responsive to the second query. Finally, the present method compares the first response web page and the second response web page. In one embodiment, the present invention extracts information that is dissimilar between the first response web page and the second response web page. This extracted information is deemed to be the pertinent information requested by the user.
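The comparison step can be illustrated with a line-level difference between the two response pages. This is a simplified sketch under the assumption that the pages share a template and differ only in their query-specific lines; the function name and the sample markup are hypothetical.

```python
def dissimilar_lines(first_page: str, second_page: str):
    """Return non-empty lines of first_page that do not occur in second_page."""
    second = {line.strip() for line in second_page.splitlines()}
    return [line.strip() for line in first_page.splitlines()
            if line.strip() and line.strip() not in second]


first_response = """<h1>Flights</h1>
<td>NYC to BOS $120</td>
<p>Contact us</p>"""

second_response = """<h1>Flights</h1>
<td>SFO to LAX $95</td>
<p>Contact us</p>"""

pertinent = dissimilar_lines(first_response, second_response)
```

The shared header and footer lines drop out, leaving only the line that answers the first query.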
Voice-Enabled Dialog System,
Tue Jan 11 16:04:22 EST 2011
A voice-enabled help desk service is disclosed. The service comprises an automatic speech recognition module for recognizing speech from a user, a spoken language understanding module for understanding the output from the automatic speech recognition module, a dialog management module for generating a response to speech from the user, a natural voices text-to-speech synthesis module for synthesizing speech to generate the response to the user, and a frequently asked questions module. The frequently asked questions module handles frequently asked questions from the user by changing voices and providing predetermined prompts to answer the frequently asked question.
On-Demand Language Translation For Television Programs,
Tue Oct 05 15:04:54 EDT 2010
In an embodiment, a method of providing an on demand translation service is provided. A subscriber may be charged a reduced fee or no fee for use of the on demand translation service in exchange for displaying commercial messages to the subscriber, the commercial messages being selected based on subscriber information. A multimedia signal including information in a source language may be received. The information may be obtained as text in the source language from the multimedia signal. The text may be translated from the source language to a target language. Translated information, based on the translated text, may be transmitted to a processing device for presentation to the subscriber. The received multimedia signal may be sent to a multimedia device for viewing.
System And Method For Improving Robustness Of Speech Recognition Using Vocal Tract Length Normalization Codebooks,
Tue Sep 14 15:04:44 EDT 2010
Disclosed are systems, methods, and computer readable media for performing speech recognition. The method embodiment comprises (1) selecting a codebook from a plurality of codebooks with a minimal acoustic distance to a received speech sample, the plurality of codebooks generated by a process of (a) computing a vocal tract length for each of a plurality of speakers, (b) for each of the plurality of speakers, clustering speech vectors, and (c) creating a codebook for each speaker, the codebook containing entries for the respective speaker's vocal tract length, speech vectors, and an optional vector weight for each speech vector, (2) applying the respective vocal tract length associated with the selected codebook to normalize the received speech sample for use in speech recognition, and (3) recognizing the received speech sample based on the respective vocal tract length associated with the selected codebook.
System And Method Of Identifying Web Page Semantic Structures,
Tue Aug 24 15:04:32 EDT 2010
The disclosure presents a method, system and computer-readable medium related to automatically analyzing the structure of a web page. The method embodiment comprises building a training corpus comprising a broad stylistic coverage of web pages, segmenting a web page into information blocks, identifying semantic categories of the information blocks using the training corpus and applying the identified semantic categories in a web-based tool.
Timing Of Speech Recognition Over Lossy Transmission Systems,
Tue Jul 06 15:04:12 EDT 2010
Recognizing a stream of speech received as speech vectors over a lossy communications link includes constructing for a speech recognizer a series of speech vectors from packets received over a lossy packetized transmission link, wherein some of the packets associated with each speech vector are lost or corrupted during transmission. Each constructed speech vector is multi-dimensional and includes associated features. After waiting for a predetermined time, speech vectors are generated and potentially corrupted features within the speech vector are indicated to the speech recognizer when present. Speech recognition is attempted at the speech recognizer on the speech vectors when corrupted features are present. This recognition may be based only on certain or valid features within each speech vector. Retransmission of a missing or corrupted packet is requested when corrupted values are indicated by the indicating step and when the attempted recognition step fails.
Method Of Generating A Labeling Guide For Spoken Dialog Services,
Tue Jun 01 15:03:58 EDT 2010
A method is disclosed for designing a labeling guide for use by a labeler in labeling data used for training a spoken language understanding (SLU) module for an application. The method comprises a labeling guide designer selecting domain-independent actions applicable to an application, selecting domain-dependent objects according to characteristics of the application, and generating a labeling guide using the selected domain-independent actions and selected domain-dependent objects. An advantage of the labeling guide generated in this manner is that the labeling guide designer can easily port the labeling guide to a new application by selecting a set of domain-independent actions and then selecting the domain-dependent objects related to the new application.
Systems And Methods For Monitoring Speech Data Labelers,
Tue May 04 15:03:49 EDT 2010
Systems and methods for using an annotation guide to label utterances and speech data with a call type. A method embodiment monitors labelers of speech data by presenting via a processor a test utterance to a labeler, receiving input from the labeler that selects a particular call type from a list of call types and determining via the processor if the labeler labeled the test utterance correctly. Based on the determining step, the method performs at least one of the following: revising the annotation guide, retraining the labeler or altering the test utterance.
On-Demand Language Translation For Television Programs,
Tue May 04 15:03:47 EDT 2010
A method, a system and a machine-readable medium are provided for an on demand translation service. A translation module including at least one language pair module for translating a source language to a target language may be made available for use by a subscriber. The subscriber may be charged a fee for use of the requested on demand translation service or may be provided use of the on demand translation service for free in exchange for displaying commercial messages to the subscriber. A video signal may be received including information in the source language, which may be obtained as text from the video signal and may be translated from the source language to the target language by use of the translation module. Translated information, based on the translated text, may be added into the received video signal. The video signal including the translated information in the target language may be sent to a display device.
Method And Apparatus For Automatically Building Conversational Systems,
Tue Feb 09 15:03:30 EST 2010
A system and method provides a natural language interface to world-wide web content. Either in advance or dynamically, webpage content is parsed using a parsing algorithm. A person using a telephone interface can provide speech information, which is converted to text and used to automatically fill in input fields on a webpage form. The form is then submitted to a database search and a response is generated. Information contained on the responsive webpage is extracted and converted to speech via a text-to-speech engine and communicated to the person.
Method For Building A Natural Language Understanding Model For A Spoken Dialog System,
Tue Nov 17 16:08:11 EST 2009
A method of generating a natural language model for use in a spoken dialog system is disclosed. The method comprises using sample utterances and creating a number of hand crafted rules for each call-type defined in a labeling guide. A first NLU model is generated and tested using the hand crafted rules and sample utterances. A second NLU model is built using the sample utterances as new training data and using the hand crafted rules. The second NLU model is tested for performance using a first batch of labeled data. A series of NLU models are built by adding a previous batch of labeled data to training data and using a new batch of labeling data as test data to generate the series of NLU models with training data that increases constantly. If not all the labeling data is received, the method comprises repeating the step of building a series of NLU models until all labeling data is received. After all the training data is received, at least once, the method comprises building a third NLU model using all the labeling data, wherein the third NLU model is used in generating the spoken dialog service.
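The batch-by-batch schedule in this abstract can be sketched as a loop in which each arriving batch of labeled data first serves as test data and is then folded into the training set. The sketch below is only an illustration under stated assumptions: the word-vote classifier, the function names (train, predict, evaluate, incremental_build) and the toy utterances are all hypothetical, and the seed examples stand in for the sample utterances and hand crafted rules, which are not modeled separately.

```python
from collections import Counter, defaultdict

def train(data):
    """Toy stand-in for NLU training: count which labels each word co-occurs with."""
    votes = defaultdict(Counter)
    for text, label in data:
        for word in text.split():
            votes[word][label] += 1
    return votes

def predict(model, text):
    """Pick the label with the most word votes, or None if no word is known."""
    tally = Counter()
    for word in text.split():
        tally.update(model.get(word, {}))
    return tally.most_common(1)[0][0] if tally else None

def evaluate(model, batch):
    """Fraction of utterances in the batch that the model labels correctly."""
    return sum(predict(model, text) == label for text, label in batch) / len(batch)

def incremental_build(seed_data, batches):
    """Test each new batch on the current model, then add it to the training data."""
    training = list(seed_data)
    history = []  # (training set size, accuracy on the next unseen batch)
    for batch in batches:
        model = train(training)
        history.append((len(training), evaluate(model, batch)))
        training.extend(batch)  # the tested batch becomes training data
    final_model = train(training)  # final model uses all labeled data
    return final_model, history

# Hypothetical seed utterances and labeled batches.
seed = [("pay bill", "Pay"), ("cancel service", "Cancel")]
batches = [
    [("pay my bill", "Pay"), ("please cancel", "Cancel")],
    [("i want to pay", "Pay"), ("cancel my plan", "Cancel")],
]
final_model, history = incremental_build(seed, batches)
```

The history list records how test accuracy changes as the training set grows, which is the kind of signal such a schedule makes available before the final model is built.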
Active Learning Process For Spoken Dialog Systems,
Tue Jul 14 16:07:34 EDT 2009
A large amount of human labor is required to transcribe and annotate a training corpus that is needed to create and update models for automatic speech recognition (ASR) and spoken language understanding (SLU). Active learning enables a reduction in the amount of transcribed and annotated data required to train ASR and SLU models. In one aspect of the present invention, an active learning ASR process and active learning SLU process are coupled, thereby enabling further efficiencies to be gained relative to a process that maintains an isolation of data in both the ASR and SLU domains.
Active Labeling For Spoken Language Understanding,
Tue Jul 14 16:07:33 EDT 2009
An active labeling process is provided that aims to minimize the number of utterances to be checked again by automatically selecting the ones that are likely to be erroneous or inconsistent with the previously labeled examples. In one embodiment, the errors and inconsistencies are identified based on the confidences obtained from a previously trained classifier model. In a second embodiment, the errors and inconsistencies are identified based on an unsupervised learning process. In both embodiments, the active labeling process is not dependent upon the particular classifier model.
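A minimal sketch of the first embodiment, assuming the classifier exposes a per-label confidence for each utterance: utterances whose assigned label receives low confidence are flagged for rechecking. The function name, the threshold value, the call-type names and the confidence numbers are hypothetical. Because the function only consumes confidences, it does not depend on which classifier produced them.

```python
from typing import Dict, List, Tuple

def flag_suspect_labels(
    labeled: List[Tuple[str, str]],
    posteriors: List[Dict[str, float]],
    threshold: float = 0.3,
) -> List[int]:
    """Return indices of utterances whose assigned label has low confidence,
    ordered from least to most confident."""
    suspects = []
    for i, ((_, label), probs) in enumerate(zip(labeled, posteriors)):
        confidence = probs.get(label, 0.0)
        if confidence < threshold:
            suspects.append((confidence, i))
    suspects.sort()
    return [i for _, i in suspects]

# Hypothetical labeled utterances and classifier confidences.
labeled = [
    ("i want to pay my bill", "Pay(Bill)"),
    ("what is my balance", "Pay(Bill)"),  # likely mislabeled
    ("cancel my service", "Cancel(Service)"),
]
posteriors = [
    {"Pay(Bill)": 0.92, "Request(Balance)": 0.05, "Cancel(Service)": 0.03},
    {"Pay(Bill)": 0.10, "Request(Balance)": 0.85, "Cancel(Service)": 0.05},
    {"Pay(Bill)": 0.02, "Request(Balance)": 0.03, "Cancel(Service)": 0.95},
]
recheck = flag_suspect_labels(labeled, posteriors)
```

Only the flagged indices would go back to a human labeler, so the number of utterances checked again stays small.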
Speech Recognition Over Lossy Networks With Rejection Threshold,
Tue Feb 24 16:07:18 EST 2009
Recognizing a stream of speech received as speech vectors over a lossy communications link includes constructing for a speech recognizer a series of speech vectors from packets received over a lossy packetized transmission link, wherein some of the packets associated with each speech vector are lost or corrupted during transmission. Each constructed speech vector is multi-dimensional and includes associated features. After waiting for a predetermined time, speech vectors are generated and potentially corrupted features within the speech vector are indicated to the speech recognizer when present. Speech recognition is attempted at the speech recognizer on the speech vectors when corrupted features are present. This recognition may be based only on certain or valid features within each speech vector. Retransmission of a missing or corrupted packet is requested when corrupted values are indicated by the indicating step and when the attempted recognition step fails.
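The idea of recognizing on valid features only, and requesting retransmission only when corruption is present and recognition fails, can be sketched as follows. This is an illustration under stated assumptions rather than the patented recognizer: a nearest-template matcher stands in for the speech recognizer, a lost packet is represented as None, and all names, templates and the rejection threshold are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

def build_vector(
    packets: Sequence[Optional[List[float]]], feats_per_packet: int
) -> Tuple[List[float], List[bool]]:
    """Assemble one speech vector plus a mask marking which features are valid."""
    values: List[float] = []
    valid: List[bool] = []
    for packet in packets:
        if packet is None:  # lost or corrupted packet
            values.extend([0.0] * feats_per_packet)
            valid.extend([False] * feats_per_packet)
        else:
            values.extend(packet)
            valid.extend([True] * len(packet))
    return values, valid

def masked_distance(values, valid, template) -> Optional[float]:
    """Mean squared distance computed over valid features only."""
    used = [(v - t) ** 2 for v, ok, t in zip(values, valid, template) if ok]
    return sum(used) / len(used) if used else None

def recognize(values, valid, templates: Dict[str, List[float]], reject_above: float):
    """Return the closest template's word, or None if the match is rejected."""
    best_word, best = None, None
    for word, template in templates.items():
        d = masked_distance(values, valid, template)
        if d is not None and (best is None or d < best):
            best_word, best = word, d
    if best is None or best > reject_above:
        return None  # recognition attempt failed
    return best_word

def needs_retransmission(valid, result) -> bool:
    """Request retransmission only if features are corrupted AND recognition failed."""
    return (not all(valid)) and result is None

# Hypothetical templates; the second packet of the first vector is lost.
templates = {"yes": [1.0, 1.0, 1.0, 1.0], "no": [0.0, 0.0, 0.0, 0.0]}
values, valid = build_vector([[0.9, 1.1], None], feats_per_packet=2)
word = recognize(values, valid, templates, reject_above=0.1)
retry = needs_retransmission(valid, word)

# A vector with every packet lost cannot be recognized, so a retry is requested.
values2, valid2 = build_vector([None, None], feats_per_packet=2)
word2 = recognize(values2, valid2, templates, reject_above=0.1)
retry2 = needs_retransmission(valid2, word2)
```

In the first case the surviving features are enough to recognize the word, so no retransmission is needed despite the lost packet.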
System and method of spoken language understanding in a spoken dialog service,
Tue Nov 11 18:13:12 EST 2008
A voice-enabled help desk service is disclosed. The service comprises an automatic speech recognition module for recognizing speech from a user, a spoken language understanding module for understanding the output from the automatic speech recognition module, a dialog management module for generating a response to speech from the user, a natural voices text-to-speech synthesis module for synthesizing speech to generate the response to the user, and a frequently asked questions module. The frequently asked questions module handles frequently asked questions from the user by changing voices and providing predetermined prompts to answer the frequently asked question.
Reducing time for annotating speech data to develop a dialog application,
Tue Aug 12 18:12:58 EDT 2008
Systems and methods for annotating speech data. The present invention reduces the time required to annotate speech data by selecting utterances for annotation that will be of greatest benefit. A selection module uses speech models, including speech recognition models and spoken language understanding models, to identify utterances that should be annotated based on criteria such as confidence scores generated by the models. These utterances are placed in an annotation list along with a type of annotation to be performed for the utterances and an order in which the annotation should proceed. The utterances in the annotation list can be annotated for speech recognition purposes, spoken language understanding purposes, labeling purposes, etc. The selection module can also select utterances for annotation based on previously annotated speech data and deficiencies in the various models.
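One way to picture the annotation list is as a set of tasks, each carrying an annotation type, ordered so that the least confident utterances come first. The sketch below assumes each utterance already has a speech recognition confidence and a spoken language understanding confidence; the class names, task types ("transcribe", "label") and thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Utterance:
    utt_id: str
    asr_conf: float  # confidence from the speech recognition model
    slu_conf: float  # confidence from the spoken language understanding model

@dataclass
class AnnotationTask:
    utt_id: str
    kind: str     # type of annotation to perform
    score: float  # confidence that triggered the task (lower = more urgent)

def build_annotation_list(
    utterances: List[Utterance],
    asr_threshold: float = 0.5,
    slu_threshold: float = 0.5,
) -> List[AnnotationTask]:
    """Select low-confidence utterances and order them by urgency."""
    tasks = []
    for u in utterances:
        if u.asr_conf < asr_threshold:
            # An unreliable transcript must be fixed before labeling makes sense.
            tasks.append(AnnotationTask(u.utt_id, "transcribe", u.asr_conf))
        elif u.slu_conf < slu_threshold:
            tasks.append(AnnotationTask(u.utt_id, "label", u.slu_conf))
    tasks.sort(key=lambda t: t.score)
    return tasks

# Hypothetical utterances with model confidences.
utterances = [
    Utterance("u1", asr_conf=0.9, slu_conf=0.8),  # confident: not selected
    Utterance("u2", asr_conf=0.3, slu_conf=0.9),
    Utterance("u3", asr_conf=0.8, slu_conf=0.2),
]
annotation_list = build_annotation_list(utterances)
```

Utterances on which both models are confident never enter the list, which is where the reduction in annotation time would come from.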
System and method of providing a spoken dialog interface to a website,
Tue May 13 18:12:49 EDT 2008
Disclosed is a system and method for generating a spoken dialog service from website data. Spoken dialog components typically include an automatic speech recognition module, a language understanding module, a dialog management module, a language generation module and a text-to-speech module. These components are capable of being automatically trained from processed website data. A website analyzer converts a website into a structured text data set and a structured task knowledge base. The website analyzer further extracts linguistic items from the website data. The dialog components are automatically trained from the structured text data set, structured task knowledge base and linguistic items.
Method of generating a labeling guide for spoken dialog services,
Tue Apr 29 18:12:46 EDT 2008
A method is disclosed for designing a labeling guide for use by a labeler in labeling data used for training a spoken language understanding (SLU) module for an application. The method comprises a labeling guide designer selecting domain-independent actions applicable to an application, selecting domain-dependent objects according to characteristics of the application, and generating a labeling guide using the selected domain-independent actions and selected domain-dependent objects. An advantage of the labeling guide generated in this manner is that the labeling guide designer can easily port the labeling guide to a new application by selecting a set of domain-independent actions and then selecting the domain-dependent objects related to the new application.
Spoken language understanding that incorporates prior knowledge into boosting,
Tue Feb 05 17:08:37 EST 2008
A system for understanding entries, such as speech, develops a classifier by employing prior knowledge with which a given corpus of training entries is enlarged threefold. A rule is created for each of the labels employed in the classifier, and the created rules are applied to the given corpus to create a corpus of attachments by appending a weight of ηp(x), or 1-ηp(x), to labels of entries that meet, or fail to meet, respectively, conditions of the labels' rules, and to also create a corpus of non-attachments by appending a weight of 1-ηp(x), or ηp(x), to labels of entries that meet, or fail to meet conditions of the labels' rules.
Method for building a natural language understanding model for a spoken dialog system,
Tue Nov 13 18:12:25 EST 2007
A method of generating a natural language model for use in a spoken dialog system is disclosed. The method comprises using sample utterances and creating a number of hand crafted rules for each call-type defined in a labeling guide. A first NLU model is generated and tested using the hand crafted rules and sample utterances. A second NLU model is built using the sample utterances as new training data and using the hand crafted rules. The second NLU model is tested for performance using a first batch of labeled data. A series of NLU models are built by adding a previous batch of labeled data to training data and using a new batch of labeling data as test data to generate the series of NLU models with training data that increases constantly. If not all the labeling data is received, the method comprises repeating the step of building a series of NLU models until all labeling data is received. After all the training data is received, at least once, the method comprises building a third NLU model using all the labeling data, wherein the third NLU model is used in generating the spoken dialog service.
Active labeling for spoken language understanding,
Tue Nov 06 18:12:22 EST 2007
An active labeling process is provided that aims to minimize the number of utterances to be checked again by automatically selecting the ones that are likely to be erroneous or inconsistent with the previously labeled examples. In one embodiment, the errors and inconsistencies are identified based on the confidences obtained from a previously trained classifier model. In a second embodiment, the errors and inconsistencies are identified based on an unsupervised learning process. In both embodiments, the active labeling process is not dependent upon the particular classifier model.
Active learning process for spoken dialog systems,
Tue Nov 06 18:12:22 EST 2007
A large amount of human labor is required to transcribe and annotate a training corpus that is needed to create and update models for automatic speech recognition (ASR) and spoken language understanding (SLU). Active learning enables a reduction in the amount of transcribed and annotated data required to train ASR and SLU models. In one aspect of the present invention, an active learning ASR process and active learning SLU process are coupled, thereby enabling further efficiencies to be gained relative to a process that maintains an isolation of data in both the ASR and SLU domains.
Systems and methods for generating an annotation guide,
Tue May 15 18:12:01 EDT 2007
Systems and methods for generating an annotation guide. Speech data is organized and presented to a user. After the user selects some of the utterances in the speech data, the selected utterances are included in a class and/or call type. Additional utterances that belong to the class and/or call type can be found in the speech data using relevance feedback, data mining, data clustering, support vector machines, and the like. After a call type is complete, it is committed to the annotation guide. After all call types are completed, the annotation guide is generated.
System for handling frequently asked questions in a natural language dialog service,
Tue Mar 27 18:11:58 EDT 2007
A voice-enabled help desk service is disclosed. The service comprises an automatic speech recognition module for recognizing speech from a user, a spoken language understanding module for understanding the output from the automatic speech recognition module, a dialog management module for generating a response to speech from the user, a natural voices text-to-speech synthesis module for synthesizing speech to generate the response to the user, and a frequently asked questions module. The frequently asked questions module handles frequently asked questions from the user by changing voices and providing predetermined prompts to answer the frequently asked question.
Recognizing the numeric language in natural spoken dialogue,
Tue Feb 20 18:11:55 EST 2007
A system for recognizing connected digits in natural spoken dialogue includes a speech recognition processor that receives unconstrained fluent input speech and produces a string of words that can include a numeric language, and a numeric understanding processor that converts the string of words into a sequence of digits based on a set of rules. An acoustic model database utilized by the speech recognition processor includes a first set of hidden Markov models that characterize the acoustic features of numeric words and phrases, a second set of hidden Markov models that characterize the acoustic features of the remaining vocabulary words, and a filler model that characterizes the acoustic features of out-of-vocabulary utterances. An utterance verification processor verifies the accuracy of the string of words. A validation database stores a grammar, and a string validation processor outputs validity information based on a comparison of the sequence of digits with the grammar. A dialogue manager processor initiates an action based on the validity information.
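The numeric understanding and string validation steps can be illustrated with a small rule set that maps recognized words to digits and a grammar that checks the result. This is a sketch under stated assumptions: the word tables, the "double"/"triple" rule, the eight-digit grammar and the function names are all hypothetical, and the acoustic modeling and utterance verification stages are not modeled.

```python
import re
from typing import List

DIGITS = {"zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
          "four": "4", "five": "5", "six": "6", "seven": "7",
          "eight": "8", "nine": "9"}
TEENS = {"ten": "10", "eleven": "11", "twelve": "12", "thirteen": "13",
         "fourteen": "14", "fifteen": "15", "sixteen": "16",
         "seventeen": "17", "eighteen": "18", "nineteen": "19"}
TENS = {"twenty": "2", "thirty": "3", "forty": "4", "fifty": "5",
        "sixty": "6", "seventy": "7", "eighty": "8", "ninety": "9"}
REPEATS = {"double": 2, "triple": 3}

def words_to_digits(words: List[str]) -> str:
    """Convert a recognized word string into a digit sequence using simple rules."""
    out = []
    i = 0
    while i < len(words):
        word = words[i]
        nxt = words[i + 1] if i + 1 < len(words) else None
        if word in REPEATS and nxt in DIGITS:      # "double five" -> "55"
            out.append(DIGITS[nxt] * REPEATS[word])
            i += 2
        elif word in TENS:
            if nxt in DIGITS and nxt not in ("zero", "oh"):
                out.append(TENS[word] + DIGITS[nxt])  # "twenty three" -> "23"
                i += 2
            else:
                out.append(TENS[word] + "0")          # "twenty" -> "20"
                i += 1
        elif word in TEENS:
            out.append(TEENS[word])
            i += 1
        elif word in DIGITS:
            out.append(DIGITS[word])
            i += 1
        else:
            i += 1  # non-numeric word: skipped
    return "".join(out)

# Hypothetical validation grammar: exactly eight digits.
GRAMMAR = re.compile(r"\d{8}")

def is_valid(digit_string: str) -> bool:
    """Compare the digit sequence with the grammar."""
    return GRAMMAR.fullmatch(digit_string) is not None

recognized = "my card number is four one double five twenty three oh nine".split()
digit_string = words_to_digits(recognized)
valid = is_valid(digit_string)
```

A dialogue manager could then use the validity flag to decide between confirming the number and asking the caller to repeat it.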
Speech recognition over lossy networks with rejection threshold,
Tue Jan 30 18:11:51 EST 2007
Recognizing a stream of speech received as speech vectors over a lossy communications link includes constructing for a speech recognizer a series of speech vectors from packets received over a lossy packetized transmission link, wherein some of the packets associated with each speech vector are lost or corrupted during transmission. Each constructed speech vector is multi-dimensional and includes associated features. Potentially corrupted features within the speech vector are indicated to the speech recognizer when present. Speech recognition is attempted at the speech recognizer on the speech vectors when corrupted features are present. This recognition may be based only on certain or valid features within each speech vector. Retransmission of a missing or corrupted packet is requested when corrupted values are indicated by the indicating step and when the attempted recognition step fails.
Spoken language understanding that incorporates prior knowledge into boosting,
Tue Dec 19 17:08:34 EST 2006
A system for understanding entries, such as speech, develops a classifier by employing prior knowledge with which a given corpus of training entries is enlarged threefold. A rule is created for each of the labels employed in the classifier, and the created rules are applied to the given corpus to create a corpus of attachments by appending a weight of ηp(x), or 1-ηp(x), to labels of entries that meet, or fail to meet, respectively, conditions of the labels' rules, and to also create a corpus of non-attachments by appending a weight of 1-ηp(x), or ηp(x), to labels of entries that meet, or fail to meet conditions of the labels' rules.
Speech recognition over lossy transmission systems,
Tue Aug 10 18:09:59 EDT 2004
Recognizing a stream of speech received as speech vectors over a lossy communications link includes constructing for a speech recognizer a series of speech vectors from packets received over a lossy packetized transmission link, wherein some of the packets associated with each speech vector are lost or corrupted during transmission. Each constructed speech vector is multi-dimensional and includes associated features. Potentially corrupted features within the speech vector are indicated to the speech recognizer when present. Speech recognition is attempted at the speech recognizer on the speech vectors when corrupted features are present. This recognition may be based only on certain or valid features within each speech vector. Retransmission of a missing or corrupted packet is requested when corrupted values are indicated by the indicating step and when the attempted recognition step fails.
Method And Apparatus For Speech Recognition Using Second Order Statistics And Linear Estimation Of Cepstral Coefficient,
Tue Mar 13 18:07:00 EST 2001
A method and apparatus for speech recognition using second order statistics and linear estimation of cepstral coefficients. In one embodiment, a speech input signal is received and cepstral features are extracted. An answer is generated using the extracted cepstral features and a fixed signal independent diagonal matrix as the covariance matrix for the cepstral components of the speech input signal and, for example, a hidden Markov model. In another embodiment, a noisy speech input signal is received and a cepstral vector representing a clean speech input signal is generated based on the noisy speech input signal and an explicit linear minimum mean square error cepstral estimator.
Method and apparatus for discriminative utterance verification using multiple confidence measures,
Tue Sep 26 18:05:36 EDT 2000
A multiple confidence measures subsystem of an automated speech recognition system allows otherwise independent confidence measures to be integrated and used for both training and testing on a consistent basis. Speech to be recognized is input to a speech recognizer and a recognition verifier of the multiple confidence measures subsystem. The speech recognizer generates one or more confidence measures. The speech recognizer preferably generates a misclassification error (MCE) distance as one of the confidence measures. The recognized speech output by the speech recognizer is input to the recognition verifier, which outputs one or more confidence measures. The recognition verifier preferably outputs a misverification error (MVE) distance as one of the confidence measures. The confidence measures output by the speech recognizer and the recognition verifier are normalized and then input to an integrator. The integrator integrates the various confidence measures during both a training phase for the hidden Markov models implemented in the speech recognizer and the recognition verifier and during testing of the input speech. The integrator is preferably implemented using a multi-layer perceptron (MLP). The output of the integrator, rather than the recognition verifier, determines whether the recognized utterance hypothesis generated by the speech recognizer should be accepted or rejected.
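The normalize-then-integrate step can be sketched with a tiny multi-layer perceptron that takes two normalized confidence measures and decides whether to accept the hypothesis. Everything numeric here is hypothetical: the normalization statistics, the weights (which in practice would be learned during training) and the threshold are made up for illustration, and the sketch assumes that larger normalized values indicate a more reliable hypothesis.

```python
import math
from typing import List, Tuple

def normalize(value: float, mean: float, std: float) -> float:
    """Put a raw confidence measure on a common scale (z-score)."""
    return (value - mean) / std

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def mlp_accept(
    features: List[float],
    w_hidden: List[List[float]],
    b_hidden: List[float],
    w_out: List[float],
    b_out: float,
    threshold: float = 0.5,
) -> Tuple[float, bool]:
    """One-hidden-layer perceptron: returns (integrated score, accept decision)."""
    hidden = [
        sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    score = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return score, score >= threshold

# Hypothetical weights; a real integrator would learn these from data.
W_HIDDEN = [[2.0, 1.0], [1.0, 2.0]]
B_HIDDEN = [0.0, 0.0]
W_OUT = [1.5, 1.5]
B_OUT = -1.5

# Two measures (for example an MCE distance and an MVE distance) after normalization.
good = [normalize(3.0, mean=2.0, std=1.0), normalize(1.5, mean=0.5, std=1.0)]
bad = [normalize(1.0, mean=2.0, std=1.0), normalize(-0.5, mean=0.5, std=1.0)]

good_score, good_accept = mlp_accept(good, W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
bad_score, bad_accept = mlp_accept(bad, W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
```

The accept or reject decision comes from the integrated score rather than from either measure alone, which mirrors the role the abstract gives to the integrator.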
System and method of recognizing an acoustic environment to adapt a set of based recognition models to the current acoustic environment for subsequent speech recognition,
Tue Sep 28 18:05:22 EDT 1999
A speech recognition system which effectively recognizes unknown speech from multiple acoustic environments includes a set of secondary models, each associated with one or more particular acoustic environments, integrated with a base set of recognition models. The speech recognition system is trained by making a set of secondary models in a first stage of training, and integrating the set of secondary models with a base set of recognition models in a second stage of training.
IEEE Fellow, 2011.
For contributions to speech recognition, speech synthesis, and spoken language understanding.
Science & Technology Medal, 2006.
Honored for outstanding technical contributions and leadership in advancing spoken language understanding technologies and services.