Crowd-sourcing for difficult transcription of speech
Jason Williams, Dan Melamed, Tirso Alonso, Barbara Hollister, Jay Wilpon
180 Park Ave - Building 103, Florham Park, NJ
IEEE Workshop on Automatic Speech Recognition and Understanding, Hawaii, USA,
This version of the work is reprinted here with permission of IEEE for your personal use. Not for redistribution. The definitive version was published in the IEEE Workshop on Automatic Speech Recognition and Understanding, December 11, 2011.
Crowd-sourcing is a promising method for fast and cheap transcription of large volumes of speech data. However, this method cannot achieve the accuracy of expert transcribers on speech that is difficult to transcribe. Faced with such speech data, we developed three new crowd-sourcing methods that allow explicit trade-offs among precision, recall, and cost: incremental crowd-sourcing, treating ASR as a transcriber, and using a regression model to predict transcription reliability. Even though the accuracy of individual crowd-workers is only 55% on our data, our best method achieves 90% accuracy on 93% of the utterances, using only 1.3 crowd-worker transcriptions per utterance on average. When forced to transcribe all utterances, our best method matches the accuracy of previous crowd-sourcing methods using only one third as many transcriptions. We also study the effects of various task design factors on transcription latency and accuracy, some of which have not been studied before.
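The incremental crowd-sourcing idea in the abstract can be sketched as follows: request one crowd-worker transcription at a time and stop as soon as two (normalized) transcriptions agree, or a worker budget is exhausted. This is an illustrative sketch, not the paper's actual implementation; `get_transcription` is a hypothetical stand-in for a call to a crowd-sourcing platform.

```python
from collections import Counter

def normalize(text):
    """Lowercase and strip punctuation so trivially different transcriptions match."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())

def incremental_transcribe(get_transcription, max_workers=5, agreement=2):
    """Request transcriptions until `agreement` of them match, or give up.

    Returns (consensus transcription or None, number of workers used).
    """
    counts = Counter()
    for _ in range(max_workers):
        counts[normalize(get_transcription())] += 1
        best, n = counts.most_common(1)[0]
        if n >= agreement:
            return best, sum(counts.values())
    return None, sum(counts.values())
```

Stopping at first agreement is what keeps the average cost low (around 1.3 transcriptions per utterance in the paper): easy utterances finish after two workers, and only hard ones consume the full budget.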
Automatic Detection, Summarization And Reporting Of Business Intelligence Highlights From Automated Dialog Systems,
December 27, 2011
A method and system for reporting data from a spoken dialog service is disclosed. The method comprises extracting data regarding user dialogs using a dialog logging module in the spoken dialog service, analyzing the data to identify trends and reporting the trends. The data may be presented in a visual form for easier consumption. The method may also relate to identifying data within the control or outside the control of a service provider that is used to adjust the spoken dialog service to maximize customer retention.
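As an illustration of the trend-analysis step described above, one simple trend a reporting module could extract from dialog logs is the daily task-completion rate. This is a hedged sketch under an assumed log-record layout (`date`, `completed` fields), not the patented implementation.

```python
from collections import defaultdict

def completion_trend(dialog_logs):
    """Map each date to the fraction of dialogs that ended in task completion."""
    totals = defaultdict(int)
    completed = defaultdict(int)
    for rec in dialog_logs:
        totals[rec["date"]] += 1
        if rec["completed"]:
            completed[rec["date"]] += 1
    return {d: completed[d] / totals[d] for d in totals}
```

A time series like this can then be charted for visual consumption, or thresholded to flag dates where completion dropped.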
Reducing time for annotating speech data to develop a dialog application,
August 12, 2008
Systems and methods for annotating speech data. The present invention reduces the time required to annotate speech data by selecting utterances for annotation that will be of greatest benefit. A selection module uses speech models, including speech recognition models and spoken language understanding models, to identify utterances that should be annotated based on criteria such as confidence scores generated by the models. These utterances are placed in an annotation list along with a type of annotation to be performed for the utterances and an order in which the annotation should proceed. The utterances in the annotation list can be annotated for speech recognition purposes, spoken language understanding purposes, labeling purposes, etc. The selection module can also select utterances for annotation based on previously annotated speech data and deficiencies in the various models.
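The confidence-based selection described above can be sketched as follows: utterances whose model confidence falls below a threshold are queued for annotation, least confident first, with an annotation type attached to each. The field names and thresholds are illustrative assumptions, not the patented system's actual data format.

```python
def build_annotation_list(utterances, threshold=0.7):
    """Return (utterance_id, annotation_type) pairs, least confident first.

    Assumed record layout: {"id": ..., "confidence": ...}. Very low confidence
    gets a full transcription; moderately low gets an SLU label only.
    """
    selected = [u for u in utterances if u["confidence"] < threshold]
    selected.sort(key=lambda u: u["confidence"])
    return [(u["id"], "transcription" if u["confidence"] < 0.4 else "slu_label")
            for u in selected]
```

Ordering by ascending confidence means annotators spend their time on the utterances the models are least sure about, which is where annotation yields the greatest benefit.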
Method and apparatus to provide enhanced speech recognition in a communication network,
May 8, 2001
A method and apparatus for enhanced speech recognition in a communication network in which a first input port is coupled to a first processor and receives a telephony signal from the communication network. A second input port, coupled to a second processor, receives the same telephony signal at substantially the same time as the first input port. Based on the telephony signal, the second processor generates recognized speech information. A control line coupled between the first and second processors lets the second processor send a command to the first processor, and the first processor changes state, such as by re-routing the telephony signal, based on the command. The second processor may also enter one of a plurality of states such that the state of the second processor corresponds to the state of the first processor at a given point during a telephone call. Instead of receiving the telephony signal, the second processor may receive speech data generated by the first processor.