
180 Park Ave - Building 103
Florham Park, NJ
Assistive Technology,
At AT&T Labs - Research, we apply our speech, language and media technologies to give people with disabilities more independence, privacy and autonomy.
AT&T WATSON (SM) Speech Technologies,
AT&T WATSON (SM) integrates several speech technologies, including speech recognition. Tools allow for tuning recognition, adapting language & acoustic models, and adding custom extensions.
Speech Mashup,
The AT&T speech mashup is a web service that implements speech tasks for web applications, enabling users of smart phones and other devices to use and hear voice communications.
Crowd-sourcing for difficult transcription of speech
Jason Williams, Dan Melamed, Tirso Alonso, Barbara Hollister, Jay Wilpon
IEEE Workshop on Automatic Speech Recognition and Understanding, Hawaii, USA, 2011.
IEEE Copyright
This version of the work is reprinted here with permission of IEEE for your personal use. Not for redistribution. The definitive version was published in IEEE Workshop on Automatic Speech Recognition and Understanding, 2011-12-11.
Crowd-sourcing is a promising method for fast and cheap transcription of large volumes of speech data. However, this method cannot achieve the accuracy of expert transcribers on speech that is difficult to transcribe. Faced with such speech data, we developed three new methods of crowd-sourcing that allow explicit trade-offs among precision, recall, and cost. The methods are: incremental crowd-sourcing, treating ASR as a transcriber, and using a regression model to predict transcription reliability. Even though the accuracy of individual crowd-workers is only 55% on our data, our best method achieves 90% accuracy on 93% of the utterances, using only 1.3 crowd-worker transcriptions per utterance on average. When forced to transcribe all utterances, our best method matches the accuracy of previous crowd-sourcing methods using only one third as many transcriptions. We also study the effects of various task design factors on transcription latency and accuracy, some of which have not been studied before.
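The abstract does not spell out the stopping rule, so the following is only a rough sketch of what incremental crowd-sourcing with ASR as an extra transcriber could look like, assuming that transcriptions are requested one at a time until a fixed number of them agree. The names `incremental_transcribe`, `request_transcription`, `first_guess`, `agree` and `max_requests` are hypothetical and do not come from the paper.

```python
from collections import Counter
from typing import Callable, Optional, Tuple


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not block agreement."""
    return " ".join(text.lower().split())


def incremental_transcribe(
    request_transcription: Callable[[], str],
    first_guess: Optional[str] = None,
    agree: int = 2,
    max_requests: int = 3,
) -> Tuple[Optional[str], int]:
    """Ask for one crowd transcription at a time until `agree` votes match.

    `first_guess` is an optional free vote, for example an ASR hypothesis
    treated as one more transcriber (an assumption of this sketch).
    Returns (accepted text or None, number of crowd requests made).
    """
    votes: Counter = Counter()
    if first_guess is not None:
        votes[normalize(first_guess)] += 1
    for n in range(1, max_requests + 1):
        votes[normalize(request_transcription())] += 1
        text, count = votes.most_common(1)[0]
        if count >= agree:
            return text, n  # stop early: enough agreement
    return None, max_requests  # reject the utterance: no agreement reached
```

Rejecting utterances that never reach agreement is what would trade recall for precision in such a scheme, and stopping early is what would keep the average number of requests per utterance low.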
Method And Apparatus For Building Sales Tools By Mining Data From Websites,
Tue Jan 22 17:24:52 EST 2013
A website mining tool is disclosed that extracts information from, for example, a company's website and presents the extracted information in a graphical user interface (GUI). In one embodiment, web pages from a website are stored in, for example, computer memory and a structure of the web pages is identified. A plurality of blocks of information is then extracted as a function of this structure and a category is assigned to each block of information. The elements in the blocks of information are then displayed, for example to a salesperson, as a function of these categories. In another embodiment, Document Object Modeling parsing is used to identify the structure of the web pages. In yet another embodiment, a support vector machine is used to categorize each block of information.
Automated Task Classification System,
Tue Mar 27 16:09:41 EDT 2012
The invention concerns an automated task classification system that operates on a task objective of a user. The system may include a meaningful phrase generator that generates a plurality of meaningful phrases from a set of verbal and non-verbal speech. Each of the meaningful phrases may be generated based on one of a predetermined set of the task objectives. A recognizer may recognize at least one of the generated meaningful phrases in an input communication of the user and a task classifier may make a classification decision in response to the recognized meaningful phrases relating to one of the set of predetermined task objectives.
Method And Apparatus For Building Sales Tools By Mining Data From Websites,
Tue May 24 16:05:13 EDT 2011
A website mining tool is disclosed that extracts information from, for example, a company's website and presents the extracted information in a graphical user interface (GUI). In one embodiment, web pages from a website are stored in, for example, computer memory and a structure of the web pages is identified. A plurality of blocks of information is then extracted as a function of this structure and a category is assigned to each block of information. The elements in the blocks of information are then displayed, for example to a salesperson, as a function of these categories. In another embodiment, Document Object Modeling parsing is used to identify the structure of the web pages. In yet another embodiment, a support vector machine is used to categorize each block of information.
Automatic Task Classification System,
Tue Sep 15 16:08:01 EDT 2009
The invention concerns an automated task classification system that operates on a task objective of a user. The system may include a meaningful phrase generator that generates a plurality of meaningful phrases from a set of verbal and non-verbal speech. Each of the meaningful phrases may be generated based on one of a predetermined set of the task objectives. A recognizer may recognize at least one of the generated meaningful phrases in an input communication of the user and a task classifier may make a classification decision in response to the recognized meaningful phrases relating to one of the set of predetermined task objectives.
Method and system for performing speech recognition,
Tue Sep 08 18:05:05 EDT 1998
Speech recognition processing is compensated for improving robustness of speech recognition in the presence of enhanced speech signals. The compensation overcomes the adverse effects that speech signal enhancement may have on speech recognition performance, where speech signal enhancement causes acoustical mismatches between recognition models trained using unenhanced speech signals and feature data extracted from enhanced speech signals. Compensation is provided at the front end of an automatic speech recognition system by combining linear predictive coding and mel-based cepstral parameter analysis for computing cepstral features of transmitted speech signals used for speech recognition processing by selectively weighting mel-filter banks when processing frequency domain representations of the enhanced speech signals.
Automated call router system and method,
Tue Oct 07 18:05:03 EDT 1997
An automated call routing system and method which operates on a call-routing objective of a calling party expressed in natural speech of the calling party. The system incorporates a speech recognition function, as to which a calling party's natural-speech call routing objective provides an input, and which is trained to recognize a plurality of meaningful phrases, each such phrase being related to a specific call routing objective. Upon recognition of one or more of such meaningful phrases in a calling party's input speech, an interpretation function then acts on such calling party's routing objective request to either implement the calling party's requested routing objective or to enter into a dialog with the calling party to obtain additional information from which a sufficient confidence level can be attained to implement that routing objective.
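As a loose illustration of the route-or-ask decision described in this abstract, the sketch below matches a small table of meaningful phrases against recognized text and routes only when the best match clears a confidence threshold. The phrase table, the strength values, the threshold and the function name `route` are all hypothetical, and plain substring matching stands in for the speech recognizer.

```python
from typing import Dict, Tuple

# Hypothetical table: meaningful phrase -> (routing objective, association strength)
PHRASES: Dict[str, Tuple[str, float]] = {
    "collect call": ("collect", 0.9),
    "calling card": ("calling_card", 0.9),
    "my bill": ("billing", 0.7),
    "charge": ("billing", 0.4),
}


def route(utterance: str, threshold: float = 0.6) -> Tuple[str, str]:
    """Return ("route", objective) when confident, else ("ask", best guess or "")."""
    text = " ".join(utterance.lower().split())
    scores: Dict[str, float] = {}
    for phrase, (objective, strength) in PHRASES.items():
        if phrase in text:
            # Keep the strongest phrase seen for each objective.
            scores[objective] = max(scores.get(objective, 0.0), strength)
    if not scores:
        return ("ask", "")  # nothing recognized: open a dialog
    best = max(scores, key=lambda k: scores[k])
    if scores[best] >= threshold:
        return ("route", best)
    return ("ask", best)  # low confidence: ask for more information
```

For example, `route("I want to make a collect call please")` routes directly, while an utterance that only mentions a charge falls below the threshold and triggers a follow-up question.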
Speech recognition employing key word modeling and non-key word modeling,
Tue Apr 16 18:05:02 EDT 1996
Speaker independent recognition of small vocabularies, spoken over the long distance telephone network, is achieved using two types of models, one type for defined vocabulary words (e.g., collect, calling-card, person, third-number and operator), and one type for extraneous input which ranges from non-speech sounds to groups of non-vocabulary words (e.g., "I want to make a collect call please"). For this type of key word spotting, modifications are made to a connected word speech recognition algorithm based on state-transitional (hidden Markov) models which allow it to recognize words from a pre-defined vocabulary list spoken in an unconstrained fashion. Statistical models of both the actual vocabulary words and the extraneous speech and background noises are created. A syntax-driven connected word recognition system is then used to find the best sequence of extraneous input and vocabulary word models for matching the actual input speech.
Automatic speech recognizer,
Tue Jul 12 18:05:00 EDT 1994
Apparatus and method for recording data in a speech recognition system and recognizing spoken data corresponding to the recorded data. The apparatus and method responds to entered data by generating a string of phonetic transcriptions from the entered data. The data and generated phonetic transcription string associated therewith is recorded in a vocabulary lexicon of the speech recognition system. The apparatus and method responds to receipt of spoken data by constructing a model of subwords characteristic of the spoken data and compares the constructed subword model with ones of the recorded lexicon vocabulary recorded phonetic transcription strings to recognize the spoken data as the data identified by and associated with a phonetic transcription string matching the constructed subword string.
Endpoint detector,
Tue Apr 11 18:04:59 EDT 1989
An arrangement for endpoint detection improves speech recognition accuracy where the input signal includes nonstationary noise. Energy pulses are found by looking for local energy level peaks, then analyzing surrounding energy levels to determine pulse boundaries. Energy pulses are combined according to predetermined criteria to form longer pulses corresponding to words or phrases in the input signal.
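The three steps in this abstract (find local energy peaks, grow each peak outward to its boundaries, then merge nearby pulses) can be sketched roughly as follows. The thresholds `peak_min`, `floor` and `max_gap`, and the function names, are assumptions made for illustration; the patent's actual criteria are not given here.

```python
from typing import List, Tuple


def find_pulses(energy: List[float], peak_min: float, floor: float) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices (end inclusive) of energy pulses."""
    pulses: List[Tuple[int, int]] = []
    n = len(energy)
    i = 0
    while i < n:
        is_peak = (
            energy[i] >= peak_min
            and (i == 0 or energy[i] >= energy[i - 1])
            and (i == n - 1 or energy[i] >= energy[i + 1])
        )
        if not is_peak:
            i += 1
            continue
        # Grow the pulse in both directions while energy stays above the floor.
        start = i
        while start > 0 and energy[start - 1] > floor:
            start -= 1
        end = i
        while end < n - 1 and energy[end + 1] > floor:
            end += 1
        pulses.append((start, end))
        i = end + 1
    return pulses


def merge_pulses(pulses: List[Tuple[int, int]], max_gap: int) -> List[Tuple[int, int]]:
    """Join consecutive pulses separated by at most `max_gap` silent frames."""
    merged: List[Tuple[int, int]] = []
    for start, end in pulses:
        if merged and start - merged[-1][1] - 1 <= max_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged
```

With per-frame energies such as `[0, 0, 1, 5, 9, 4, 1, 0, 0, 2, 7, 3, 0, ...]`, the first two bursts are two frames apart and would be joined into one word-length pulse when `max_gap` is 3.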
Method and apparatus for generating speech pattern templates,
Tue Jun 12 18:04:58 EDT 1984
A system for generating speech pattern templates for use with either speech recognition or speech synthesis. Reference demisyllable templates are first generated from a reference first speaker using both manual and automatic analysis. The analysis for a second speaker is simplified and automated by comparing with the first speaker's templates. The second speaker speaks the same words at a rate time-warped to match the first speaker's rate and template. We define a demisyllable as each of the two halves of a syllable, assuming a syllable starts and ends with a noisy consonant, and the syllable is split at its vowel center, thereby simplifying concatenation and comparison. Key features of the invention include generating a set of signals representative of the time alignment between the first and second speaker's templates, and the time-of-occurrence boundaries of each syllable in a word.
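One common way to obtain a time alignment between two speakers' renditions of the same word is dynamic time warping (DTW). The abstract does not name the alignment algorithm, so DTW is an assumption here, and the sketch below works on plain sequences of numbers rather than real speech features. The function name `dtw_path` is hypothetical.

```python
from typing import List, Sequence, Tuple


def dtw_path(a: Sequence[float], b: Sequence[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Align two non-empty sequences; return (total cost, list of (i, j) index pairs)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover which frames were matched.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path
```

The returned path plays the role of the alignment signals: each pair says which frame of the first sequence corresponds to which frame of the second, so boundaries marked on the reference can be carried over to the second speaker.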
Word recognizer,
Tue Aug 23 18:04:58 EDT 1983
An input word is recognized as one of a set of reference words. A set of word distance signals representative of the correspondence of the input word to the reference words is generated. A set of weighted word distance signals is also generated. Responsive to the word distance signals and the weighted word distance signals, the reference word that most closely corresponds to the input word is selected.
Spoken word controlled automatic dialer,
Tue Sep 07 18:04:35 EDT 1982
A speech controlled dialing circuit identifies input utterances which may be a command word (mode select), repertory word (dialing name or number), or non-recognized (Other). Responsive to the identification of each occurring input utterance, a set of predetermined templates is selected to identify the next occurring utterance. A programmed microprocessor system is described to implement the main controller function.