
180 Park Ave - Building 103
Florham Park, NJ
An Alternative Frontend for the AT&T WATSON LV-CSR System
Dimitrios Dimitriadis, Enrico Bocchieri, Diamantino Caseiro
International Conference on Acoustics, Speech and Signal Processing,
2011.
In previously published work, we proposed a novel feature extraction algorithm that approximates some human auditory characteristics and exploits the robustness of an alternative energy estimation scheme. Herein, we examine the performance of the proposed features under additive noise and suggest how to predict the deviations of the noisy cepstral coefficients by estimating the subband SNR values. We then examine the efficiency of the proposed features within a state-of-the-art LV-CSR system, namely the AT&T WATSON system. The features are evaluated on a mobile voice-search task, namely the Speak4It application. The proposed feature extraction scheme yields a 6% relative improvement in overall performance, with the AM and LM training left fixed. Additional improvements have been reported when this frontend is combined with advanced training techniques.

Speech Recognition Modeling Advances For Mobile Voice Search
Enrico Bocchieri, Diamantino Caseiro, Dimitrios Dimitriadis
International Conference on Acoustics, Speech and Signal Processing,
2010.
This paper reports on the development and advances in automatic speech recognition for the AT&T Speak4It voice-search application. With Speak4It as a real-life example, we show the effectiveness of acoustic model (AM) and language model (LM) estimation (adaptation and training) on relatively small amounts of field data. We then introduce algorithmic improvements concerning the use of sentence length in the LM, of non-contextual features in AM decision trees, and of the Teager energy in the acoustic front-end. The combination of these algorithms yields substantial accuracy improvements. LM and AM estimation on samples of field data increases the word accuracy from 66.4% to 77.1%, a relative word error reduction of 32%. The algorithmic improvements increase the accuracy to 79.7%, an additional 11.3% relative error reduction.
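The relative error reductions quoted in the abstract above can be checked with a short calculation. The sketch below assumes that word error is simply 100 minus the word accuracy (both in percent); the function name is hypothetical and is not taken from the paper.

```python
def relative_error_reduction(acc_before: float, acc_after: float) -> float:
    """Relative word error reduction in percent.

    Assumption: word error (%) = 100 - word accuracy (%).
    """
    err_before = 100.0 - acc_before
    err_after = 100.0 - acc_after
    return 100.0 * (err_before - err_after) / err_before


# Field-data AM/LM estimation: 66.4% -> 77.1% word accuracy
step1 = relative_error_reduction(66.4, 77.1)
# Algorithmic improvements: 77.1% -> 79.7% word accuracy
step2 = relative_error_reduction(77.1, 79.7)

print(f"{step1:.2f}% and {step2:.2f}% relative error reduction")
```

Under this assumption the first step comes out at roughly 32% and the second at a little over 11%, in line with the reported figures. Small differences in the last digit are expected because the published accuracies are themselves rounded.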
System And Method For Providing Large Vocabulary Speech Processing Based On Fixed-Point Arithmetic,
Tue Jun 05 12:52:19 EDT 2012
Disclosed herein are a system, method and computer-readable medium storing instructions for controlling a computing device according to the method. In an example embodiment, the method uses a speech recognition decoder that operates using fixed-point arithmetic. The exemplary method comprises representing arc costs associated with at least one finite state transducer (FST) in fixed point, representing parameters associated with a hidden Markov model (HMM) in fixed point, and processing speech data in the speech recognition decoder using fixed-point arithmetic for the fixed-point FST arc costs and the fixed-point HMM parameters. The method may also include computing sentence hypothesis probabilities at the decoder with fixed-point arithmetic as type Q-2e numbers.
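As a generic illustration of what representing decoder costs in fixed point can look like, the sketch below stores costs as integers scaled by a power of two and accumulates a path cost using integer addition only. It is not the patented method: the 8 fractional bits, the helper names, and the cost values are all assumptions made for this example, and the Q-2e format named in the abstract is not reproduced here.

```python
FRAC_BITS = 8  # assumed number of fractional bits (illustrative only)


def to_fixed(x: float, frac_bits: int = FRAC_BITS) -> int:
    """Quantize a real-valued cost to an integer scaled by 2**frac_bits."""
    return int(round(x * (1 << frac_bits)))


def to_float(q: int, frac_bits: int = FRAC_BITS) -> float:
    """Convert a fixed-point integer back to a float for inspection."""
    return q / (1 << frac_bits)


# Hypothetical negative-log costs along one decoding path.
fst_arc_costs = [1.25, 0.5, 2.125]   # FST arc costs
hmm_state_costs = [3.0, 0.75, 1.5]   # HMM acoustic costs

# Quantize once, then accumulate with integer arithmetic only.
path_cost_q = 0
for arc, state in zip(fst_arc_costs, hmm_state_costs):
    path_cost_q += to_fixed(arc) + to_fixed(state)

print(path_cost_q, to_float(path_cost_q))
```

Because costs in the log domain combine by addition, the accumulation needs no floating-point operations once the parameters are quantized. A real decoder would also have to choose the word length and handle overflow, which this sketch ignores.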
Automatic speech recognizer,
Tue Jul 12 18:05:00 EDT 1994
Apparatus and method for recording data in a speech recognition system and recognizing spoken data corresponding to the recorded data. The apparatus and method respond to entered data by generating a string of phonetic transcriptions from the entered data. The data and the generated phonetic transcription string associated with it are recorded in a vocabulary lexicon of the speech recognition system. The apparatus and method respond to receipt of spoken data by constructing a model of subwords characteristic of the spoken data and comparing the constructed subword model with the phonetic transcription strings recorded in the vocabulary lexicon, so as to recognize the spoken data as the data identified by and associated with the phonetic transcription string that matches the constructed subword string.
IEEE Fellow, 2013.
For contributions to computational models for speech recognition.
IEEE Signal Processing Society Best Paper Award, 2005.