
180 Park Ave - Building 103
Florham Park, NJ
Connecting Your World,
The need to be connected is greater than ever, and AT&T Researchers are creating new ways for people to connect with one another and with their environments, whether it's their home, office, or car.
Living rooms getting smarter with multimodal and multichannel signal processing
Dimitrios Dimitriadis, Horst Schroeter
IEEE SLTC newsletter,
2011.
IEEE Copyright
This version of the work is reprinted here with permission of IEEE for your personal use. Not for redistribution. The definitive version was published in IEEE SLTC newsletter, 2011-07-27.
Combining Frame and Segment Level Processing via Temporal Pooling for Phonetic Classification
Sumit Chopra, Patrick Haffner, Dimitrios Dimitriadis
12th Annual Conference of the International Speech Communication Association,
2011.
International Speech Communication Association Copyright
The definitive version was published in 12th Annual Conference of the International Speech Communication Association, 2011-08-27.
{We propose a simple, yet novel, multi-layer model for the problem of phonetic classification. Our model combines the frame level transformation of the acoustic signal with the segment level transformation via a temporal pooling architecture to compute class conditional probabilities of phones. Without the use of any phonetic knowledge, our model achieved the state-of-the-art performance on the TIMIT phone classification task. The flexibility of our model allows us to mix a variety of pooling architectures, leading to further significant performance improvements.}
An Alternative Frontend for the AT&T WATSON LV-CSR System
Dimitrios Dimitriadis, Enrico Bocchieri, Diamantino Caseiro
International Conference on Acoustics, Speech and Signal Processing,
2011.
{In previously published work, we have proposed a novel feature extraction algorithm approximating some of the human auditory characteristics and the robustness of an alternative energy estimation scheme. Herein, we examine the proposed feature performance under additive noise and suggest how to predict the noisy cepstral coefficient deviations by estimating the subband SNR values. Then, we examine the efficiency of the proposed features in the framework of a state-of-the-art LV-CSR system, namely the AT&T WATSON system. The features are examined in a mobile, voice search task, namely the Speak4It application. The proposed feature extraction scheme improves the overall performance by 6% relative, leaving the AM and LM training fixed. Additional improvements have been reported when this frontend is combined with advanced training techniques.}

Speech Recognition Modeling Advances For Mobile Voice Search
Enrico Bocchieri, Diamantino Caseiro, Dimitrios Dimitriadis
International Conference On Acoustics, Speech and Signal Processing,
2010.
{This paper reports on the development and advances in automatic speech recognition for the AT&T Speak4It voice-search application. With Speak4It as a real-life example, we show the effectiveness of acoustic model (AM) and language model (LM) estimation (adaptation and training) on relatively small amounts of field-data. We then introduce algorithmic improvements concerning the use of sentence length in LM, of non-contextual features in AM decision-trees, and of the Teager energy in the acoustic front-end. The combination of these algorithms yields substantial accuracy improvements. LM and AM estimation on samples of field-data increases the word accuracy from 66.4% to 77.1%, a relative word error reduction of 32%. The algorithmic improvements increase the accuracy to 79.7%, an additional 11.3% relative error reduction.}
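
The relative error reductions quoted in the abstract above can be reproduced from the reported accuracies. The sketch below is illustrative only and is not taken from the paper: it assumes word error is simply 100 minus word accuracy, and the function name relative_error_reduction is hypothetical. It works from the rounded accuracies given in the abstract, so the second figure comes out near 11.4% rather than exactly the reported 11.3%.

```python
def relative_error_reduction(acc_before: float, acc_after: float) -> float:
    """Relative word error reduction in percent.

    Assumption (not stated in the abstract): word error = 100 - word accuracy.
    """
    err_before = 100.0 - acc_before
    err_after = 100.0 - acc_after
    return 100.0 * (err_before - err_after) / err_before


# Word accuracies (%) as reported in the abstract.
baseline, adapted, improved = 66.4, 77.1, 79.7

# Step 1: LM and AM estimation on field data (66.4% -> 77.1%).
step1 = relative_error_reduction(baseline, adapted)
# Step 2: algorithmic improvements on top of step 1 (77.1% -> 79.7%).
step2 = relative_error_reduction(adapted, improved)

print(f"{step1:.1f}")  # 31.8, which the abstract rounds to 32%
print(f"{step2:.1f}")  # 11.4, close to the reported 11.3%
```

Note that the second reduction is measured against the error remaining after the first step (22.9%), not against the original baseline error (33.6%), which is why the two percentages do not simply add up.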