
Predicting Relative Prominence in Noun-Noun Compounds
Taniya Mishra, Srinivas Bangalore
Proceedings of ACL-HLT 2011,
2011.
[PDF]
[BIB]
ACL Copyright
The definitive version was published in Proceedings of ACL-HLT 2011, 2011-06-19.
{There are several theories regarding what influences prominence assignment in English noun-noun compounds. We developed corpus-driven models for automatically predicting prominence assignment in noun-noun compounds, using feature sets based on two such theories: the informativeness theory and the semantic composition theory. The evaluation of the prediction models indicates that though both theories are relevant, they account for different types of variability in prominence assignment.}
Predicting Human Perceived Accuracy of ASR Systems
Taniya Mishra, Andrej Ljolje, Mazin Gilbert
Interspeech,
2011.
[PDF]
[BIB]
ISCA Copyright
The definitive version was published in Interspeech, 2011-08-28.
{Word error rate (WER), the most commonly used method of measuring automatic speech recognition (ASR) accuracy, penalizes all ASR errors (insertions, deletions, substitutions) equally.
However, humans weigh different types of ASR errors differently. They judge ASR errors that distort the meaning of the spoken message more harshly than those that do not.
Following this central idea of differentially weighting ASR errors, we developed a new metric, HPA (Human Perceived Accuracy), that aims to align more closely with human perception of ASR errors. Applied to the particular task of automatically recognizing voicemails, we found that the correlation between HPA and the human perception of ASR accuracy was significantly higher (r-value=0.91) than the correlation between WER and human judgement (r-value=0.65).}
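To illustrate the contrast the abstract draws, here is a minimal sketch (our illustration, not the paper's code) of standard WER, which scores insertions, deletions, and substitutions with equal unit cost via edit distance; the cost parameters show where a differentially weighted metric would diverge:

```python
def edit_distance(ref, hyp, sub_cost=1, ins_cost=1, del_cost=1):
    """Weighted Levenshtein distance between two token lists."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        d[i][0] = i * del_cost
    for j in range(1, len(hyp) + 1):
        d[0][j] = j * ins_cost
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else sub_cost
            d[i][j] = min(d[i - 1][j - 1] + sub,   # match / substitution
                          d[i - 1][j] + del_cost,  # deletion
                          d[i][j - 1] + ins_cost)  # insertion
    return d[len(ref)][len(hyp)]

def wer(ref, hyp):
    """Standard WER: every error type costs 1, normalized by reference length."""
    ref_tokens, hyp_tokens = ref.split(), hyp.split()
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)
```

For example, `wer("call me back tomorrow", "call me tomorrow")` is 0.25, regardless of whether the dropped word changes the message's meaning; a perception-aligned metric like HPA would instead weight such meaning-distorting errors more heavily.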
On the Intelligibility of Fast Synthesized Speech for Individuals with Early-Onset Blindness
Amanda Stent, Ann Syrdal, Taniya Mishra
ACM ASSETS,
2011.
[PDF]
[BIB]
ACM Copyright
(c) ACM, 20XX. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM ASSETS, 2011-10-15.
{People with visual disabilities increasingly use text-to-speech synthesis as a primary output modality for interaction with computers. Surprisingly, there have been no systematic comparisons of the performance of different text-to-speech systems for this user population.
In this paper we report the results of a pilot experiment on the intelligibility of fast synthesized speech for individuals with early-onset blindness. Using an open-response recall task, we collected data on four synthesis systems representing two major approaches to text-to-speech synthesis: formant-based synthesis and concatenative unit selection synthesis. We found a significant effect of speaking rate on intelligibility of synthesized speech, and a trend towards significance for synthesizer type. In post-hoc analyses, we found that participant-related factors, including age and familiarity with a synthesizer and voice, also affect intelligibility of fast synthesized speech.
}

Finite-state models for Speech-based Search on Mobile Devices
Taniya Mishra, Srinivas Bangalore
Journal of Natural Language Engineering,
2010.
[PDF]
[BIB]
Cambridge University Press Copyright
The definitive version was published in Journal of Natural Language Engineering, 2010-11-01, http://journals.cambridge.org/action/displayMoreInfo?jid=NLE&type=tcr
{In this paper, we present techniques that exploit finite-state models for voice search applications. In particular, we illustrate the use of finite-state models for encoding the search index in order to tightly integrate the speech recognition and search components of a voice search system. We show that this tight integration benefits both automatic speech recognition and search. In the second part of the paper, we discuss the use of finite-state techniques for spoken language understanding, in particular, to segment an input query into its component semantic fields so as to improve search, extend the functionality of the system, and execute the user's request against a backend database.
}
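The idea of encoding a search index as a finite-state model can be sketched very simply: a deterministic acceptor (here a plain trie over query tokens) maps token sequences to listings, and is the kind of structure an ASR word lattice could be intersected with. This is a simplified illustration, not the paper's weighted-FST implementation, and the listing names are invented examples:

```python
def build_index(entries):
    """Build a trie-shaped acceptor from {query phrase: result} pairs."""
    root = {}
    for phrase, result in entries.items():
        node = root
        for token in phrase.split():
            node = node.setdefault(token, {})
        node["<final>"] = result  # mark an accepting state with its output
    return root

def lookup(index, query):
    """Walk the acceptor over the query tokens; None if no path accepts."""
    node = index
    for token in query.split():
        if token not in node:
            return None
        node = node[token]
    return node.get("<final>")
```

Usage: `lookup(build_index({"pizza hut": "listing-1", "pizza palace": "listing-2"}), "pizza hut")` returns `"listing-1"`, while a partial query such as `"pizza"` reaches no accepting state. In the paper's setting the index is a weighted transducer, so recognition and lookup compose into a single finite-state operation rather than a separate post-recognition search step.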