
Speech translation
AT&T Research is developing a real-time speech-to-speech translation technology so the translation starts as soon as speech is detected.
Corpus Analysis of Simultaneous Interpretation Data for Improving Real Time Speech Translation
Vivek Rangarajan Sridhar, Srinivas Bangalore, John Chen
Interspeech 2013,
2013.
ISCA Copyright
The definitive version was published in 2013 (2013-08-31).
Real-time speech-to-speech (S2S) translation of lectures and speeches requires simultaneous translation with low latency to continually engage the listeners. However, simultaneous speech-to-speech translation systems have predominantly repurposed translation models that are typically trained for consecutive translation, without a motivated attempt to model incrementality. Furthermore, the notion of interpretation is simplified to translation plus simultaneity. In contrast, human interpreters are able to perform simultaneous interpretation by generating target speech incrementally with very low ear-voice span, using a variety of strategies such as compression (paraphrasing), incremental comprehension, and anticipation through discourse inference and expectation of discourse redundancies. Exploiting and modeling such phenomena can potentially improve automatic real-time translation of speech. As a first step, in this work we identify and present a systematic analysis of phenomena used by human interpreters to perform simultaneous interpretation and elucidate how they can be exploited in a conventional simultaneous translation framework. We perform our study on a corpus of simultaneous interpretation of Parliamentary speeches in English and Spanish. Specifically, we present an empirical analysis of factors such as time constraint, redundancy and inference as evidenced in the simultaneous interpretation corpus.

Word Prominence Detection using Robust yet Simple Prosodic Features
Taniya Mishra, Vivek Rangarajan Sridhar, Alistair Conkie
Proceedings of Interspeech,
Interspeech 2012,
2012.
ISCA Copyright
The definitive version was published in 2012 (2012-09-09).
Automatic detection of word prominence can provide valuable information for downstream applications such as spoken language understanding. Prior work on automatic word prominence detection exploits a variety of lexical, syntactic, and prosodic features and models the task as a sequence of local classifications (independently or using history). While lexical and syntactic features are highly correlated with the notion of word prominence, the output of speech recognition is typically noisy and hence these features are less reliable than the acoustic-prosodic feature stream. In this work, we address the automatic detection of word prominence through novel prosodic features that capture the changes in F0 curve shape and magnitude along with duration and energy. We contrast the utility of these features with aggregate statistics of F0, duration, and energy used in prior work. Our features are simple to compute yet robust to the inherent difficulties associated with identifying salient points (such as F0 peaks, valleys, onsets, offsets, etc.) within the F0 contour. We demonstrate that these novel features are substantially more predictive than the standard aggregation-based prosodic features using feature analysis. Experimental results on a corpus of spontaneous speech indicate that the accuracy obtained using only the prosodic features is better than using both lexical and syntactic features.
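
To make the contrast between aggregate statistics and shape-oriented features concrete, the following minimal sketch computes both for the F0 track of a single word. It is only an illustration: the function names, the sample values, the convention that 0 marks an unvoiced frame, and the use of a least-squares slope as a "shape" cue are all assumptions, and none of them is taken from the paper, whose actual features are not specified in the abstract.

```python
from statistics import mean

def aggregate_f0_stats(f0):
    """Aggregate statistics over the voiced F0 samples (Hz) of one word."""
    voiced = [v for v in f0 if v > 0]  # assumption: 0 marks an unvoiced frame
    if not voiced:
        return {"mean": 0.0, "min": 0.0, "max": 0.0, "range": 0.0}
    return {
        "mean": mean(voiced),
        "min": min(voiced),
        "max": max(voiced),
        "range": max(voiced) - min(voiced),
    }

def f0_slope(f0):
    """Least-squares slope (Hz per frame) over voiced frames.

    A crude stand-in for a contour-shape feature (hypothetical choice).
    """
    points = [(i, v) for i, v in enumerate(f0) if v > 0]
    if len(points) < 2:
        return 0.0
    mean_x = mean([x for x, _ in points])
    mean_y = mean([y for _, y in points])
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return num / den

# Made-up F0 tracks: one rising contour and one falling contour.
rising = [0.0, 100.0, 110.0, 120.0, 0.0]
falling = [0.0, 120.0, 110.0, 100.0, 0.0]

rising_stats = aggregate_f0_stats(rising)
falling_stats = aggregate_f0_stats(falling)
rising_slope = f0_slope(rising)
falling_slope = f0_slope(falling)
```

In this toy case the two contours have identical aggregate statistics but slopes of opposite sign, which is the kind of information an aggregate-only representation discards.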

SPECTRA: A SPEECH-TO-SPEECH TRANSLATION SYSTEM IN THE CLOUD
Vivek Rangarajan Sridhar, Srinivas Bangalore, Aura Jimenez, Ladan Golipour, Prakash Kolan
IEEE International Conference on Emerging Signal Processing Applications,
2012.
IEEE Copyright
This version of the work is reprinted here with permission of IEEE for your personal use. Not for redistribution. The definitive version was published in IEEE International Conference on Emerging Signal Processing Applications (2012-01-12).
In this demonstration, we will present Spectra, a speech-to-speech (S2S) translation system in the cloud. Spectra comprises an HMM-based large vocabulary continuous speech recognizer (AT&T Watson Speech Recognizer), a phrase-based translation system, and a unit selection text-to-speech synthesis system (AT&T Natural Voices TTS). Spectra currently runs on any iOS device and can be downloaded as an application from the Apple application store (http://itunes.apple.com/us/app/spectra/id432494549?mt=8). Spectra is endowed with automatic language identification capabilities and can currently translate from/to English and six other languages (Chinese, French, German, Italian, Japanese and Spanish).
Real-time Incremental Speech-to-Speech Translation of Dialogs
Srinivas Bangalore, Vivek Rangarajan Sridhar, Prakash Kolan, Ladan Golipour, Aura Jimenez
Proceedings of NAACL-HLT,
NAACL-HLT 2012,
2012.
Harvesting parallel text in multiple languages with limited supervision
Luciano Barbosa, Vivek Rangarajan Sridhar, Mahsa Yarmohammadi, Srinivas Bangalore
Proceedings of COLING,
COLING,
2012.
COLING Copyright
The definitive version was published in 2012 (2012-12-08).
The Web is an ever-increasing, dynamically changing, multilingual repository of text. There have been several approaches to harvesting this repository for bootstrapping, supplementing and adapting data needed for training models in speech and language applications. In this paper, we present semi-supervised and unsupervised approaches to harvesting multilingual text that rely on a key observation of link collocation. We demonstrate the effectiveness of our approach in the context of statistical machine translation by harvesting parallel texts and training translation models in 20 different languages. Furthermore, by exploiting the DOM trees of parallel webpages, we extend our harvesting technique to create parallel data for resource-limited languages in an unsupervised manner. We also present some interesting observations concerning the socio-economic factors that the multilingual Web reflects.

Enriching text-to-speech synthesis using automatic dialog act tags
Vivek Rangarajan Sridhar, Ann Syrdal, Alistair Conkie, Srinivas Bangalore
Proceedings of Interspeech,
Interspeech,
2011.
We present an approach for enriching dialog-based text-to-speech (TTS) synthesis systems by explicitly controlling the expressiveness through the use of dialog act tags. The dialog act tags in our framework are automatically obtained by training a maximum entropy classifier on the Switchboard-DAMSL data set, which is unrelated to the TTS database. We compare the voice quality produced by exploiting automatic dialog act tags with that obtained using human annotations of dialog acts, and with two forms of reference databases. Even though the inventory of tags is different for the automatic tagger and human annotation, exploiting either form of dialog markup generates better voice quality in comparison with the reference voices in subjective evaluation.
Crawling Back and Forth: Using Back and Out Links to Locate Bilingual Sites
Luciano Barbosa, Srinivas Bangalore, Vivek Kumar Rangarajan Sridhar
Proceedings of IJCNLP,
IJCNLP,
2011.
AFNLP Copyright
The definitive version was published in IJCNLP (2011-11-15).
The definitive version was published in Very Large Databases, 2011 (2011-11-15).
Recently, there has been increased interest in parallel Web text for tasks such as machine translation and cross-language information retrieval. Although previous work has addressed many aspects of it, including document pair selection and sentence and word alignment, the problem of discovering bilingual data sources on a large scale has been largely overlooked. In this paper, we propose a novel crawling strategy to locate bilingual sites that aims to balance the two conflicting requirements of this problem: the need to perform a broad search while avoiding crawling unproductive Web regions. Our solution does so by focusing on the graph neighborhood of bilingual sites and exploring the patterns of the links in this region to guide its visitation policy. To detect such sites, we introduce a two-step strategy that first relies on common patterns found in the internal links of these sites to compose a classifier that identifies candidate pages as entry points to parallel data in these sites, and then verifies whether these pages are in fact in the languages of interest. Our experimental evaluation shows that our crawler outperforms previous crawling approaches for this task and produces a high-quality collection of bilingual sites.
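
The two-step detection described above (flag candidate entry pages from their internal links, then verify the languages) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the paper's method: the paper trains a classifier on link patterns, whereas this sketch substitutes hand-written regular expressions, and the pattern list, the stopword-based language guess, and all function names are hypothetical.

```python
import re

# Hypothetical link patterns hinting at a language switch.
# Assumption: these stand in for the trained classifier described in the abstract.
LINK_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (
        r"/(en|es)(/|$)",               # language code as a path segment
        r"[?&]lang=(en|es)\b",          # language code as a query parameter
        r"\b(english|espa[nñ]ol)\b",    # language name in the anchor text
    )
]

# Tiny illustrative stopword lists; a real system would use a proper language identifier.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "in", "is"},
    "es": {"el", "la", "de", "y", "en", "que", "los"},
}

def is_candidate(links):
    """Step 1: flag a page if any (href, anchor_text) pair matches a link pattern."""
    return any(
        pattern.search(href) or pattern.search(text)
        for href, text in links
        for pattern in LINK_PATTERNS
    )

def guess_language(text):
    """Return the language whose stopwords occur most often, or None if none occur."""
    tokens = re.findall(r"\w+", text.lower())
    counts = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None

def is_bilingual_entry(links, page_text, linked_text, pair=("en", "es")):
    """Step 2: accept a candidate only if the page and its linked page cover both languages."""
    if not is_candidate(links):
        return False
    return {guess_language(page_text), guess_language(linked_text)} == set(pair)

# Made-up example page with a link to a Spanish version.
example_links = [("/es/inicio", "Español")]
example_result = is_bilingual_entry(
    example_links,
    "the history of the city is told in the museum",
    "la historia de la ciudad se cuenta en el museo",
)
```

Running the cheap link check before the language check keeps the more expensive verification (which requires fetching and reading page text) for pages that already look promising.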

A Scalable Approach to Building a Parallel Corpus from the Web
Vivek Kumar Rangarajan Sridhar, Luciano Barbosa, Srinivas Bangalore
Proceedings of Interspeech,
INTERSPEECH,
2011.
ACL Copyright
The definitive version was published in EMNLP (2011-08-27).
Parallel text acquisition from the Web is an attractive way to augment statistical models (e.g., machine translation, cross-lingual document retrieval) with domain representative data. The basis for obtaining such data is a collection of pairs of bilingual Web sites or pages. In this work, we propose a crawling strategy that locates bilingual Web sites by constraining the visitation policy of the crawler to the graph neighborhood of bilingual sites on the Web. Subsequently, we use a novel recursive mining technique that recursively extracts text and links from the collection of bilingual Web sites obtained from the crawling. Our method does not suffer from the computationally prohibitive combinatorial matching typically used in previous work that uses document retrieval techniques to match a collection of bilingual webpages. We demonstrate the efficacy of our approach in the context of machine translation in the tourism and hospitality domain. The parallel text obtained using our novel crawling strategy results in a relative improvement of 21% in BLEU score (English-to-Spanish) over an out-of-domain seed translation model trained on the European parliamentary proceedings.
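
For readers unfamiliar with the term, a relative improvement is the gain divided by the baseline score. The short sketch below shows the arithmetic; the function name and the two BLEU values are placeholders chosen only so that the result comes out near 21%, and they are not figures from the paper, which the abstract does not report.

```python
def relative_improvement(baseline, improved):
    """Return (improved - baseline) / baseline as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (improved - baseline) / baseline

# Placeholder scores (NOT from the paper): a baseline of 20.0 BLEU and
# an improved score of 24.2 BLEU correspond to roughly a 21% relative gain.
example_gain = relative_improvement(20.0, 24.2)
```

Note that a relative gain of 21% is not the same as an absolute gain of 21 BLEU points; in the placeholder example the absolute gain is 4.2 points.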