
Crowd-sourcing for difficult transcription of speech
Jason Williams, Dan Melamed, Tirso Alonso, Barbara Hollister, Jay Wilpon
IEEE Workshop on Automatic Speech Recognition and Understanding, Hawaii, USA,
2011.
IEEE Copyright
This version of the work is reprinted here with permission of IEEE for your personal use. Not for redistribution. The definitive version was published in IEEE Workshop on Automatic Speech Recognition and Understanding, 2011-12-11.
Crowd-sourcing is a promising method for fast and cheap transcription of large volumes of speech data. However, this method cannot achieve the accuracy of expert transcribers on speech that is difficult to transcribe. Faced with such speech data, we developed three new methods of crowd-sourcing that allow explicit trade-offs among precision, recall, and cost. The methods are: incremental crowd-sourcing, treating ASR as a transcriber, and using a regression model to predict transcription reliability. Even though the accuracy of individual crowd-workers is only 55% on our data, our best method achieves 90% accuracy on 93% of the utterances, using only 1.3 crowd-worker transcriptions per utterance on average. When forced to transcribe all utterances, our best method matches the accuracy of previous crowd-sourcing methods using only one third as many transcriptions. We also study the effects of various task design factors on transcription latency and accuracy, some of which have not been studied before.

The Business Next Door: Click-Through Rate Modeling for Local Search
Suhrid Balakrishnan, Sumit Chopra, Dan Melamed
NIPS 2010 Workshop: Machine Learning in Online ADvertising,
2010.
MIT Press, Neural Information Processing Systems (NIPS) Copyright
The definitive version was published in NIPS 2010 Workshop: Machine Learning in Online ADvertising, 2010-12-10, http://nips.cc/
Computational advertising has recently received a tremendous amount of attention from the business and academic communities. While great advances have been made in modeling click-through rate in well-studied settings like sponsored search and context match, local search has received relatively less attention. The geographic nature of local search and associated local browsing creates interesting research challenges and opportunities. We consider a novel application of a relational regression model to local search. The model is attractive in that it allows us to explicitly control and represent geographic and category-based neighborhood-style constraints on the samples that result in superior click-through rate estimates. Further, the relational regression model we fit allows us to estimate an interpretable inherent 'quality' of a business listing, which we demonstrate reveals interesting latent information about listings and is also useful for further analysis.