Speech Mashup


Speech mashup = Web app + speech

Speech mashups give web developers an easy way to add a speech interface to their web apps, so that users can issue voice commands and receive spoken responses. All speech and language processing, including automatic speech recognition (ASR), text-to-speech (TTS) conversion, and natural language processing, is performed on the AT&T network, where servers run the AT&T WATSON (SM) ASR and the AT&T Natural Voices (TM) TTS, the same speech technology used for AT&T's enterprise customers.

Speech mashups work as follows. Audio or text from a mobile device or a web browser is relayed over the cell network to the speech mashup manager, which manages the entire process: it accesses the AT&T servers where the speech and language processing takes place, then relays the result (interpreted into a programming language) to the web application. If the application's result is to be spoken, the speech mashup manager sends it for TTS conversion before relaying the spoken response back to the user.

All processing steps are tightly integrated to minimize the number of round trips over the mobile network, which reduces latency and improves the user experience.
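
The flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual speech mashup manager: the class SpeechMashupManager, the MashupResult type, and the three stub functions are hypothetical names invented here, and the stubs stand in for the ASR, web application, and TTS services so that the example runs offline. The point it shows is that the client makes a single call, and the manager chains recognition, the application, and (optionally) synthesis on the server side.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MashupResult:
    text: str                      # what the web application answered
    audio: Optional[bytes] = None  # synthesized speech, only if requested


class SpeechMashupManager:
    """Hypothetical stand-in for the manager: chains ASR, web app, and TTS."""

    def __init__(self, asr, web_app, tts):
        self.asr = asr          # audio bytes -> recognized text
        self.web_app = web_app  # recognized text -> application response text
        self.tts = tts          # response text -> audio bytes

    def handle(self, payload, speak=False):
        # Audio input goes through recognition; text input skips that step.
        query = self.asr(payload) if isinstance(payload, bytes) else payload
        answer = self.web_app(query)
        # Only responses that are to be spoken need the TTS step.
        audio = self.tts(answer) if speak else None
        return MashupResult(text=answer, audio=audio)


# Stub services so the sketch runs offline; none of these are real AT&T calls.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the bytes "sound like" this text


def fake_web_app(query: str) -> str:
    return f"Results for: {query}"


def fake_tts(text: str) -> bytes:
    return b"AUDIO:" + text.encode("utf-8")


manager = SpeechMashupManager(fake_asr, fake_web_app, fake_tts)
spoken = manager.handle(b"pizza near me", speak=True)  # audio in, speech out
typed = manager.handle("pizza near me")                # text in, text out
```

Because every step runs inside one handle() call, the client pays for one request and one response regardless of how many services are involved, which is the round-trip saving the text refers to.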

Building a speech mashup for a mobile device (any network-enabled device with audio input) requires the following:

1. Registering at the speech mashup portal (http://service.research.att.com/smm/) for an account on AT&T servers, and creating a directory for the web app and related files (grammars, log files, etc.).

2. Creating and uploading grammars or using a built-in or shared grammar (ASR applications only).

3. Building a speech mashup client in any suitable programming language (Java, JavaScript, etc.). Three sample clients are available to download and modify.

A developer's guide with instructions and examples is available from the portal.
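
As a rough idea of what the client in step 3 might do, the sketch below builds (but does not send) a request that carries recorded audio together with the account, application, and grammar chosen in steps 1 and 2. Everything specific here is an assumption made for illustration: the use of HTTP POST, the endpoint https://example.invalid/recognize, the function build_recognition_request, and the parameter names account, app, and grammar are all hypothetical. The real interface is described in the developer's guide.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_recognition_request(base_url, account_id, app_name, grammar, audio):
    """Build (but do not send) a POST request carrying audio for recognition.

    All parameter names are hypothetical placeholders, not the real API.
    """
    query = urlencode({"account": account_id, "app": app_name, "grammar": grammar})
    return Request(
        url=f"{base_url}?{query}",
        data=audio,  # raw audio bytes captured on the device
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


req = build_recognition_request(
    "https://example.invalid/recognize",  # placeholder endpoint, not a real service
    "my-account",
    "pizza-finder",
    "toppings",
    b"\x00\x01",
)
```

Sending the request (for example with urllib.request.urlopen) is left out on purpose, since the placeholder endpoint does not exist.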