
Speech mashup = Web app + speech
Speech mashups provide an easy way for web developers to add a speech interface to their web apps, so that users can issue voice commands and receive spoken responses. All speech and language processing, including automatic speech recognition (ASR), text-to-speech (TTS) conversion, and natural language processing, is performed on the AT&T network, where servers run the AT&T WATSON (SM) ASR and the AT&T Natural Voices (TM) TTS, the same speech technology used for AT&T's enterprise customers.
Speech mashups work as follows: audio or text from a mobile device or a web browser is relayed over the cell network to the speech mashup manager, which manages the entire process. The manager accesses the AT&T servers where the speech and language processing takes place, then relays the result (converted into a programming-language representation) to the web application. If the application's result is to be spoken, the speech mashup manager sends it for TTS conversion before relaying the spoken response back to the user.
All processing steps are tightly integrated to minimize the number of round trips in the mobile network and reduce latency to achieve a better user experience.
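The flow described above can be illustrated with a small, self-contained simulation. This is only a sketch: every function below is a hypothetical stand-in (the "ASR" just decodes bytes as text, the "TTS" just encodes text as bytes), and none of the names correspond to an actual AT&T interface. The point it shows is that the client makes a single call to the manager, which chains recognition, interpretation, the web application, and optional synthesis on the server side.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MashupResult:
    text: str               # the web application's answer
    audio: Optional[bytes]  # synthesized speech, present only if requested


def recognize(audio: bytes) -> str:
    """Stand-in for ASR: pretend the audio bytes are UTF-8 text."""
    return audio.decode("utf-8")


def interpret(utterance: str) -> dict:
    """Stand-in for language processing: the first word is the intent."""
    words = utterance.lower().split()
    return {"intent": words[0] if words else "", "args": words[1:]}


def web_app(request: dict) -> str:
    """Stand-in for the developer's web application."""
    if request["intent"] == "order":
        return "Ordering " + " ".join(request["args"])
    return "Sorry, I did not understand."


def synthesize(text: str) -> bytes:
    """Stand-in for TTS: encode the text instead of producing real audio."""
    return text.encode("utf-8")


def speech_mashup_manager(audio: bytes, speak: bool = True) -> MashupResult:
    """Hypothetical manager: one client call drives every processing step."""
    utterance = recognize(audio)
    request = interpret(utterance)
    reply = web_app(request)
    return MashupResult(reply, synthesize(reply) if speak else None)


# The client makes one round trip and gets text plus (simulated) audio back.
result = speech_mashup_manager(b"order large pizza")
print(result.text)
```

Because the chaining happens inside the manager, the client does not need a separate request for each stage, which is the latency benefit the text refers to.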
Building a speech mashup for a mobile device (any network-enabled device with audio input) requires the following:
1. Registering at the speech mashup portal (http://service.research.att.com/smm/) for an account on AT&T servers, and creating a directory for the web app and related files (grammars, log files, etc.).
2. Creating and uploading grammars or using a built-in or shared grammar (ASR applications only).
3. Building a speech mashup client in any suitable programming language (Java, JavaScript, etc.). Three sample clients are available for downloading and modification.
A developer's guide with instructions and examples is available from the portal.
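As a rough illustration of step 3, a client essentially packages audio together with the account, application directory, and grammar set up in steps 1 and 2, and sends it to the speech mashup manager. The sketch below only builds such a request and does not send it. The endpoint URL, the parameter names (account, app, grammar), and the audio content type are all assumptions made for illustration; the actual request format is defined in the developer's guide.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint; the real address and parameters come from the
# developer's guide, not from this sketch.
MANAGER_URL = "http://example.invalid/speech-mashup"


def build_recognition_request(
    audio: bytes, account: str, app_dir: str, grammar: str
) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying audio to the manager."""
    # Assumed parameter names identifying the account, app directory, and grammar.
    query = urllib.parse.urlencode(
        {"account": account, "app": app_dir, "grammar": grammar}
    )
    return urllib.request.Request(
        url=f"{MANAGER_URL}?{query}",
        data=audio,
        headers={"Content-Type": "audio/wav"},  # assumed audio format
        method="POST",
    )


# Placeholder bytes stand in for recorded audio.
request = build_recognition_request(
    b"RIFF....", "demo-user", "pizza-app", "pizza-menu"
)
print(request.get_method(), request.full_url)
```

A real client would pass the request to urllib.request.urlopen (or the equivalent in Java or JavaScript) and then parse the manager's response.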
Related Projects
AT&T Application Resource Optimizer (ARO) - For energy-efficient apps
CHI Scan (Computer Human Interaction Scan)
CoCITe – Coordinating Changes in Text
E4SS - ECharts for SIP Servlets
Scalable Ad Hoc Wireless Geocast
Graphviz System for Network Visualization
Information Visualization Research - Prototypes and Systems
Swift - Visualization of Communication Services at Scale
AT&T Natural Voices (TM) Text-to-Speech
StratoSIP: SIP at a Very High Level
Content Augmenting Media (CAM)
Content Acquisition Processing, Monitoring, and Forensics for AT&T Services (CONSENT)
MIRACLE and the Content Analysis Engine (CAE)
Social TV - View and Contribute to Public Opinions about Your Content Live
Enhanced Indexing and Representation with Vision-Based Biometrics
Visual Semantics for Intuitive Mid-Level Representations
eClips - Personalized Content Clip Retrieval and Delivery
iMIRACLE - Content Retrieval on Mobile Devices with Speech
AT&T WATSON (SM) Speech Technologies
Wireless Demand Forecasting, Network Capacity Analysis, and Performance Optimization