AT&T Natural Voices™ Text-to-Speech


More and more devices talk back to you nowadays. It used to be that only phone dialog systems spoke to you (and understood you); now an increasing number of personal devices (laptops, smartphones, GPS navigation systems, and game devices) talk to you, too.

How do they do that? What makes devices able to talk?

Text-to-speech (TTS) is the technology that makes it possible. The application takes text and from it produces artificial, machine-made speech. TTS has been around for many years, though only in the past few years has synthesized speech reached a high level of naturalness. Better-sounding speech, combined with the explosive popularity of small mobile devices with even smaller screens, has increased consumer demand for TTS, especially since it frees people to multitask and to drive more safely while using their devices.

People with special needs also benefit from TTS. For people with low vision, TTS reads text from files, books, and websites, making information accessible. For people who can't speak, TTS provides a voice. Stephen Hawking is a famous example (he prefers his own instantly recognizable version of TTS). Students learning a new language can also use TTS to improve their pronunciation or listening skills.

Corporations also like TTS because the technology can be a way to provide information effectively over the telephone.

Natural Voices is AT&T's state-of-the-art TTS product. It starts with a database of high-quality recorded speech produced under optimum conditions with high-quality recording equipment. The individual sounds in the speech (called phonemes) are carefully labeled so that when a new word or sentence is required, the algorithms can select the best set of sounds to retrieve from the database and join them together to be spoken. Knowing how to do this effectively is hard, and much of our research is devoted to improving these algorithms to achieve even more natural-sounding TTS in the future.
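
To make the idea of selecting and joining recorded sounds concrete, here is a minimal sketch in Python. It is not the Natural Voices algorithm, which this page does not describe in detail. It assumes a generic approach: each recorded unit gets a "target cost" (how far it is from what we want) and each pair of adjacent units gets a "join cost" (how audible the seam would be), and a dynamic-programming search picks the cheapest sequence. The Unit class, the single pitch feature, the clip identifiers, and the toy database are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One labeled snippet in a hypothetical recorded-speech database."""
    phoneme: str      # label assigned to the recorded sound
    pitch_hz: float   # toy acoustic feature (an assumption for this sketch)
    clip_id: str      # hypothetical identifier of the stored audio clip


def target_cost(unit, desired_pitch_hz):
    """How far a unit is from the pitch we would like to hear."""
    return abs(unit.pitch_hz - desired_pitch_hz)


def join_cost(left, right):
    """How noticeable the seam between two adjacent units would be."""
    return abs(left.pitch_hz - right.pitch_hz)


def select_units(phonemes, database, desired_pitch_hz):
    """Return (units, total_cost) for the cheapest sequence of units."""
    if not phonemes:
        return [], 0.0

    # Candidate recordings for each requested phoneme.
    candidates = [[u for u in database if u.phoneme == p] for p in phonemes]
    for p, cands in zip(phonemes, candidates):
        if not cands:
            raise ValueError(f"no recorded unit for phoneme {p!r}")

    # best[i][k] = (cheapest cost ending in candidate k at position i,
    #               index of the previous candidate on that path)
    best = [[(target_cost(u, desired_pitch_hz), None) for u in candidates[0]]]
    for i in range(1, len(phonemes)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, u), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(u, desired_pitch_hz), back))
        best.append(row)

    # Trace the cheapest path back from the last position.
    k = min(range(len(best[-1])), key=lambda j: best[-1][j][0])
    total = best[-1][k][0]
    path = []
    for i in range(len(phonemes) - 1, -1, -1):
        path.append(candidates[i][k])
        k = best[i][k][1]
    path.reverse()
    return path, total


# Toy database: two hypothetical recordings of each phoneme in "cat".
DATABASE = [
    Unit("k", 118.0, "k_01"), Unit("k", 140.0, "k_02"),
    Unit("ae", 122.0, "ae_01"), Unit("ae", 150.0, "ae_02"),
    Unit("t", 119.0, "t_01"), Unit("t", 160.0, "t_02"),
]

units, cost = select_units(["k", "ae", "t"], DATABASE, desired_pitch_hz=120.0)
print([u.clip_id for u in units], cost)
```

With this toy data the search picks the three clips whose pitch is closest to 120 Hz, because they are both near the target and close to one another. A real system would compare many more features than pitch and would search a far larger database, which is part of what makes the problem hard.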

To try AT&T Natural Voices™, go to the demo page and enter text. The words you type are transmitted to an AT&T server running Natural Voices so you can hear your words spoken. Natural Voices supports English (both US and UK versions), German, Spanish, and French.

Read more about the evolution of the Natural Voices technology in the article "Mathematics of . . . Artificial Speech" in Discover magazine.



Project Members

Horst Schroeter

Ann Syrdal

Alistair Conkie

Yeon-jun Kim

Mark Beutnagel

Taniya Mishra
