Visual API - Visual Intelligence for your Applications

What is the Visual API?

The AT&T Visual API is a network service providing Visual Intelligence to enterprise and mobile applications by transforming content into textual and indexed representations.

Over the last two decades, AT&T has continued to develop world-class content analysis and computer vision technology. Video segmentation, album management, audio and video near-copy detection, face detection and recognition, natural language-based search, and quality assessment are just a handful of exposed capabilities.  

The goal of the Visual API is to make visual intelligence easy to add to new applications. Deploy your application using visual intelligence to provide capabilities that your customers or business never imagined. Automated enterprise content management, a face-based time-card system, quality comparisons of video codecs, or a speech-driven interface to your customer support videos are all possible with AT&T's Visual API.

  • There is no additional software to develop or deploy in order to use the functionality of the Visual API.
  • As a network service, capabilities are available to mobile, desktop, enterprise, and back-end solution providers.
  • The Visual API creates visual intelligence for your content, imposing no storage constraints. That means you can store, stream or play back, and re-encode your content with no changes required for your existing solutions.
  • Responses from the Visual API are bandwidth-friendly. Once the content is transmitted to the Visual API, lightweight XML or JSON can be retrieved by any network-connected device.



Hardened HTTP(S) access and network services mean your application can run on mobile devices, desktops, or back-ends. As part of the Visual API release, a live HTML5 SDK has been created that walks new developers through each module and several interesting use cases. With the Visual API, you provide the content and it provides the visual intelligence.

How can I get started?

The public interface of the Visual API was officially retired on July 1, 2015, but interested parties are welcome to send an email to the project members listed on this page.


Sample Applications

This section will be periodically updated with sample applications that are included with the Visual API SDK distribution. To maximize platform compatibility, these applications usually require an HTML5-compliant browser; Chrome and Firefox receive the highest testing priority.

Celebrity Doppelgangers

The Celebrity Doppelgangers application, powered by the Visual API, demonstrates several capabilities using face-based analysis functions. Demonstrated as part of the AT&T 2014 Developer Summit, the application welcomes participants with a sample of available celebrities. The Visual API is a "bring your own data" platform, so if you choose to develop a similar application using a company employee roster or friends from a social network, the underlying process is unchanged.


The truly engaging part of the application is seeing the recognition results of a submitted face. In this screenshot, two participants are shown with their three highest-scoring recognition results. Results can be improved on the application side if the system is given knowledge about a small set of facial identities to consider. With no prior knowledge, recognition is performed across an entire namespace. With prior knowledge, recognition can be constrained to pick the highest-scoring results from only a few candidates. This approach is best suited to applications where the group is known in advance, such as determining identity among family members for a home- or in-car-based solution.
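
The effect of constraining recognition to a known group can be sketched on the application side. The example below is a minimal illustration in Python, assuming that recognition results arrive as (identity, score) pairs where a higher score means a better match. The function name top_matches, the identities, and the scores are all hypothetical and are not part of the Visual API.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Assumed result shape: (identity label, recognition score), higher is better.
Match = Tuple[str, float]


def top_matches(results: Iterable[Match],
                known_group: Optional[Set[str]] = None,
                k: int = 3) -> List[Match]:
    """Return the k highest-scoring matches, optionally limited to a known group."""
    if known_group is not None:
        # Prior knowledge: discard every identity outside the expected group.
        results = [m for m in results if m[0] in known_group]
    return sorted(results, key=lambda m: m[1], reverse=True)[:k]


# Hypothetical scores for one submitted face across a whole namespace.
scores = [("alice", 0.41), ("celebrity_a", 0.58), ("bob", 0.37),
          ("celebrity_b", 0.55), ("carol", 0.12)]

unconstrained = top_matches(scores)
family_only = top_matches(scores, known_group={"alice", "bob", "carol"})
```

Without the constraint, look-alikes from the wider namespace can outrank the correct household member; with it, only the family identities compete for the top positions.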


Visual API Scope

Visual quality is a subjective and often challenging metric to quantify, yet its importance in a video network service continues to grow. The Visual API exposes almost 15 subjective quality metrics (with more added continually) that characterize the content of an image or video directly. In the example below, a diagnostic application called the "Scope" plots blockiness, blur, and noise quality measures for several copies of the same image with different distortions. This illustration shows that the original (left-most) image has quality characteristics that differ from those of all the distorted copies. This information could be used in an automated system, or as feedback on user-generated content, to alert someone to a potential content quality concern.
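
An automated check of this kind could compare each copy's scores against the original and flag large differences. The sketch below is only an illustration: it assumes each image has a dictionary of metric scores on a common scale, and the function name flag_quality_changes, the tolerance value, and the sample numbers are hypothetical rather than actual Visual API output.

```python
from typing import Dict, List

# Assumed shape: metric name -> score. Names and scale are illustrative only.
Metrics = Dict[str, float]


def flag_quality_changes(original: Metrics,
                         copies: Dict[str, Metrics],
                         tolerance: float = 0.1) -> Dict[str, List[str]]:
    """For each copy, list the metrics that differ from the original by more than tolerance."""
    flagged: Dict[str, List[str]] = {}
    for name, metrics in copies.items():
        changed = [metric for metric, value in metrics.items()
                   if metric in original
                   and abs(value - original[metric]) > tolerance]
        if changed:
            flagged[name] = sorted(changed)
    return flagged


# Hypothetical scores for an original image and three copies of it.
original = {"blockiness": 0.10, "blur": 0.15, "noise": 0.05}
copies = {
    "recompressed": {"blockiness": 0.60, "blur": 0.20, "noise": 0.05},
    "defocused": {"blockiness": 0.10, "blur": 0.70, "noise": 0.05},
    "identical": {"blockiness": 0.10, "blur": 0.15, "noise": 0.05},
}
report = flag_quality_changes(original, copies)
```

Copies that stay within the tolerance on every metric are left out of the report, so an empty result means no quality concern was detected.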



The Visual API is grouped into the functional modules below, but content can be analyzed by each resource independently depending on the speed, resolution, and capacity requirements of your application.

Accounts Security

Create segregated and secure account credentials for your user base. As a mobile or enterprise application developer, you won't lose cycles mapping internal and external references to the Visual API.

Namespace Partitioning

Organize content into namespaces that can be shared across users in either read-only or fully editable fashions. This functionality supports crowd-sourced projects that include experts and observers alike.
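
The sharing model described above can be pictured with a small data structure. This is a sketch under stated assumptions, not the Visual API's actual account or namespace interface: the Namespace class, the Access levels, and the user names are hypothetical and exist only to illustrate read-only versus fully editable sharing.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Access(Enum):
    """Hypothetical sharing levels mirroring the two modes described above."""
    READ_ONLY = "read-only"
    EDITABLE = "editable"


@dataclass
class Namespace:
    name: str
    owner: str
    shared_with: Dict[str, Access] = field(default_factory=dict)

    def share(self, user: str, access: Access) -> None:
        """Grant (or change) a user's access to this namespace."""
        self.shared_with[user] = access

    def can_read(self, user: str) -> bool:
        return user == self.owner or user in self.shared_with

    def can_edit(self, user: str) -> bool:
        return user == self.owner or self.shared_with.get(user) is Access.EDITABLE


# A crowd-sourced project with one expert contributor and one observer.
project = Namespace(name="field-survey", owner="lead")
project.share("expert", Access.EDITABLE)
project.share("observer", Access.READ_ONLY)
```

Here the expert can add or change content while the observer can only view it, which matches the crowd-sourced scenario of experts and observers working in the same namespace.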

Video Segmentation*

Video segmentation analyzes your content for logical scene and shot partitions. This function helps to automate editing by bootstrapping segments for telepresence calls, conferences, broadcast video, and even user-generated content.

Asset Analysis

General content analysis for basic similarity, collection creation, and general library management. If you don’t need specific tools, but want the most from your content, one suite of functions does it all for you.

Face Detection and Recognition

Detection and recognition of faces from custom or pre-created contexts. Whether your content is a celebrity photo, a user's photo album, or a corporate employee roster, face-based content organization can help.

Near-copy Detection and Recognition*

Near-copy functionality can be used to identify low-quality originals (e.g., by matching to an online library), monitor for repeated segments, and help regulate rights infringements, utilizing both audio and visual cues.

Collection Creation and Organization

A lightweight version of near-copy detection that groups similar photos or videos from different moments in time, camera angles, and illuminations into recommended albums or virtual stacks.

Tag Input and Retrieval

Label faces with human input to bootstrap automatic tagging and subsequent retrieval for a wealth of applications in recognition, entertainment, and retrieval.

Perceptual Quality Scoring

Automated perceptual quality scoring of frames, video, and regions based on leading research metrics. These scores go beyond simple energy metrics defined in international specifications to give you precise information about quality degradation as perceived by a human.

Natural Language Understanding (NLU)

Custom speech models built for AT&T WATSON℠ using your content. Define your own categories (genre, model, make, year, etc.) or command grammar and combine them with speech recognition output (no transcripts required) for a precise mapping and indexing scheme.

Metadata Injection*

Supporting the popularity of mix-and-match systems, let the Visual API augment your existing content metadata. Injected metadata supports certain functions (e.g., NLU) without providing content for full analysis.

* These functionalities exist in the AT&T CAE℠ but their deployment into the Visual API is still ongoing. If you are interested in one of these technologies, please ask us about how to get it delivered to you more quickly!

Multimedia (videos, demos, interviews)
VisualAPI_Analytics_Sampler

Project Members

Eric Zavesky

Zhu Liu

Raghuraman Gopalan

David Gibbon

Bernard Renger

Lee Begeja

Behzad Shahraray

Related Projects

Project Space

AT&T Application Resource Optimizer (ARO) - For energy-efficient apps

Assistive Technology

CHI Scan (Computer Human Interaction Scan)

CoCITe – Coordinating Changes in Text

Connecting Your World



E4SS - ECharts for SIP Servlets

Scalable Ad Hoc Wireless Geocast

AT&T 3D Lab

Graphviz System for Network Visualization

Information Visualization Research - Prototypes and Systems

Swift - Visualization of Communication Services at Scale

Smart Grid

Speech Mashup

Omni Channel Analytics

Speech translation

StratoSIP: SIP at a Very High Level


Content Augmenting Media (CAM)

Content-Based Copy Detection

Content Acquisition Processing, Monitoring, and Forensics for AT&T Services (CONSENT)

Content Analytics - distill content into visual and statistical representations

MIRACLE and the Content Analysis Engine (CAE)

Social TV - View and Contribute to Public Opinions about Your Content Live

Enhanced Indexing and Representation with Vision-Based Biometrics

Visual Semantics for Intuitive Mid-Level Representations

eClips - Personalized Content Clip Retrieval and Delivery

iMIRACLE - Content Retrieval on Mobile Devices with Speech

AT&T WATSON (SM) Speech Technologies

Wireless Demand Forecasting, Network Capacity Analysis, and Performance Optimization