Visual API - Visual Intelligence for your Applications

What is the Visual API?


The AT&T Visual API is a network service providing Visual Intelligence to enterprise and mobile applications by transforming content into textual and indexed representations.

Over the last two decades, AT&T has continued to develop world-class content analysis and computer vision technology. Video segmentation, album management, audio and video near-copy detection, face detection and recognition, natural language-based search, and quality assessment are just a handful of exposed capabilities.  

The goal of the Visual API is to make Visual Intelligence easy to add to new applications. Deploy an application that uses visual intelligence to provide capabilities your customers or business never imagined. Automated enterprise content management, a face-based time-card system, quality comparisons of video codecs, or a speech-driven interface to your customer support videos are all possible with AT&T's Visual API.

  • There is no additional software to develop or deploy in order to use the Visual API.
  • As a network service, capabilities are available to mobile, desktop, enterprise, and back-end solution providers.
  • The Visual API creates visual intelligence for your content and imposes no storage constraints. That means you can store, stream or play back, and re-encode your content with no changes to your existing solutions.
  • Responses from the Visual API are bandwidth-friendly. Once the content has been transmitted to the Visual API, lightweight XML or JSON results can be retrieved by any network-connected device.
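
As a rough illustration of the last point, a client needs little more than a JSON parser to consume a result. The payload below is invented for this sketch: the field names (`asset_id`, `faces`, `label`, `score`) and the helper `best_face` are assumptions, not the documented Visual API schema.

```python
import json

# Hypothetical response body; the field names are illustrative assumptions,
# not the documented Visual API schema.
SAMPLE_RESPONSE = """
{
  "asset_id": "demo-001",
  "faces": [
    {"label": "person_a", "score": 0.91},
    {"label": "person_b", "score": 0.47}
  ]
}
"""


def best_face(response_text):
    """Return the (label, score) pair with the highest score, or None."""
    payload = json.loads(response_text)
    faces = payload.get("faces", [])
    if not faces:
        return None
    top = max(faces, key=lambda face: face["score"])
    return top["label"], top["score"]


if __name__ == "__main__":
    print(best_face(SAMPLE_RESPONSE))
```

Because the result is a small text document rather than the media itself, the same parsing step works unchanged on a phone, a desktop, or a back-end job.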

 

This brief video illustrates a few of the analytical capabilities that the Visual API can bring to your application or enterprise.

 
Thanks for your interest! Check back soon for the Visual API sampler video!

 

Architecture

Hardened HTTP(S) access and network services mean your application can run on mobile devices, desktops, or back-ends. As part of the Visual API release, a live HTML5 SDK has been created that walks new developers through each module and several interesting use cases. With the Visual API, you provide the content and it provides the visual intelligence.


How can I get started?

Getting started with the Visual API requires only a few steps.

  1. Preview the Visual API and SDK. Take a quick look at the HTML5 SDK, an interactive and live document set for the entire Visual API, and the Visual API home for downloads.
  2. Register for access to the Alpha APIs hosted in the Developer program. Although the Visual API is still new to the API program, the algorithms and technology in its engine have been continuously incubated for years. Registering in the developer program will allow you to create sample applications and explore the catalog of APIs available.
  3. Explore the Visual API service, discover and contribute to discussions, and experiment with native API calls made through the web page directly.
  4. Integrate the SDK components for your language of choice (e.g., Python or JavaScript) into your application, using wrapper scripts that make the API a breeze to use. With visual intelligence integrated into your application or enterprise, you're ready to design previously unimagined use cases!

 

 

Sample Applications

This section will be periodically updated with sample applications that are included with the Visual API SDK distribution. To maximize platform compatibility, these applications usually require an HTML5-compliant browser; Chrome and Firefox are the first browsers to receive highest-priority testing.

Celebrity Doppelgangers

The Celebrity Doppelgangers application, powered by the Visual API, demonstrates several capabilities using face-based analysis functions. Demonstrated as part of the AT&T 2014 Developer Summit, the application welcomes participants with a sample of available celebrities. The Visual API is a "bring your own data" platform, so if you choose to develop a similar application using a company employee roster or friends from a social network, the underlying process is unchanged.

 

The truly engaging part of the application is seeing the recognition results for a submitted face. In the application's screenshot, two participants are shown with their three highest-scoring recognition results. Results can be improved on the application side if the system is given knowledge about a small range of facial identities to consider. For example, with no prior knowledge, recognition is performed across an entire namespace. With prior knowledge, recognition can be constrained to pick the highest-scoring results from a set of only a few candidates. This approach is best suited to applications where the group is known, such as determining identity among family members for a home- or in-car-based solution.
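
The effect of constraining recognition can be sketched on the application side. This is a minimal sketch, assuming recognition scores arrive as a mapping from identity name to score; that format, the function `top_matches`, and the sample names are hypothetical and not part of the Visual API.

```python
def top_matches(scores, candidates=None, k=3):
    """Return the k highest-scoring (identity, score) pairs.

    scores: dict mapping identity name -> recognition score (assumed format).
    candidates: optional iterable of identities to consider; None means
    every identity in the namespace is eligible.
    """
    if candidates is not None:
        allowed = set(candidates)
        scores = {name: s for name, s in scores.items() if name in allowed}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    namespace_scores = {
        "alice": 0.62, "bob": 0.58, "carol": 0.71, "dave": 0.40, "erin": 0.66,
    }
    # No prior knowledge: rank across the whole namespace.
    print(top_matches(namespace_scores))
    # Prior knowledge: only three household members are plausible.
    print(top_matches(namespace_scores, candidates=["alice", "bob", "dave"], k=1))
```

With the candidate set restricted, an identity that would be outranked by unrelated faces in the full namespace can still come out on top.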

 

Visual API Scope

Visual quality is a subjective and often challenging metric to quantify, and yet its importance in a video network service keeps growing. The Visual API exposes almost 15 subjective quality metrics (with more added continually) that characterize the content of an image or video directly. In one example, a diagnostic application called the "Scope" plots blockiness, blur, and noise quality measures for several copies of the same image with different distortions. The plot shows that the original (left-most) image has quality characteristics that differ from those of all the distorted copies. This information could be used in an automated system, or as feedback on user-generated content, to alert someone to a potential quality concern.
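
An automated check of this kind could look like the following sketch. The metric names mirror the three measures mentioned above, but the numeric values, the 0.1 tolerance, and the function `flag_quality_deviations` are assumptions made for illustration, not output or behavior of the Visual API.

```python
def flag_quality_deviations(reference, copies, tolerance=0.1):
    """Map each copy name to the metrics that deviate from the reference.

    reference: dict of metric name -> score for the original content.
    copies: dict of copy name -> dict of metric name -> score.
    A metric is flagged when it differs from the reference by more than
    `tolerance`; copies with no flagged metrics are omitted.
    """
    flagged = {}
    for name, metrics in copies.items():
        deviating = [
            metric
            for metric, value in metrics.items()
            if metric in reference and abs(value - reference[metric]) > tolerance
        ]
        if deviating:
            flagged[name] = sorted(deviating)
    return flagged


if __name__ == "__main__":
    original = {"blockiness": 0.10, "blur": 0.20, "noise": 0.05}
    distorted = {
        "heavy_compression": {"blockiness": 0.60, "blur": 0.22, "noise": 0.06},
        "defocused": {"blockiness": 0.12, "blur": 0.75, "noise": 0.04},
        "untouched": {"blockiness": 0.10, "blur": 0.20, "noise": 0.05},
    }
    print(flag_quality_deviations(original, distorted))
```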

 

Functionality

The Visual API is grouped into the functional modules below, but content can be analyzed by each resource independently depending on the speed, resolution, and capacity requirements of your application.

Accounts Security

Create segregated and secure account credentials for your user base. As a mobile or enterprise application developer, you won't lose cycles mapping internal and external references to the Visual API.

Namespace Partitioning

Organize content into namespaces that can be shared across users in either read-only or fully editable fashions. This functionality supports crowd-sourced projects that include experts and observers alike.
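
The two sharing modes can be pictured with a toy model. This is only a sketch of the idea: the `Namespace` class, its methods, and the mode names are hypothetical and do not reflect the Visual API's actual resources or calls.

```python
from dataclasses import dataclass, field

READ_ONLY = "read-only"
EDITABLE = "editable"


@dataclass
class Namespace:
    """Toy model of a shared namespace; not the Visual API's data model."""

    name: str
    owner: str
    shares: dict = field(default_factory=dict)  # user -> READ_ONLY or EDITABLE

    def share(self, user, mode):
        """Grant `user` access in the given mode."""
        if mode not in (READ_ONLY, EDITABLE):
            raise ValueError(f"unknown sharing mode: {mode}")
        self.shares[user] = mode

    def can_read(self, user):
        return user == self.owner or user in self.shares

    def can_edit(self, user):
        return user == self.owner or self.shares.get(user) == EDITABLE


if __name__ == "__main__":
    project = Namespace(name="field-survey", owner="expert")
    project.share("observer", READ_ONLY)
    project.share("co_expert", EDITABLE)
    print(project.can_edit("observer"), project.can_edit("co_expert"))
```

In a crowd-sourced project, experts would hold editable access while observers browse the same content read-only.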

Video Segmentation*

Video segmentation analyzes your content for logical scene and shot partitions. This function helps to automate editing by bootstrapping segments for telepresence calls, conferences, broadcast video, and even user-generated content.

Asset Analysis

General content analysis for basic similarity, collection creation, and general library management. If you don’t need specific tools but want to get the most from your content, one suite of functions does it all for you.

Face Detection and Recognition

Detection and recognition of faces from custom or pre-created contexts. Whether your content is a celebrity photo, a user's photo album, or a corporate employee roster, face-based content organization can help.

Near-copy Detection and Recognition*

Near-copy functionality can be used to identify low-quality originals (e.g., by matching against an online library), monitor for repeated segments, and help regulate rights infringements, using both audio and visual cues.

Collection Creation and Organization

A lightweight version of near-copy detection that groups similar photos or videos from different moments in time, camera angles, and illuminations into recommended albums or virtual stacks.

Tag Input and Retrieval

Label faces with human input to bootstrap automatic tagging and subsequent retrieval for a wealth of applications in recognition, entertainment, and retrieval.

Perceptual Quality Scoring

Automated perceptual quality scoring of frames, video, and regions based on leading research metrics. These scores go beyond simple energy metrics defined in international specifications to give you precise information about quality degradation as perceived by a human.

Natural Language Understanding (NLU)

Custom speech models built for AT&T WATSON℠ using your content. Define your own categories (genre, model, make, year, etc.) or command grammar and combine them with speech recognition output (no transcripts required) for a precise mapping and indexing scheme.

Metadata Injection*

Supporting the popularity of mix-and-match systems, let the Visual API augment your existing content metadata. Injected metadata supports certain functions (e.g., NLU) without providing content for full analysis.

* These functionalities exist in the AT&T CAE℠ but their deployment into the Visual API is still ongoing. If you are interested in one of these technologies, please ask us about how to get it delivered to you more quickly!



Project Members

Eric Zavesky

Zhu Liu

Raghuraman Gopalan

David Gibbon

Amy Reibman

Bernard Renger

Lee Begeja

Behzad Shahraray

Related Projects

Project Space

AT&T Application Resource Optimizer (ARO) - For energy-efficient apps

Assistive Technology

CHI Scan (Computer Human Interaction Scan)

Client Communications Center

CoCITe – Coordinating Changes in Text

CollaboraTV

Connecting Your World

Darkstar

Daytona

E4SS - ECharts for SIP Servlets

Scalable Ad Hoc Wireless Geocast

AT&T 3D Lab

Graphviz System for Network Visualization

Information Visualization Research - Prototypes and Systems

Swift - Visualization of Communication Services at Scale

AT&T Natural Voices™ Text-to-Speech

Smart Grid

Speech Mashup

Speech translation

StratoSIP: SIP at a Very High Level

Telehealth

Content Augmenting Media (CAM)

Content-Based Copy Detection

Content Acquisition Processing, Monitoring, and Forensics for AT&T Services (CONSENT)

MIRACLE and the Content Analysis Engine (CAE)

Social TV - View and Contribute to Public Opinions about Your Content Live

Enhanced Indexing and Representation with Vision-Based Biometrics

Visual Semantics for Intuitive Mid-Level Representations

eClips - Personalized Content Clip Retrieval and Delivery

iMIRACLE - Content Retrieval on Mobile Devices with Speech

AT&T WATSON (SM) Speech Technologies

Wireless Demand Forecasting, Network Capacity Analysis, and Performance Optimization