The AT&T Visual API is a network service providing Visual Intelligence to enterprise and mobile applications by transforming content into textual and indexed representations.
Over the last two decades, AT&T has continued to develop world-class content analysis and computer vision technology. Video segmentation, album management, audio and video near-copy detection, face detection and recognition, natural language-based search, and quality assessment are just a handful of exposed capabilities.
The goal of the Visual API is to make Visual Intelligence easy for new applications. Deploy your application using visual intelligence to provide capabilities that your customers or business never imagined. Automated enterprise content management, a face-based time-card system, quality comparisons of video codecs, or a speech-driven interface to your customer support videos are all possible with AT&T's Visual API.
This brief video illustrates a few analytical capabilities that visual intelligence provided by the Visual API can bring to your application or enterprise.
Hardened HTTP(S) access and network services mean your application can run on mobile devices, desktops, or back ends. As part of the Visual API release, a live HTML5 SDK has been created that walks new developers through each module and several interesting use cases. With the Visual API, you provide the content and it provides the visual intelligence.
Getting started with the Visual API requires only a few steps.
This section will be periodically updated with sample applications that are included with the Visual API SDK distribution. To keep platform compatibility as broad as possible, these applications usually require only an HTML5-compliant browser. Chrome and Firefox are the first browsers to receive highest-priority testing.
The Celebrity Doppelgangers application, powered by the Visual API, demonstrates several capabilities using face-based analysis functions. Demonstrated as part of the AT&T 2014 Developer Summit, the application welcomes participants with a sample of available celebrities. The Visual API is a "bring your own data" platform, so if you choose to develop a similar application using a company employee roster or friends from a social network, the underlying process is unchanged.
The truly engaging part of the application is seeing the recognition results for a submitted face. In this screenshot, two participants are shown with their three highest-scoring recognition results. Results can be improved on the application side if the system is given knowledge about a small range of facial identities to consider. For example, with no prior knowledge, recognition is performed across an entire namespace. With prior knowledge, recognition can be constrained to pick the highest-scoring results from a set of only a few candidates. This approach may be best suited to applications where the group is known in advance, such as determining identity among family members for a home- or in-car-based solution.
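The effect of constraining recognition to a known group can be sketched on the application side. This is a minimal illustration in Python, not the Visual API's interface: the score dictionary, the identity names, and the `top_matches` helper are all hypothetical, and the sketch assumes the application has already obtained a score per identity for a submitted face.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def top_matches(
    scores: Dict[str, float],
    allowed: Optional[Iterable[str]] = None,
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring identities, optionally limited to a known group."""
    if allowed is not None:
        allowed_set = set(allowed)
        # Keep only identities the application considers plausible.
        scores = {name: s for name, s in scores.items() if name in allowed_set}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical scores for one submitted face across an entire namespace.
namespace_scores = {
    "celebrity_a": 0.61,
    "celebrity_b": 0.58,
    "mom": 0.55,
    "dad": 0.21,
    "celebrity_c": 0.57,
    "sister": 0.40,
}

# No prior knowledge: rank across the whole namespace.
print(top_matches(namespace_scores))

# Prior knowledge: only family members are plausible candidates.
print(top_matches(namespace_scores, allowed=["mom", "dad", "sister"]))
```

In this made-up example the unconstrained ranking is dominated by celebrities with marginally higher scores, while the constrained ranking returns only family members, which is the behavior a home or in-car application would want.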
Visual quality is a subjective and often challenging metric to quantify, yet it is increasingly important in a video network service. The Visual API exposes almost 15 subjective quality metrics (with more added continually) that characterize the content of an image or video directly. In the example below, a diagnostic application called the "Scope" plots blockiness, blur, and noise quality measures for several copies of the same image with different distortions. This illustration calls out the fact that the original (left-most) image has quality characteristics that differ from those of all the distorted copies. This information could be used in an automated system, or as feedback on user-generated content, to alert someone to a potential content quality concern.
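One way an automated system might act on such measures is to compare each copy against the original and flag large deviations. The sketch below is an illustration under stated assumptions: the metric names follow the three measures mentioned above, but the numeric values, the 0.15 threshold, and the `flag_degraded` helper are hypothetical and do not reflect the Visual API's actual output format or scale.

```python
from typing import Dict, List

Metrics = Dict[str, float]


def flag_degraded(
    original: Metrics,
    copies: Dict[str, Metrics],
    threshold: float = 0.15,
) -> Dict[str, List[str]]:
    """Map each copy to the metrics that differ from the original by more than threshold."""
    flagged: Dict[str, List[str]] = {}
    for name, metrics in copies.items():
        deviating = [
            metric
            for metric, value in metrics.items()
            if metric in original and abs(value - original[metric]) > threshold
        ]
        if deviating:
            flagged[name] = deviating
    return flagged


# Hypothetical quality measures for an original image and three copies.
original = {"blockiness": 0.10, "blur": 0.12, "noise": 0.08}
copies = {
    "jpeg_low_quality": {"blockiness": 0.62, "blur": 0.20, "noise": 0.10},
    "gaussian_blur": {"blockiness": 0.09, "blur": 0.71, "noise": 0.05},
    "light_recompress": {"blockiness": 0.14, "blur": 0.13, "noise": 0.09},
}

print(flag_degraded(original, copies))
```

A fixed absolute threshold is the simplest possible rule; a real deployment would likely tune thresholds per metric.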
The Visual API is grouped into the functional modules below, but content can be analyzed by each resource independently depending on the speed, resolution, and capacity requirements of your application.
Create segregated and secure account credentials for your user base. As a mobile or enterprise application developer, you won't lose cycles mapping internal and external references to the Visual API.
Organize content into namespaces that can be shared across users in either read-only or fully editable fashions. This functionality supports crowd-sourced projects that include experts and observers alike.
Video segmentation analyzes your content for logical scene and shot partitions. This function helps to automate editing by bootstrapping segments for telepresence calls, conferences, broadcast video, and even user-generated content.
General content analysis for basic similarity, collection creation, and general library management. If you don't need specific tools but want to get the most out of your content, one suite of functions does it all for you.
Detection and recognition of faces from custom or pre-created contexts. Whether your content is a celebrity photo, a user's photo album, or a corporate employee roster, face-based content organization can help.
Near-copy functionality can be used to identify low-quality originals (e.g., by matching to an online library), monitor for repeated segments, and help regulate rights infringements, utilizing both audio and visual cues.
A lightweight version of near-copy detection that groups similar photos or videos from different moments in time, camera angles, and illuminations into recommended albums or virtual stacks.
Label faces with human input to bootstrap automatic tagging and subsequent retrieval for a wealth of applications in recognition, entertainment, and retrieval.
Automated perceptual quality scoring of frames, video, and regions based on leading research metrics. These scores go beyond simple energy metrics defined in international specifications to give you precise information about quality degradation as perceived by a human.
Custom speech models built for AT&T WATSON℠ using your content. Define your own categories (genre, model, make, year, etc.) or command grammar and combine them with speech recognition output (no transcripts required) for a precise mapping and indexing scheme.
Supporting the popularity of mix-and-match systems, let the Visual API augment your existing content metadata. Injected metadata supports certain functions (e.g., NLU) without providing content for full analysis.
* These functionalities exist in the AT&T CAE℠ but their deployment into the Visual API is still ongoing. If you are interested in one of these technologies, please ask us about how to get it delivered to you more quickly!