VidCat enables simplified personal photo and video management (i.e., a Video Catalog) from a web page or your favorite mobile device. VidCat builds on technology from the Content Analysis Engine to create simple functions for face detection and similarity computation, stacking similar content, intelligent video editing, and related content navigation. VidCat combines a back-end analysis server (designed with network-based services in mind) with client software for web browsers, Android, and iOS tablets. Personal content can be uploaded directly from the client application or via an online personal content repository such as AT&T Locker. Beyond these algorithmic innovations, VidCat also manages user metadata intelligently to make the most of a user's mobile bandwidth.
The primary motivation for VidCat is to make personal content management easy. The advent of the digital camera brought browsing and organization of photos by time, but now that modern mobile devices (both smartphones and tablets) can capture thousands of photos and videos, this technique makes management cumbersome and frustrating. According to a 2011 survey, the average social network user has about 97 thousand photos from friends and family to review. Online storage (or cloud storage) offered by service providers does help get photos and videos to your friends and family more quickly, but a clever assembly of content analysis technology can automatically manage your content and aid its discovery even faster. Also, as with many technologies, if you spend a little time guiding the system through your content, it can help you find photos, and the people in them, more quickly. For example, as you label detected faces in your photos, VidCat will "learn" to better associate similar faces with that label. VidCat automatically computes the similarity of every photo and video frame to the others in your collection. This built-in similarity then lets you jump from one photo or video to another by similarity in time, visual content, or even the people in the photo.
In this section, screenshots of a VidCat prototype showcase a few of these algorithmic innovations. Work coordinated with a user interface design team helps illustrate the strengths of VidCat's approach over traditional photo management and recognition applications.
1 Face Similarity - On the right, a multi-step example illustrates how simple it is to find and label people in your photos. Upon upload, the system detects the faces of people in photos and videos. A labeling mode can then highlight all of the detected faces so that you can identify a single face of interest, which is labeled by selecting it and typing in a name. After labeling, the system suggests a number of visually similar faces from other photos and videos in your content library. Each repetition of this process provides both visual and textual examples that help VidCat recommend labels with increasing accuracy.
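The label-and-suggest loop can be sketched as a nearest-neighbor search over face descriptors. This is an illustrative sketch, not VidCat's actual algorithm: it assumes each detected face is already reduced to a fixed-length feature vector, and the names `Face`, `suggest_faces`, the cosine-similarity measure, and the 0.8 threshold are all hypothetical choices.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Face:
    face_id: str
    vector: List[float]          # hypothetical face descriptor from the detector
    label: Optional[str] = None  # name typed in by the user, if any


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two descriptors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def suggest_faces(faces: List[Face], name: str,
                  threshold: float = 0.8, limit: int = 10) -> List[str]:
    """Rank unlabeled faces by their best similarity to any face labeled `name`.

    Every face the user labels becomes another example, so repeated
    labeling widens the set of faces that can be matched.
    """
    examples = [f.vector for f in faces if f.label == name]
    if not examples:
        return []
    scored = []
    for f in faces:
        if f.label is not None:
            continue  # only suggest faces that still need a label
        score = max(cosine(f.vector, e) for e in examples)
        if score >= threshold:
            scored.append((score, f.face_id))
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest similarity first
    return [face_id for _, face_id in scored[:limit]]
```

For example, with `faces = [Face("f1", [1.0, 0.0], label="Ann"), Face("f2", [0.9, 0.1]), Face("f3", [0.0, 1.0])]`, calling `suggest_faces(faces, "Ann")` returns `["f2"]`. Taking the maximum over all labeled examples is one simple way to let each new label improve later suggestions.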
2 Easy Video Editing - Video editing should be simple. Using a finger or a mouse to "scrub" to the right position in a video is tedious and can be quite inaccurate. VidCat combines automatic video segmentation with server-based playlists to join photos and videos without rendering or compiling them into a new video; your browser and mobile device can play back videos of just about any format, so why can't you? In the image below, VidCat is used to create a video show, which combines videos and photos without the hassle of digital editing.
First, VidCat lets you pick the photos and videos you want to use in a new show. This step is not restricted by original file type or resolution because VidCat normalizes these properties on upload. Next, VidCat lets you shuffle video scenes and photo frames into whatever order works best. Videos are split with scene segmentation algorithms that detect the natural cuts, fades, or camera pans to new content in a video. This automated process removes the burden of scrubbing with a finger or mouse to find just the right place to cut a video. In fact, you can place photos between video segments wherever it feels right because VidCat never alters the original content of your video. Finally, as you review your new show, you can instantly enable or disable individual scenes and review it again instead of waiting for a re-rendering process.
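The show-building steps above can be modeled as a playlist that only references time ranges of the original files, which is why nothing needs re-rendering. In this sketch the scene-boundary timestamps are assumed to come from a separate segmentation step, and `Clip`, `Still`, the 3-second default photo duration, and the playback-tuple format are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Clip:
    source: str          # original video id; the file itself is never modified
    start: float         # seconds
    end: float           # seconds
    enabled: bool = True


@dataclass
class Still:
    source: str            # photo id
    duration: float = 3.0  # hypothetical default display time in seconds
    enabled: bool = True


Item = Union[Clip, Still]


def segments_from_cuts(source: str, cuts: List[float], length: float) -> List[Clip]:
    """Turn detected scene boundaries (seconds) into back-to-back clips."""
    bounds = [0.0] + sorted(c for c in cuts if 0 < c < length) + [length]
    return [Clip(source, s, e) for s, e in zip(bounds, bounds[1:])]


def playlist(show: List[Item]) -> List[Tuple]:
    """Emit playback instructions for enabled items only; no video is rendered."""
    out: List[Tuple] = []
    for item in show:
        if not item.enabled:
            continue  # disabled scenes are skipped instantly
        if isinstance(item, Clip):
            out.append(("play", item.source, item.start, item.end))
        else:
            out.append(("show", item.source, item.duration))
    return out


def total_duration(show: List[Item]) -> float:
    """Running time of the show as currently enabled."""
    return sum((i.end - i.start) if isinstance(i, Clip) else i.duration
               for i in show if i.enabled)
```

A possible usage: split `beach.mp4` at 12.0 s and 30.5 s, reorder the scenes, insert a photo, and disable one scene. Toggling `enabled` only changes the generated playlist, so the show can be reviewed again immediately.

```python
clips = segments_from_cuts("beach.mp4", [12.0, 30.5], 45.0)
show = [clips[0], Still("sunset.jpg"), clips[2], clips[1]]
clips[2].enabled = False
print(playlist(show))
# [('play', 'beach.mp4', 0.0, 12.0), ('show', 'sunset.jpg', 3.0), ('play', 'beach.mp4', 12.0, 30.5)]
```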
3 Photo Stacks - On the right, VidCat's photo stacking feature lets you easily select the best photos from visually similar content. Using a near-duplicate detection algorithm, photos with similar foregrounds or backgrounds are automatically grouped together into a stack. Stacking by visual content goes beyond existing photo management software, which typically uses only time or date information (e.g., photos captured within a few seconds or in a single burst). VidCat creates stacks from similar photos of a person or place taken from slightly different angles, captured minutes or hours apart, or under slightly varied lighting conditions. In VidCat's user interface, picking the best photo (or photos) from a stack of twenty is just a click or tap away.
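Content-based stacking can be sketched as grouping photos whose pairwise visual similarity clears a threshold, with no use of timestamps at all. The source does not specify VidCat's near-duplicate detector; this sketch substitutes color-histogram intersection and union-find grouping, and the function names and the 0.8 threshold are hypothetical.

```python
from typing import Dict, List


def histogram_intersection(a: List[float], b: List[float]) -> float:
    """Similarity in [0, 1] for two histograms that each sum to 1."""
    return sum(min(x, y) for x, y in zip(a, b))


def build_stacks(photos: Dict[str, List[float]], threshold: float = 0.8) -> List[List[str]]:
    """Group photos into stacks of near-duplicates (single-linkage via union-find)."""
    ids = sorted(photos)
    parent = {i: i for i in ids}

    def find(i: str) -> str:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Compare every pair; capture time is deliberately ignored.
    for n, a in enumerate(ids):
        for b in ids[n + 1:]:
            if histogram_intersection(photos[a], photos[b]) >= threshold:
                parent[find(b)] = find(a)

    stacks: Dict[str, List[str]] = {}
    for i in ids:
        stacks.setdefault(find(i), []).append(i)
    # Largest stacks first, then alphabetically by first member.
    return sorted(stacks.values(), key=lambda s: (-len(s), s[0]))
```

With `photos = {"p1": [0.5, 0.5, 0.0], "p2": [0.45, 0.55, 0.0], "p3": [0.0, 0.1, 0.9]}`, the first two photos score about 0.95 and end up in one stack, while `p3` stays alone. Single-linkage can chain gradually changing shots into one stack, which fits photos taken minutes or hours apart, but a real system would likely also cap stack size or use a stricter criterion. The all-pairs loop is quadratic, so large libraries would need an index.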
VidCat's architecture lets developers select the client-server interaction mode best suited to an application. Whether it is a bandwidth-optimized client application that makes heavy use of client-side caching or a lightweight, browser-friendly HTML5 application, VidCat's infrastructure and APIs can accommodate it.
In the back end, each VidCat instance runs its own processing and services API, which makes load balancing trivial with the support found in most common platforms, such as OpenStack. Across instances, a distributed datastore saves content information and metadata; within an instance, a temporary file system speeds up content analysis. This design allows instances to be added or removed without functional impact on VidCat's analysis and retrieval capabilities.
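The reason instances can come and go freely is that durable state lives only in the shared datastore, while each instance's scratch space is disposable. The toy model below illustrates that property. `SharedStore` (an in-memory dict standing in for the distributed datastore), `Instance`, `RoundRobin`, and the placeholder "analysis" are all hypothetical, and content ids are assumed to be safe file names.

```python
import itertools
import os
import shutil
import tempfile


class SharedStore:
    """Stand-in for the distributed datastore shared by every instance."""
    def __init__(self):
        self.metadata = {}


class Instance:
    """One instance: its own API methods plus a private, temporary scratch directory."""
    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.scratch = tempfile.mkdtemp(prefix=f"vidcat-{name}-")

    def upload(self, content_id, data):
        path = os.path.join(self.scratch, content_id)  # assumes content_id is a safe file name
        with open(path, "wb") as f:
            f.write(data)
        # Placeholder "analysis": a real system would run face/scene analysis here.
        self.store.metadata[content_id] = {"bytes": os.path.getsize(path),
                                           "analyzed_by": self.name}
        os.remove(path)  # scratch data is temporary; results live in the shared store

    def lookup(self, content_id):
        return self.store.metadata.get(content_id)

    def close(self):
        shutil.rmtree(self.scratch, ignore_errors=True)


class RoundRobin:
    """Trivial load balancer: rotate requests across whatever instances exist."""
    def __init__(self, instances):
        self.set_instances(instances)

    def set_instances(self, instances):
        self._cycle = itertools.cycle(list(instances))

    def pick(self):
        return next(self._cycle)
```

Content uploaded through one instance stays retrievable after that instance is removed from the rotation, because lookups go to the shared store rather than to local disk.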
With a focus on the fastest and most enjoyable user experience, parts of the VidCat system were also optimized to run with client-side caching. In this mode, the client avoids repeated requests for content or metadata by caching thumbnails, images, video, and internally referenced content. Also, because VidCat computes photo and face similarity scores at upload, a client can synchronize the relevant metadata and still provide a stunning user experience offline. VidCat also supports traditional REST-like calls, so a new application built on the VidCat framework can be created quickly.
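A client-side cache of this kind might combine an LRU store for fetched content with a local copy of the precomputed similarity metadata, so that "related content" navigation keeps working offline. This is a sketch under assumptions: `ClientCache`, its methods, the injected `fetch` callable (standing in for a REST-like GET), and the metadata layout are hypothetical, not VidCat's published API.

```python
from collections import OrderedDict
from typing import Callable, Dict, List, Tuple


class ClientCache:
    def __init__(self, fetch: Callable[[str], bytes], capacity: int = 100):
        self.fetch = fetch        # hypothetical network call, e.g. a REST-like GET
        self.capacity = capacity  # max number of cached items
        self.blobs: "OrderedDict[str, bytes]" = OrderedDict()
        self.similar: Dict[str, List[Tuple[str, float]]] = {}

    def get(self, url: str) -> bytes:
        """Return cached content, fetching (and possibly evicting) only on a miss."""
        if url in self.blobs:
            self.blobs.move_to_end(url)  # mark as most recently used
            return self.blobs[url]
        data = self.fetch(url)
        self.blobs[url] = data
        if len(self.blobs) > self.capacity:
            self.blobs.popitem(last=False)  # evict the least recently used item
        return data

    def sync_metadata(self, similarity: Dict[str, List[Tuple[str, float]]]) -> None:
        """Store similarity scores computed server-side at upload time."""
        self.similar = {k: sorted(v, key=lambda p: -p[1]) for k, v in similarity.items()}

    def related(self, content_id: str, k: int = 5) -> List[str]:
        """Most similar items, answered locally with no network request."""
        return [cid for cid, _ in self.similar.get(content_id, [])[:k]]
```

Because the similarity scores arrive precomputed, `related()` is a local dictionary lookup; the client never has to run content analysis itself, which is what makes an offline mode practical.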