Daytona

Daytona is AT&T’s database management system for warehousing immense amounts of data while still answering queries within seconds, even against tables containing more than a trillion records. It supports standard database functionality such as SQL, a data dictionary, transactions, locking, logging, recovery, and views. In addition, it offers open-ended size scalability through its compression technology and fully general horizontal partitioning; high speed scalability through its SPMD parallelization, special in-memory data structures, and optional use of shared memory; and its own powerful 4GL query language, Cymbal, which includes SQL. It does all this in a streamlined, well-engineered way that sets Daytona apart from other database systems in database capacity, speed, query-language expressiveness, and ease of use.

Among Daytona’s distinguishing characteristics is its unique and simple architecture. First, there are no database server processes. While most database systems rely on server processes for scheduling, file access, locking, caching, networking, and other tasks, Daytona relies on the operating system alone for these tasks. This avoids the inefficiency of having server processes and the operating system redundantly doing the same kinds of work at the same time on the same hardware. As a result, Daytona’s architecture is more compact and more efficient than that of other database systems.
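To make "the operating system does the locking" concrete, here is a minimal sketch in Python of a writer that appends a record to a flat file while holding an exclusive OS-level lock, with no lock-manager process involved. Daytona’s actual locking protocol is not described here; the use of POSIX-style advisory locks via `fcntl.flock`, and the helper name `append_row`, are assumptions for illustration only.

```python
import fcntl
import os


def append_row(path, fields, delimiter="|"):
    """Append one record to a flat file while holding an exclusive OS lock.

    Hypothetical helper: the kernel, not a database server, arbitrates
    between concurrent writers.
    """
    line = delimiter.join(fields) + "\n"
    with open(path, "a", encoding="ascii") as f:
        # Block until no other process holds a lock on this file.
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)
        try:
            f.write(line)
            f.flush()
            os.fsync(f.fileno())  # push the record to disk before unlocking
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)
```

Any number of independent processes can call a helper like this against the same file; they serialize on the kernel lock rather than on a server.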

Data storage is also simple: Daytona stores its data as UNIX ASCII flat files in standard file systems. Consequently, there is no administrative overhead of creating special raw disk partitions to hold the data. Furthermore, because the data lives in ordinary files, it remains accessible in ways not possible with systems that store data in a proprietary representation. In particular, when the data is not compressed, standard UNIX tools can operate on it directly. Storing data as flat files also makes tables compact, since records are stored one right after the other.

In Daytona, data is stored as UNIX flat files, with each line corresponding to a table row and fields separated by a single delimiter character (the default is |). Comments are supported too, marked with the % character. When uncompressed, Daytona files can be viewed with vi and other UNIX editors.
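Because the format is plain text, a few lines of code are enough to read it. The sketch below is an illustration, not Daytona’s own reader: it assumes comments are whole lines beginning with %, that fields contain no escaped delimiters, and that the sample rows are hypothetical.

```python
def read_records(lines, delimiter="|", comment="%"):
    """Yield each data line of a Daytona-style flat file as a list of fields.

    Assumes whole-line comments that start with `comment`; no escaping.
    """
    for raw in lines:
        line = raw.rstrip("\n")
        # Skip blank lines and comment lines.
        if not line or line.startswith(comment):
            continue
        yield line.split(delimiter)


# Hypothetical sample data; an open file object works the same way.
sample = [
    "% supplier table (hypothetical sample data)\n",
    "1|Acme Widgets|Chicago\n",
    "2|Zenith Tools|Boston\n",
]
rows = list(read_records(sample))
```

The same property lets ordinary UNIX tools work on uncompressed data; for example, `cut -d'|' -f2` prints the second field of each line (it does not skip comment lines, however).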

Queries themselves run fast because they are compiled to C code and then to machine executables, which run faster than interpreted queries. (Most database servers instead translate queries into an intermediate representation that is then interpreted.) Once created, the executables can be invoked directly by name.
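The payoff of compiling a query is that translation happens once, and every later run executes specialized code. Daytona generates C and builds a native executable; the Python sketch below only mimics that idea with the built-in `compile()` and `exec()`, turning a simple predicate into a named function. The helper `compile_filter` and its parameters are hypothetical.

```python
def compile_filter(name, column, op, value):
    """Translate a simple predicate into source code, compile it once,
    and return a named function that can be called repeatedly.

    Hypothetical analogy to query compilation; Daytona itself emits C.
    """
    if not name.isidentifier():
        raise ValueError(f"bad query name: {name}")
    if op not in {"==", "!=", "<", "<=", ">", ">="}:
        raise ValueError(f"unsupported operator: {op}")
    # Generate specialized source with the column and constant baked in.
    src = (
        f"def {name}(rows):\n"
        f"    return [r for r in rows if r[{int(column)}] {op} {value!r}]\n"
    )
    namespace = {}
    exec(compile(src, f"<query {name}>", "exec"), namespace)
    return namespace[name]


big_orders = compile_filter("big_orders", 1, ">", 100)
rows = [("a", 50), ("b", 150), ("c", 300)]
```

Calling `big_orders(rows)` runs the generated comprehension directly, with no per-row interpretation of the predicate.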

By using the Single Program Multiple Data (SPMD) parallelization paradigm, which other database systems do not typically use, Daytona achieves high speed scalability: multiple CPU cores can work together to produce the answer to a single query. Daytona applies this paradigm by compiling a single program that, by design, creates k clone child processes; each child solves 1/k of the problem and reports its results back to the parent for integration.
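The fork-and-integrate pattern described above can be sketched on UNIX with Python’s `os.fork` and `os.pipe`. This is an illustration of SPMD under stated assumptions, not Daytona’s generated code: the function `spmd_count`, the in-memory record list, and pickling partial results through pipes are all choices made for the example.

```python
import os
import pickle


def spmd_count(records, k, predicate):
    """Count records matching predicate using k forked clones of this process.

    Each child scans a contiguous 1/k slice and sends its partial count back
    to the parent through a pipe; the parent adds the partial results.
    Hypothetical sketch; POSIX only.
    """
    n = len(records)
    readers, pids = [], []
    for i in range(k):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:  # child: same program, different slice of the data
            try:
                os.close(r)
                part = records[i * n // k:(i + 1) * n // k]
                partial = sum(1 for rec in part if predicate(rec))
                with os.fdopen(w, "wb") as out:
                    pickle.dump(partial, out)
            finally:
                os._exit(0)  # never fall through into the parent's code
        os.close(w)  # parent keeps only the read end
        readers.append(r)
        pids.append(pid)
    total = 0
    for r in readers:  # integrate the partial answers
        with os.fdopen(r, "rb") as inp:
            total += pickle.load(inp)
    for pid in pids:
        os.waitpid(pid, 0)
    return total
```

For example, `spmd_count(list(range(1000)), 4, lambda x: x % 3 == 0)` has four children each scan 250 values. Because the children are forks, they inherit the data and the predicate without any serialization.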

For writing sophisticated queries, Daytona has its own powerful high-level 4GL query language, Cymbal®, which includes SQL as a subset. Queries can be written in Cymbal, in SQL, or in a combination of the two, giving the flexibility to write queries specialized for a particular task. Queries can even run against data while it is being loaded into Daytona. As a 4GL, Cymbal also contains a number of constructs that allow it to be used as a programming language for additional power.

Cymbal supports both declarative and procedural queries, even within the same program. For example, a sample program defines a primality test twice: the definition of Is_A_Prime is declarative, and the definition of Is_A_Prime_Too is procedural. The program prints:

Yes, 17 is a prime.
Yes, 17 is a prime.

Cymbal offers a very-high-level, one-of-a-kind way to store tuple-to-tuple associative arrays in UNIX shared memory while optionally using all of Cymbal’s other capabilities. Multiple processes can read and write these associative arrays concurrently. For example, associative arrays in UNIX shared memory can serve as continually maintained caches of user account data, which are then joined with user activity records as they stream by. Processes can also synchronize their access to these arrays so as to pass data (or other messages) to one another, as might otherwise be done with pipes (or message queues). By working in shared memory, the user avoids the slowdown that disk I/O would otherwise cause.
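The join pattern in the example above, a tuple-to-tuple cache probed by streaming records, looks roughly like the following. This sketch deliberately leaves out the shared-memory part: it uses an ordinary process-local `dict`, whereas in Daytona the cache would live in UNIX shared memory and could be read and updated by several processes at once. The record shapes and the function `join_stream` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical record shapes for illustration only.
AccountKey = Tuple[str]        # (user_id,)
AccountInfo = Tuple[str, str]  # (plan, region)
Activity = Tuple[str, str]     # (user_id, event)


def join_stream(cache: Dict[AccountKey, AccountInfo],
                activity: Iterable[Activity]) -> Iterator[Tuple[str, str, str, str]]:
    """Join each streaming activity record with cached account data.

    Process-local dict used as a stand-in for a shared-memory array.
    """
    for user_id, event in activity:
        info = cache.get((user_id,))  # tuple key -> tuple value lookup
        if info is None:
            continue  # inner join: drop activity for unknown accounts
        plan, region = info
        yield (user_id, event, plan, region)


cache = {("u1",): ("gold", "east"), ("u2",): ("basic", "west")}
stream = [("u1", "login"), ("u3", "login"), ("u2", "purchase")]
joined = list(join_stream(cache, stream))
```

Each activity record costs one hash lookup, which is why keeping the cache in memory rather than on disk matters for a stream.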

To support access by third-party tools such as Business Objects, Daytona offers a JDBC interface through its pdq network shell daemon. Daytona also has interfaces for Perl and Python.

Daytona is easy to use. Its architecture is easy to understand, and anyone with basic knowledge of UNIX can administer it. In fact, it can be installed and ready to go in less than 10 minutes. No special expertise in a proprietary system is needed.

Daytona project members: Rick Greer, Phil Brown, Albert Algava, and Larry Rose

All about Daytona guide (PDF)

Daytona Shared Memory paper (PDF)

Too many cooks . . . Whereas most database systems run server processes on top of the operating system, Daytona interfaces directly with the operating system without creating database server processes. This avoids the redundancy of the conventional approach, in which two large programs (the OS and the database servers) perform many of the same tasks at about the same time using the same resources.

 

