1 Growth and Challenges of Cloud Storage
Cloud storage for both consumers and enterprises is growing at a phenomenal rate, fueled by the growth of Big Data, mobile, and social networks. It is becoming a major cost component of cloud infrastructure and of any Web-scale cloud service today. While raw storage is cheap, the performance and data durability requirements of cloud storage frequently dictate sophisticated, multi-tier, geo-distributed, and managed storage solutions.
Traditional storage systems use dedicated storage hardware and networking to guarantee that the storage QoS requirements such as throughput, latency, IOPS, and reliability are preserved. Unfortunately, these dedicated resources are frequently underutilized. Cloud computing promises efficient resource utilization by allowing multiple tenants to share the underlying networking, computing, and storage infrastructure, but it is difficult to provide end-to-end storage QoS guarantees to individual tenants due to the noisy neighbor problem.
In a cloud environment like OpenStack, the backend block storage (such as LVM, the Ceph RADOS block device, or vendor storage appliances) is shared by multiple tenants through a storage virtualization layer like Cinder, which attaches individual storage volumes to virtual machines. It is difficult to provide customized storage QoS to meet different tenant needs with a fixed backend where important design decisions, such as the replication level, compression, de-duplication, and encryption, are already made.
Finally, many cloud infrastructure service providers are moving to scale-out solutions based on commodity hardware instead of storage appliances, which are frequently expensive and difficult to adapt to changing workloads or specific QoS requirements. Any cloud solution architect must understand the tradeoffs among performance, reliability, and cost of cloud storage to provide an effective overall solution.
2 Software-Defined Storage
Software-Defined Networking (SDN) aims to virtualize networking resources and separate the control plane from the data plane. Similarly, most Software-Defined Storage (SDS) solutions aim to separate the storage hardware from the storage management software, which keeps the intelligence and can be dynamically reconfigured to adapt to changing and growing storage needs.
Unfortunately, unlike SDN, there is no clear definition of what the core functionalities of software-defined storage really are, even though many storage vendors claim to have SDS solutions. Here we summarize the key principles that we believe are pertinent to multi-tenant cloud storage solutions, which we call the C.A.M.P.S. principles of SDS:
Combining these principles, we can give our own definition of SDS: SDS automatically maps customized and evolving storage service requirements to a scalable, elastic, and policy-managed cloud storage service, with abstractions that mask the underlying storage hardware and software complexities.
We believe that an SDS solution based on these principles can meet many of the current cloud storage challenges. It is also in line with the AT&T Domain 2.0 principle of using real-time orchestration of cloud resources to meet customer needs at low cost. We in Cloud Technologies and Services Research have built an SDS solution designed to meet the above SDS principles. The solution consists of three layers, as shown in Figure 1. The bottom layer consists of raw compute, networking, and storage resources that can be scaled out easily as needed; the storage can be locally attached or accessed over the network, as long as it provides block storage. The second layer, CloudQoS, provides network bandwidth reservation through Tegu, storage bandwidth reservation through IOArbiter, CPU capacity reservation through Bora, and resource placement optimization through Ostro. The third layer is the SDS storage layer, which automates storage engineering and builds specific storage systems to meet each tenant's storage QoS needs. The SDS process described below allows tenants with very diverse workloads, such as a random workload using Ceph object storage and a sequential Big Data workload using HDFS or QFS, to co-exist with efficient cloud resource utilization.
Figure 1: SDS Architecture Layers in an OpenStack Cloud
Figure 2 shows a high-level view of our SDS automation process. The tenant specifies the storage QoS requirements through a Web interface. The SDS planner for each specific storage system takes the requirements, searches a large design space for the best performance, reliability, and cost tradeoffs, and then generates an OpenStack Heat template to orchestrate the storage system deployment. The whole process takes only about 10 minutes for a typical TB-scale storage system deployment in an OpenStack cloud. Finally, an SDS monitor and visualizer continuously monitors the storage system, handling failures and reconfiguring the storage system as needed to meet growing or shrinking storage demands.
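The planning step above can be sketched as a constrained search over candidate configurations. This is a simplified illustration, not the actual planner's API: the function names, candidate attributes, and thresholds are all hypothetical, and the real planner searches a much larger design space and emits a Heat template rather than a single record.

```python
def plan_storage(requirements, candidates):
    """Pick the cheapest candidate configuration that satisfies the
    tenant's minimum throughput and durability requirements.
    (Hypothetical sketch; attribute names are illustrative.)"""
    feasible = [c for c in candidates
                if c["throughput_mbps"] >= requirements["min_throughput_mbps"]
                and c["durability"] >= requirements["min_durability"]]
    if not feasible:
        raise ValueError("no configuration meets the requirements")
    # Among feasible configurations, minimize storage cost.
    return min(feasible, key=lambda c: c["cost_per_tb"])

# Illustrative candidate configurations (numbers are made up).
candidates = [
    {"name": "3-replica", "throughput_mbps": 400, "durability": 0.999999,   "cost_per_tb": 3.0},
    {"name": "ec-6-3",    "throughput_mbps": 300, "durability": 0.9999999,  "cost_per_tb": 1.5},
    {"name": "ec-10-4",   "throughput_mbps": 250, "durability": 0.9999999,  "cost_per_tb": 1.4},
]

plan = plan_storage({"min_throughput_mbps": 280, "min_durability": 0.999999},
                    candidates)
print(plan["name"])  # ec-6-3: cheapest option still meeting both constraints
```

In this toy search, the cheapest configuration (ec-10-4) is rejected for insufficient throughput, so the planner settles on the next-cheapest feasible choice; the real planner makes the analogous tradeoff across benchmark data for each backend.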
Figure 2: SDS Automates the Storage Engineering Process in an OpenStack Cloud
Erasure coding is a crucial technology for meeting the SDS customization requirements. Erasure coding divides a file into k data chunks and expands them into n chunks, where n = k + m and m is the number of parity chunks. Any k of the n chunks are sufficient to reconstruct the file. For a fixed k, increasing m, the number of parity chunks, increases the reliability and the replication factor (and hence the storage cost); at the same time, it increases the overall encoding/decoding time, and hence the required computation capacity, and may reduce performance. Erasure-coded storage opens up a large design space of performance, reliability, and cost tradeoffs that was not possible with the traditional triple-redundancy storage commonly used in HDFS and Swift Object Storage. The SDS planner takes the storage QoS requirements, consults performance benchmarks of different erasure code choices if needed, and then picks particular erasure code parameters (k and m) that meet the minimal reliability and performance requirements with the least storage overhead.
3 Current SDS Plans
Several ongoing SDS initiatives, outlined below, aim to bring our SDS approach to the level demanded by large-scale enterprise storage:
The rapid growth of cloud storage has created challenges for storage architects: meeting the diverse performance and reliability requirements of different customers, while controlling cost, in a multi-tenant cloud environment. We have been working on a software-defined storage solution that has the potential to address these challenges and open up new opportunities for innovation. This SDS solution is in line with the Domain 2.0 principle: real-time orchestration of cloud resources to meet customer demands at low cost.