HP Labs cloud-computing test bed: Technical overview


The open cloud-computing research test bed is designed to support research into the design, provisioning, and management of services at a global, multi-datacenter scale.

The test bed will provide researchers with an unparalleled opportunity to experiment with large-scale deployments of services across multiple continents. It will provide an environment in which to learn first-hand about the issues involved in handling hundreds to thousands of nodes residing in state-of-the-art data centers separated by thousands of kilometers.

The open nature of the test bed is designed to encourage research into all aspects of service and data center management. In addition, we hope to foster a collaborative community around the test bed, providing ways to share tools, lessons, and best practices, and to benchmark and compare alternative approaches to service management at data center scale.

 

Challenges

Cloud computing services need to be able to scale almost arbitrarily in a cost-effective manner if they are to reach new technology markets that are too expensive to address today. This requires that they (a) can flex the amount and type of resources they use on the fly as their loads and needs change; (b) are highly automated, to keep the cost of support personnel down; and (c) can support multiple tenants (i.e., customers or clients) simultaneously.

Today's business environment has evolved to one in which service clients typically demand quality-of-service and isolation guarantees for performance, failure, and security, to prevent interference from other tenants. Although these needs can often be met by over-provisioning, or by allocating resources for the exclusive use of one tenant, this is rarely cost effective, so new mechanisms are needed that can both deliver against promised service levels and be trusted to provide the necessary isolation. Indeed, providing such trust is itself a challenge for service providers – one that will need to be solved to make the service-oriented approach truly successful.

It's not enough to find solutions that work in a research lab under controlled conditions – we need to find a set of simple solutions that can be applied en masse to the next generation of cloud-based services. We need to learn how to make them operate in a truly scale-independent fashion, handle failures that emerge from multiple sources, and meet their customers' goals for service quality, trustworthiness, and cost.

It is well known that large-scale systems exhibit unexpected behaviors not triggered or observable at the scales typical of most test beds. New failure types show up; new networking pathologies occur; and the management and control system is subjected to the same kinds of difficulties at the same time. The sheer variety of failures makes writing, supporting, and operating large-scale services challenging.

When multiple, geographically dispersed data centers are introduced into the picture, new kinds of problems – and opportunities – crop up, including disconnected operation, network latency, and the failure of entire sites.

 

Approach

The most important aspect of our approach is that it is open: there is no one "right answer" to the challenges of providing global-scale services, and the test bed deliberately refrains from imposing one. Instead, the underlying principle is to encourage multiple alternative approaches, and to provide an environment in which they can be developed, tried out, and compared.

The test bed is aimed at researchers who want to try things out at a scale large enough to expose real problems. At the same time, it imposes as little underlying infrastructure as possible, so that the widest possible range of experiments can be supported.

We have modeled our approach loosely on the one used successfully by PlanetLab: the minimum possible set of functionality is mandated, but firm standards are maintained at this layer. For our test bed, the basic building block is the ability to get an exclusive reservation for a set of physical resources, isolated inside a VLAN – a Physical Resource Set, or PRS. Everything above this layer can be controlled and measured by the researcher.
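To make the PRS abstraction concrete, here is a minimal sketch of what a client-side reservation request might look like. The PRSRequest fields, the reserve_prs helper, and the endpoint URL are all illustrative assumptions, not the actual test-bed interface.

```python
# Hypothetical sketch of a client-side Physical Resource Set (PRS) request.
# All names (PRSRequest, reserve_prs, the endpoint URL) are illustrative
# assumptions, not the actual test-bed API.

import json
import urllib.request
from dataclasses import dataclass, asdict

@dataclass
class PRSRequest:
    """Describes the physical resources a researcher wants reserved."""
    node_count: int               # number of physical machines
    site: str                     # which data center to allocate from
    vlan_isolated: bool = True    # PRS nodes are isolated inside a VLAN
    duration_hours: int = 24      # length of the exclusive reservation

def reserve_prs(request: PRSRequest, endpoint: str) -> dict:
    """Submit a reservation request and return the allocation description."""
    payload = json.dumps(asdict(request)).encode("utf-8")
    req = urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as response:
        return json.load(response)

if __name__ == "__main__":
    request = PRSRequest(node_count=50, site="example-site-1")
    print("Would submit:", asdict(request))
    # allocation = reserve_prs(request, "https://testbed.example.org/prs")
    # print("Allocated PRS:", allocation)
```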

At higher levels, we anticipate providing support for running data-intensive calculations such as Map-Reduce jobs on the open source Hadoop and Pig software from Apache/Yahoo, as well as tools to manage the large datasets these techniques operate on.
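As a simple illustration of the kind of data-intensive job involved, the word-count sketch below could be run under Hadoop Streaming, which lets a mapper and reducer be supplied as ordinary executables reading standard input. The script and its flag names are placeholders, not part of the test bed's tooling.

```python
#!/usr/bin/env python3
# Minimal word-count sketch for Hadoop Streaming: mapper and reducer in one
# script, selected by a command-line flag. Illustrative only.

import sys

def mapper():
    # Emit "<word>\t1" for every word read from standard input.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Hadoop sorts mapper output by key, so identical words arrive together.
    current_word, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t", 1)
        if word == current_word:
            count += int(value)
        else:
            if current_word is not None:
                print(f"{current_word}\t{count}")
            current_word, count = word, int(value)
    if current_word is not None:
        print(f"{current_word}\t{count}")

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "--map":
        mapper()
    else:
        reducer()
```

A job like this would typically be launched through the hadoop-streaming jar, passing the script as both the -mapper and -reducer commands and pointing -input and -output at dataset paths in the cluster's file system.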

 

Services offered

Each site will be encouraged to provide its own implementation of a small set of agreed-upon services: all that we require is that the implementations conform to a common interface. This will encourage experimentation and allow different approaches to providing management and software stacks to be benchmarked against one another. We also hope to offer instrumentation feeds from several of these common services, in order to foster the next generation of realistic workload tracing for all researchers – even those not participating in the test bed.
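As an illustration of what conforming to a common interface could mean in practice, the sketch below defines a hypothetical interface that each site's implementation of one shared service – here an imaginary blob store – might satisfy. None of these names come from the test bed itself.

```python
# Hypothetical sketch of a common interface that each site's implementation
# of a shared service might conform to. The class and method names are
# illustrative assumptions, not part of the actual test bed.

from abc import ABC, abstractmethod
from typing import Iterator

class StorageService(ABC):
    """A shared service: every site supplies its own implementation."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        """Store a blob under the given key."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Retrieve a previously stored blob."""

    @abstractmethod
    def usage_events(self) -> Iterator[dict]:
        """Yield instrumentation records (e.g., request sizes and latencies)
        so realistic workload traces can be shared with researchers."""
```

Because only the interface is fixed, each site remains free to benchmark a different back end behind it while exposing the same instrumentation feed.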

We are planning to provide thorough sensor-based instrumentation at several sites, allowing research into power- and thermal-aware data center management, and mechanisms by which resource and load management tools can interact to reduce the energy footprints of future platforms.
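One way such sensor feeds might be consumed is sketched below: a load-placement tool that picks the rack with the lowest mean inlet temperature. The record format and the placement policy are purely illustrative assumptions.

```python
# Hypothetical sketch of how a load-placement tool might use a sensor feed.
# The record format and the "coolest rack" policy are illustrative only.

from collections import defaultdict
from statistics import mean

def pick_rack(sensor_records):
    """Choose the rack with the lowest mean inlet temperature."""
    temps = defaultdict(list)
    for record in sensor_records:
        temps[record["rack"]].append(record["inlet_temp_c"])
    return min(temps, key=lambda rack: mean(temps[rack]))

if __name__ == "__main__":
    feed = [
        {"rack": "A1", "inlet_temp_c": 24.5},
        {"rack": "A1", "inlet_temp_c": 25.1},
        {"rack": "B3", "inlet_temp_c": 22.8},
    ]
    print("Place new load on rack:", pick_rack(feed))
```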

As time goes on, we hope that the community will extend this list of services by developing new ones and offering them to others. In the meantime, the open nature of the test bed means that any internet-accessible service – inside or outside the test bed – can be accessed by software running on it.

We believe that our open approach will support a wide variety of interesting research topics, and encourage the collaboration that is so necessary to an endeavor of this scale. Indeed, we hope to find ways to partner with other groups that support local test beds, extending the reach and capabilities of the research platforms available.

 

The Blue Marble Image courtesy of:
NASA Goddard Space Flight Center. Image by Reto Stöckli (land surface, shallow water, clouds). Enhancements by Robert Simmon (ocean color, compositing, 3D globes, animation).
Data and technical support: MODIS Land Group; MODIS Science Data Support Team; MODIS Atmosphere Group; MODIS Ocean Group.
Additional data: USGS EROS Data Center (topography); USGS Terrestrial Remote Sensing Flagstaff Field Center (Antarctica); Defense Meteorological Satellite Program (city lights).