Fully coordinated
A new approach from HP Labs makes data centers more efficient, stable and cheaper to run
By Simon Firth
Sometimes the path of innovation gets convoluted. “Take the evolution of the data center,” says HP Labs computer scientist Vanish Talwar.
“When you look at data centers today,” he explains, “we have a lot of management solutions running on them -- all solving their own different problems well -- but those solutions don’t coordinate with each other.”
And that’s a problem. Management tools that control cooling, for example, might try to slow a machine down at the same time as tools created to match processing power with demand are ramping it up. With each tool working antagonistically, a data center can quickly become both unstable and expensively inefficient.
To date, engineers have built ad hoc tools to deal with these conflicts. That’s one reason why labor accounts for up to 80% of the cost of running a typical new data center.
HP Labs, however, is now offering a different approach to the problem.
“What we are doing is stepping back, viewing everything as a holistic system, and creating a more structured approach to solving these issues,” says Talwar. “And this results in coordination and federation of the data center management systems.”
Two building blocks for coordinating management tools
Photo: Georgia Tech graduate student Chengwei Wang
"There are two key concepts here," says Partha Ranganathan, the other HP Labs team member and the principal investigator for the broader exascale datacenter project.
“The first is to figure out a way in which these management tools can talk to each other, which we do by creating mechanisms for communication. It’s what we call m-channels.”
M-channels is a service that sits in the data center, registers the software and hardware management tools, and offers instructions back to them.
“The second concept,” says Ranganathan, “is about changing the behavior of one tool based on the behavior of another. This is what we call m-brokers.”
Together, these two building blocks create a framework within which data center managers can create solutions for a wide variety of specific coordination problems.
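To make the two building blocks concrete, here is a minimal sketch of how a channel and a broker might fit together. It is an illustration only, not HP Labs' implementation: the class names `MChannel` and `MBroker`, the topic names, the tool names and the "never exceed the thermal cap" policy are all assumptions invented for this example. It replays the conflict described earlier, in which a cooling tool wants to slow a machine down while a demand-matching tool wants to speed it up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Message:
    sender: str   # name of the tool that published the message
    topic: str    # hypothetical topic name, e.g. "demand"
    value: float  # CPU frequency in GHz (illustrative unit)


class MChannel:
    """Hypothetical channel: tools register, then publish and subscribe."""

    def __init__(self) -> None:
        self.tools: List[str] = []
        self.subscribers: Dict[str, List[Callable[[Message], None]]] = {}

    def register(self, tool_name: str) -> None:
        if tool_name not in self.tools:
            self.tools.append(tool_name)

    def subscribe(self, topic: str, handler: Callable[[Message], None]) -> None:
        self.subscribers.setdefault(topic, []).append(handler)

    def publish(self, message: Message) -> None:
        # Only registered tools may talk on the channel.
        if message.sender not in self.tools:
            raise ValueError(f"unregistered tool: {message.sender}")
        for handler in self.subscribers.get(message.topic, []):
            handler(message)


class MBroker:
    """Hypothetical broker: reconciles a thermal cap with a demand request."""

    def __init__(self, channel: MChannel) -> None:
        self.channel = channel
        self.cap: Optional[float] = None
        self.demand: Optional[float] = None
        channel.register("broker")
        channel.subscribe("thermal_cap", self._on_cap)
        channel.subscribe("demand", self._on_demand)

    def _on_cap(self, msg: Message) -> None:
        self.cap = msg.value
        self._decide()

    def _on_demand(self, msg: Message) -> None:
        self.demand = msg.value
        self._decide()

    def _decide(self) -> None:
        # Assumed policy: serve demand, but never exceed the thermal cap.
        if self.cap is None or self.demand is None:
            return
        setting = min(self.cap, self.demand)
        self.channel.publish(Message("broker", "cpu_setting", setting))


def run_example() -> List[float]:
    channel = MChannel()
    channel.register("cooling_manager")
    channel.register("load_manager")
    broker = MBroker(channel)  # subscribes itself to both request topics

    applied: List[float] = []
    channel.subscribe("cpu_setting", lambda m: applied.append(m.value))

    channel.publish(Message("load_manager", "demand", 3.0))
    channel.publish(Message("cooling_manager", "thermal_cap", 2.4))
    channel.publish(Message("load_manager", "demand", 2.0))
    return applied


if __name__ == "__main__":
    print(run_example())
```

In this sketch neither tool needs to know the other exists. Each one only publishes its own request, and the coordination policy lives in one place, the broker, where it can be changed without touching the tools themselves.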
The net advantages of coordinated management
While attractively simple in concept, neither element was easy to implement. “There's a lot of hard research that went into figuring out the protocols and the right levels of channel abstraction,” reports Ranganathan, “and how we apply technology to solve those questions. And then a lot of science went into figuring out the coordination that’s needed to get the brokering part of the equation right.”
When they tested their model in a prototype testbed, however, the results were impressive. Compared to state-of-the-art, non-coordinated solutions, adding a coordinated management system can achieve 54% better stability, a 71% improvement in meeting contractually agreed levels of service (SLAs) and a 10% decrease in power consumption, the team has shown.
The system can be built into existing data centers and has applicability in many different use cases. And because it is relatively simple to use, it lets data center managers write coordination policies without having to deal with the complexity of the hardware and software management tools they are seeking to control.
Vanish Talwar adds that it’s even simpler to use, thanks to a software development kit that the Coordinated Management team has developed. “That fits our broader goal of making coordination easy for management developers to really do,” Talwar explains. “And because of an automated infrastructure and easy-to-use interface, the probability that we will address the problem of silos in the data center is that much higher.”
The improved reliability, stability and efficiency that coordinated management offers all result in better performance for every dollar that data center managers spend on technology. And thanks to the time they save by not having to create ad hoc solutions to their specific management issues, they save on labor, too. “Coordination that would have taken you days or months to do before,” suggests Ranganathan, “might now take minutes.”
Expanding and scaling the solution
Talwar and Ranganathan are presenting their work this June at the IEEE International Conference on Autonomic Computing in Barcelona.
They’re planning to continue testing the concept of coordinated management in other use cases and are also looking at applying it to the large-scale processing functions carried out in cloud computing. “There are a lot of open questions in the notion of scalable coordinated management,” Ranganathan acknowledges, “so that's really the next step.”
The team is also sharing its solution with the research community and is curious to see how other people use it.
“The ultimate proof of the pudding,” says Ranganathan, “will be when somebody surprises us by applying it to a problem that we didn't even know existed.”
Vanish Talwar Q&A
Senior Research Scientist
How long have you been at HP Labs?
I first came to HP Labs in 2002 while I was doing my Ph.D. I joined HP Labs full time in 2006.
How would you describe your field of research?
My areas of interest are distributed systems and operating systems. At HP Labs in the last several years, I've been applying that for solving data center management problems.
What's the best part of your job?
It’s that you can have an idea and then go and pursue it. There are great people around HP Labs, too, who can give you great feedback and help you really move forward with your project.
How did you get into this line of research?
I was always interested in science and mathematics, so the more mathematical foundation of computer science attracted me. And then being able to take those foundations and use them to develop codes and prototypes has been really rewarding.