July 2004

Looking for trouble

Research could help spot performance problems in complex systems



by Steve Towns
Editor, HP Government Solutions magazine

Research under way at HP Labs may one day help pinpoint performance bottlenecks in complex computing systems.

Using network-monitoring technology and sophisticated algorithms, HP researchers are attempting to trace the path of a request as it travels through the maze of software that constitutes many IT networks. The technique may reduce finger-pointing on multivendor IT projects and yield better performance for applications.

The research attacks a vexing problem for businesses, government agencies, universities and others operating large enterprise systems, particularly Web-based applications built of multiple components from different vendors.

Not only are these systems notoriously hard to debug, they're becoming more common as institutions rely more heavily on Web-based services that let people buy books, run auctions, request IT help and perform many other kinds of transactions online.

Improving system performance

"These days, people construct applications out of pieces from various manufacturers. They're built from a bunch of computers talking to each other over a network of some sort," said Jeffrey Mogul, an HP Fellow in the company's Internet Systems and Storage Lab. "It's often hard to get these things to work, and you can have a devil of a time figuring out where the problem lies, especially if it's a performance problem."

Such systems often string together "black boxes" -- widely distributed servers, storage arrays, etc. -- that are difficult or impossible for IT managers to examine closely. Mogul is part of a team of HP Labs researchers developing a method to locate problems and performance issues in complex systems without delving into sophisticated components or scouring source code.

The research is part of an overall effort at HP to maximize the agility, efficiency and reliability of enterprise computing resources. Advanced techniques being explored in HP Labs may result in better performance for heterogeneous computing environments common to large organizations.

Follow the packets

Today, isolating performance problems in intricate applications is extremely time-consuming and can demand integration expertise that's both expensive and hard to find. That's because distributed systems may include front-end Web servers, Web application servers, ERP systems, credit card authorization systems and other technologies. What's more, these separate parts may come from different, perhaps competing, manufacturers.

"You’re probably not going to have source code for most of those things," said Mogul. "Even if you did, it would be too much for any one person to understand."

The diagnostic technique being developed by HP Labs traces the route of network messages as they travel through distributed systems and measures the speed of various tasks performed along the way.

"You have this path of messages through the system, where each message is causing a successor," Mogul said. For example, a message from a client arrives at a Web server, which sends a message to a back-end applications server, which then might interact with an authentication server.

The idea is to map this chain of events and spot operations that take longer than they should.
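The chain of events described above can be sketched in a few lines of code. This is a minimal illustration, not HP's actual tool; the component names, latencies and threshold are all invented for the example.

```python
from dataclasses import dataclass

# Hypothetical illustration: one hop in a request's path through the system,
# with the time a component spent handling its part of the request.
@dataclass
class Hop:
    component: str     # e.g. "web-server", "app-server", "auth-server"
    latency_ms: float

# A path like the one in the article: client -> web server -> app server -> auth server.
path = [Hop("web-server", 4.0), Hop("app-server", 7.5), Hop("auth-server", 212.0)]

def slow_hops(path, threshold_ms=100.0):
    """Flag components whose latency exceeds a threshold -- the 'box to open up'."""
    return [hop.component for hop in path if hop.latency_ms > threshold_ms]

print(slow_hops(path))  # the authentication server stands out
```

As Mogul notes below, flagging the slow hop does not explain the slowness; it only narrows down where to look.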

Diagnostic tools

"Our hypothesis is that if we can point you to someplace where there often is a lot of latency, then that's the box you should open up to figure out what's going wrong inside," Mogul said. "It doesn't tell you how to fix the problem, but it tells you where to look."

Similar diagnostic tools already are available for homogeneous systems -- all Java or all .NET applications, for example. It's been more difficult to develop a tool for more complex environments, in part because support and documentation for multivendor components may be difficult to compile. So researchers tried to create something that requires neither support from vendors nor extensive knowledge of system components.

"We decided we needed to do this as noninvasively as possible," Mogul said. "We don’t need to know anything about the application ahead of time, and we don’t inject our own traffic into the system."

Pinpointing problems

They start by looking at the traffic carried by network switching equipment. Modern network switches include a feature called port monitoring, which allows a monitoring system to see a copy of each packet on the network.

Researchers use that information to trace network traffic over a period of time -- anywhere from minutes to hours, depending on how changeable the application is -- saving for each packet only a time-stamp and information about the sender and receiver. They avoid saving the full packet data, to protect data privacy.
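A record of that shape -- a timestamp plus sender and receiver, with the payload discarded -- might look something like the sketch below. The CSV format and addresses are assumptions for illustration; the article does not specify how the trace is stored.

```python
import csv
import io

# Hypothetical trace format: each captured packet is reduced to a timestamp
# and the sender/receiver addresses. The packet payload is never saved,
# which is what protects data privacy.
raw = """\
1089000000.000120,10.0.0.5,10.0.1.2
1089000000.000480,10.0.1.2,10.0.2.9
"""

trace = [
    (float(ts), src, dst)
    for ts, src, dst in csv.reader(io.StringIO(raw))
]
print(trace[0])
```

Keeping only these three fields also keeps the trace small enough to collect for minutes or hours at a time.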

Then researchers apply algorithms to the packet trace that allow them to sketch out relationships between network components and spot time lags between operations.
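One simple heuristic in this spirit: if a message arriving at a node is regularly followed, within a short window, by a message leaving that node, treat the pair as a candidate causal link and record the gap as its delay. This is a crude sketch of the idea, not the algorithms HP's researchers actually use; the trace values and the one-second window are invented for the example.

```python
from collections import defaultdict

# trace: time-ordered (timestamp_seconds, sender, receiver) tuples,
# as produced by the port-monitoring capture described above.
trace = [
    (0.000, "client", "web"),
    (0.002, "web", "app"),
    (0.150, "app", "auth"),
    (0.400, "auth", "app"),
]

def causal_delays(trace, window=1.0):
    """When a message arriving at node X is followed within `window` seconds
    by a message leaving X, record the gap as a candidate causal delay for
    that pair of links, then average the gaps per edge."""
    delays = defaultdict(list)
    for i, (t_in, src, dst) in enumerate(trace):
        for t_out, src2, dst2 in trace[i + 1:]:
            if t_out - t_in > window:
                break  # trace is time-ordered; nothing later can qualify
            if src2 == dst:  # a message leaves the node that just received one
                delays[(dst, dst2)].append(t_out - t_in)
    return {edge: sum(ds) / len(ds) for edge, ds in delays.items()}

print(causal_delays(trace))
```

Edges with unusually large average delays are exactly the "lots of latency" spots Mogul describes: the boxes worth opening up.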

The approach eventually may reduce the time needed to deploy and debug complex systems. It also could give IT professionals better insight into system behavior, allowing them to create more reliable and responsive applications.

Showing potential

Limited tests of the technique show promise. “Preliminary results suggest we’re on the right track,” Mogul said. “But we haven’t gotten to the point in our research where we’ve taken a complex live system and found out something the owners of the system didn’t already know.”

Their work gained research community attention late last year when a scientific paper based on the project was published at the 19th Association for Computing Machinery Symposium on Operating Systems Principles. The event is the world’s premier forum for researchers and developers working on operating system technology.

“In artificial traces, we’ve been able to reconstruct pictures of network relationships fairly accurately,” said Mogul, adding that he hopes to give the tool a real-world test by unleashing it on some of HP’s internal applications.

Solving problems faster

Products based on this research eventually may reduce the programming expertise needed to isolate performance problems in complex network systems. Or they may allow highly skilled systems integrators to work more efficiently by helping them locate technical glitches quickly and precisely.

That means faster and less expensive resolutions of some of the technology industry’s toughest challenges, and possibly an end to a huge headache for IT managers: the blame game played by multiple vendors involved in complex systems when something goes wrong.

“You have this complicated system, and you’ve basically got it running in the sense that most of the time it’s giving you right answers. But it’s not fast enough, and you’re trying to figure out why,” Mogul said. “By creating tools that isolate performance problems in complex systems, we hope to help solve the finger-pointing problem.”


Jeffrey Mogul, an HP Fellow in the company's Internet Systems and Storage Lab

