Intern profile – Zhuoyao Zhang

Principal Scientist Lucy Cherkasova and Intern Zhuoyao Zhang.

This is one of a series of profiles featuring interviews with some of this year's crop of summer interns at HP Labs.

We continue the series with an interview with Zhuoyao Zhang, who was recruited by the Information Analytics Lab.

Intern Zhuoyao Zhang.

Zhuoyao Zhang grew up in Wuxi, near Shanghai, where she attended Fudan University as an undergraduate. She’s now studying for a PhD in Computer and Information Science at the University of Pennsylvania. Both places are known for their intensely humid summers, so Zhang’s been enjoying interning in Palo Alto’s mellower climate. “It’s the most comfortable summer I can remember,” she jokes. Zhang came to HP to work with mentor Lucy Cherkasova in the Information Analytics Lab. When she’s not pursuing her research, Zhang likes hiking, playing ping-pong and watching movies.

HP: What area of research are you looking at?
I’m looking at performance modeling of Pig programs, which in essence represent directed acyclic graphs (DAGs) of MapReduce jobs. Pig is an open-source platform for analyzing large data sets, and it allows you to write simple programs to express complex analytic tasks. When you run such a program, you want to be able to estimate when it’s going to be completed if you devote a specific amount of resources to that program. You also want to be able to take a specific Pig program, pick a time by which you’d like it completed, and then accurately estimate what level of resource allocation that would require. I’ve been trying to create models that can help with both kinds of estimates.

HP: What’s the broader value of creating those models?
This summer project is part of an HP Labs project called “SLO-driven Hadoop.” Hadoop is a popular open-source platform for efficient “Big Data” processing.
The project aims to design and implement a set of novel performance models and workload analysis tools for automated performance management of MapReduce jobs.

These models are especially important when you are running programs on large distributed clusters where there are multiple users and where you want to be able to allocate the available resources as efficiently as possible between them.

HP: Any results you can share yet?
We’re pretty pleased with what we’ve found so far. We’ve tested the models’ estimates against the completion times for benchmark tests and they’ve matched quite well. I’m thinking about continuing this even after my internship. I think there’s some real value in the work.

HP: How did you hear about HP’s internship program?
I first contacted Lucy last year when I had a paper accepted to the Middleware 2010 conference in India and Lucy was the chair of the Industrial Track. We found that we had similar research interests and wanted to find a way to collaborate. Then I got this internship, so we’ve been able to work together, which has been a very nice experience.

HP: Anything else that’s struck you about interning at HP Labs?
It’s a great program. There are a lot of social events and I’ve made a lot of new friends. Interning here has also offered me a new perspective. I thought that I might go into academia, but I can see how research labs could offer a lot of very good opportunities.