MapReduce and Hadoop represent an economically compelling alternative
for efficient large-scale data processing and advanced analytics in
the enterprise. A growing number of MapReduce applications associated
with live business intelligence require completion time guarantees,
i.e., Service Level Objectives (SLOs), which routinely define a set of
performance goals. However, there is a lack of performance models and
workload analysis tools for automated performance management of such
MapReduce jobs, and none of the existing Hadoop schedulers support
completion time guarantees. A key challenge in shared MapReduce
clusters is the ability to automatically tailor and control resource
allocations to different applications so that they achieve their
performance SLOs.
We pursue several inter-related research threads. They are tied
together by the set of performance tools and models that we have
designed: our MapReduce job profiling approach and a set of novel
performance models.
SLO-Based Scheduler for Hadoop
In this work, we propose a framework, called ARIA, to address this
problem. It comprises three inter-related components. First, for a
production job that is routinely executed on a new dataset, we build a
job profile that compactly summarizes critical performance
characteristics of the underlying application during the map and
reduce stages. Second, we design a MapReduce performance model that,
for a given job (with a known profile) and its SLO (soft deadline),
estimates the amount of resources required to complete the job within
the deadline. Finally, we implement a novel SLO-based scheduler in
Hadoop that determines job ordering and the amount of resources to
allocate for meeting the job deadlines. We validate our approach using
a set of realistic applications. The new scheduler effectively meets
the jobs' SLOs until the job demands exceed the cluster resources.
The results of the extensive simulation study are validated through
detailed experiments on a 66-node Hadoop cluster.
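To illustrate the flavor of such a performance model, below is a minimal Python sketch that inverts classic makespan bounds for n tasks on k slots (lower bound n*avg/k, upper bound (n-1)*avg/k + max) to find the smallest slot allocation that meets a stage deadline. The function names and numbers are illustrative assumptions, not the actual ARIA code.

```python
# Minimal sketch of an SLO-driven resource estimate in the spirit of ARIA.
# Assumes a job profile that records the number of tasks plus the average
# and maximum task durations per stage; all names here are illustrative.

def stage_upper_bound(num_tasks, avg_dur, max_dur, slots):
    """Upper bound on stage completion time with `slots` parallel slots:
    (n - 1) * avg / k + max."""
    return (num_tasks - 1) * avg_dur / slots + max_dur

def min_slots_for_deadline(num_tasks, avg_dur, max_dur, deadline):
    """Smallest slot count whose upper-bound completion time meets the
    deadline, or None if even one slot per task is not enough."""
    for slots in range(1, num_tasks + 1):
        if stage_upper_bound(num_tasks, avg_dur, max_dur, slots) <= deadline:
            return slots
    return None

# Example: 120 map tasks, avg 40s, max 75s, and a 600s map-stage budget.
print(min_slots_for_deadline(120, 40.0, 75.0, 600.0))  # -> 10
```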
Right-Sizing of Resource Allocation for MapReduce Apps
Cloud computing offers an attractive option for businesses to rent a
suitably sized Hadoop cluster, consume resources as a service, and pay
only for resources that were utilized. One of the open questions in
such environments is the amount of resources that a user should lease
from the service provider. In this work, we outline a novel framework
for SLO-driven resource provisioning and sizing of MapReduce
jobs. First, we propose an automated profiling tool that extracts a
compact job profile from past run(s) of the application or from its
execution on a smaller data set. Then, by applying a linear regression
technique, we derive scaling factors to accurately project the
application performance when processing a larger dataset. Moreover,
we design a model that estimates the impact of node failures on job
completion time, so that worst-case scenarios can be evaluated.
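As a hedged illustration of the scaling step, the sketch below fits one such scaling factor with ordinary least squares; the chosen metric, the numbers, and the variable names are assumptions made for the example, not data from the framework.

```python
# Illustrative sketch: derive a linear scaling factor for one profile metric
# (here, average map task duration vs. input size) from past runs, then
# project it to a larger dataset. Numbers are made up for the example.

import numpy as np

sizes     = np.array([5.0, 10.0, 20.0, 40.0])    # input sizes (GB) of past runs
durations = np.array([22.0, 41.0, 83.0, 160.0])  # measured avg map durations (s)

slope, intercept = np.polyfit(sizes, durations, deg=1)  # least-squares fit

target_size = 100.0  # size of the new, larger dataset (GB)
print(f"projected avg map duration: {slope * target_size + intercept:.1f}s")
```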
MapReduce Simulator SimMR
To ease the task of evaluating and comparing different provisioning
and scheduling approaches in MapReduce environments, we have designed
and implemented a simulation environment, SimMR, which comprises
three inter-related components: i) a Trace Generator that creates
a replayable MapReduce workload; ii) a Simulator Engine that
accurately emulates the job master functionality in Hadoop; and
iii) a pluggable scheduling policy that dictates the scheduler's
decisions on job ordering and the amount of resources allocated to
different jobs over time.
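The sketch below shows what a pluggable policy can look like as an interface; the class and method names are assumptions for illustration and do not reflect SimMR's actual API.

```python
# Illustrative sketch of a pluggable scheduling policy (assumed interface,
# not SimMR's actual API). The simulator engine consults the policy whenever
# slots free up, and the policy decides which job receives them.

from abc import ABC, abstractmethod

class SchedulingPolicy(ABC):
    @abstractmethod
    def pick_next_job(self, pending_jobs, free_map_slots, free_reduce_slots):
        """Return the job to serve next, or None to leave the slots idle."""

class FifoPolicy(SchedulingPolicy):
    def pick_next_job(self, pending_jobs, free_map_slots, free_reduce_slots):
        # FIFO: the earliest-submitted pending job gets the free slots.
        if pending_jobs and (free_map_slots > 0 or free_reduce_slots > 0):
            return min(pending_jobs, key=lambda job: job.submit_time)
        return None
```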
Optimizing the Schedule of MapReduce Jobs to Minimize Their Makespan and Improve Cluster Performance
We consider a subset of the production workload that
consists of MapReduce jobs with no dependencies. We observe that the
order in which these jobs are executed can have a significant impact
on their overall completion time and the cluster resource
utilization. Our goal is to automate the design of a job schedule that
minimizes the completion time (makespan) of such a set of MapReduce
jobs. We offer a novel abstraction framework and a heuristic, called
BalancedPools, that efficiently utilizes performance properties of
MapReduce jobs in a given workload for constructing an optimized job
schedule. Simulations performed over a realistic workload demonstrate
that makespan improvements of 15%-38% are achievable simply by
processing the jobs in the right order.
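For background, the two-stage structure of MapReduce jobs (a map stage followed by a reduce stage) matches the classic two-machine flow-shop setting, where Johnson's rule gives the makespan-minimizing order. The sketch below implements that classic rule; the BalancedPools heuristic itself is more elaborate, and the job data is invented for the example.

```python
# Johnson's rule for ordering two-stage jobs to minimize makespan.
# Jobs whose shorter stage is the first (map) stage run early, in increasing
# map-time order; the rest run late, in decreasing reduce-time order.

def johnson_order(jobs):
    """jobs: list of (name, map_time, reduce_time) tuples."""
    front, back = [], []
    for job in sorted(jobs, key=lambda j: min(j[1], j[2])):
        if job[1] <= job[2]:
            front.append(job)      # short map stage: schedule early
        else:
            back.insert(0, job)    # short reduce stage: schedule late
    return front + back

jobs = [("J1", 4, 7), ("J2", 8, 3), ("J3", 5, 5)]
print([name for name, _, _ in johnson_order(jobs)])  # -> ['J1', 'J3', 'J2']
```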
Meeting Service Level Objectives of Pig Programs
We consider the popular Pig framework, which provides a high-level,
SQL-like abstraction on top of the MapReduce engine for processing
large data sets. Programs written in such frameworks
are compiled into directed acyclic graphs (DAGs) of MapReduce jobs.
We aim to solve the resource provisioning problem: given a Pig program
with a completion time goal, estimate the amount of resources (a
number of map and reduce slots) required for completing the program
with a given (soft) deadline. We develop a simple yet elegant
performance model that provides completion time estimates of a Pig
program as a function of allocated resources. Then this model is used
as a basis for solving the inverse resource provisioning problem for
Pig programs.
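A hedged sketch of the estimate's shape is shown below: each MapReduce job in the compiled DAG contributes a stage-level bound computed from its profile, and a sequentially executed plan sums these contributions. The representation and names are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch: completion time estimate for a Pig program whose
# compiled DAG executes its MapReduce jobs sequentially. Each job profile
# is (map_tasks, avg_map, max_map, reduce_tasks, avg_reduce, max_reduce).

def job_estimate(profile, map_slots, reduce_slots):
    n_m, avg_m, max_m, n_r, avg_r, max_r = profile
    t_map    = (n_m - 1) * avg_m / map_slots + max_m      # map stage bound
    t_reduce = (n_r - 1) * avg_r / reduce_slots + max_r   # reduce stage bound
    return t_map + t_reduce

def pig_estimate(job_profiles, map_slots, reduce_slots):
    # With a sequential schedule of the DAG's jobs, the program estimate
    # is the sum of the per-job estimates for the given slot allocation.
    return sum(job_estimate(p, map_slots, reduce_slots) for p in job_profiles)
```

Searching over candidate (map_slots, reduce_slots) allocations with such a function is one way to pose the inverse provisioning question the text describes: the smallest allocation whose estimate meets the deadline.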
Related Papers and Reports
- Z. Zhang, L. Cherkasova, A. Verma, B. T. Loo:
Automated Profiling and Resource Management of Pig Programs for Meeting
Service Level Objectives.
Proc. of the 9th IEEE International Conference on
Autonomic Computing (ICAC'2012), Sept. 14-18, 2012, San Jose, CA, USA, Best Student Paper award.
- A. Verma, L. Cherkasova, R. Campbell: Two Sides of a Coin: Optimizing the Schedule of MapReduce Jobs to Minimize Their Makespan and Improve Cluster Performance.
Proc. of the 20th IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems
(MASCOTS'2012), Washington DC, USA, August 7-9, 2012.
- Z. Zhang, L. Cherkasova, A. Verma, B. T. Loo: Optimizing Completion Time and Resource Provisioning of Pig Programs. Proc. of Cloud Computing Optimization Workshop (CCOPT'2012), collocated with CCGrid'2012, May 13-16, 2012, Ottawa, Canada.
- A. Verma, L. Cherkasova, V. S. Kumar, R. Campbell: Deadline-based
Workload Management for MapReduce Environments: Pieces of the
Performance Puzzle. Proc. of the IEEE/IFIP Network
Operations and Management Symposium (NOMS'2012), Maui, Hawaii,
USA, April 16-20, 2012.
- Z. Zhang, L. Cherkasova, A. Verma, B. T. Loo: Meeting Service Level Objectives of Pig Programs. Proc. of the 2nd Intl Workshop on Cloud Computing Platforms (CloudCP'2012), in conjunction with EuroSys'2012, Bern, Switzerland, April 10, 2012.
- A. Verma, L. Cherkasova, R. Campbell: Resource Provisioning
Framework for MapReduce Jobs with Performance Goals.
Proc. of the ACM/IFIP/USENIX 12th International Middleware Conference
(Middleware'2011), Lisboa, Portugal, December 12-16, 2011.
- A. Verma, L. Cherkasova, V. S. Kumar,
R. Campbell: Three
Pieces of the MapReduce Workload Management Puzzle. Poster at the
23rd ACM Symposium on Operating Systems Principles (SOSP'2011),
Cascais, Portugal, Oct. 23-26, 2011.
- A. Verma, L. Cherkasova, R. Campbell: Play It Again, SimMR!
Proc. of the IEEE International Conference on Cluster Computing (Cluster'2011), Austin, Texas, USA, September 26-30, 2011.
- A. Verma, L. Cherkasova, R. Campbell: SLO-Driven Right-Sizing and Resource Provisioning of MapReduce Jobs. Proc. of the 5th Workshop on Large Scale Distributed Systems and Middleware (LADIS'2011), held in
conjunction with VLDB'2011, Seattle, Washington, Sept. 2-3, 2011.
- A. Verma, L. Cherkasova, R. Campbell: ARIA: Automatic Resource Inference and Allocation for MapReduce Environments.
Proc. of the 8th IEEE International Conference on
Autonomic Computing (ICAC'2011), June 14-18, 2011, Karlsruhe, Germany.
- L. Cherkasova: Performance Modeling in MapReduce
Environments: Challenges and Opportunities. Invited Talk at the 2nd ACM/SPEC International Conference on Performance Engineering (ICPE'2011), March 14-16, 2011, Karlsruhe, Germany.
HP Labs Reports
- Z. Zhang, L. Cherkasova, A. Verma, B. T. Loo:
Automated Profiling and Resource Management of Pig Programs for Meeting
Service Level Objectives. HP Laboratories Report No. HPL-2012-146, 2012.
- A. Verma, L. Cherkasova, R. Campbell: Two Sides of a Coin: Optimizing the Schedule of MapReduce Jobs to Minimize Their Makespan and Improve Cluster Performance. HP Laboratories Report No. HPL-2012-127, 2012.
- A. Verma, L. Cherkasova, V. S. Kumar, R. Campbell: Deadline-based Workload Management for MapReduce Environments: Pieces of the Performance Puzzle. HP Laboratories Report No. HPL-2012-82, 2012.
- A. Verma, L. Cherkasova, R. Campbell: Resource Provisioning
Framework for MapReduce Jobs with Performance Goals.
HP Laboratories Report No. HPL-2011-173, 2011.
- A. Verma, L. Cherkasova, R. Campbell: Play It Again, SimMR! HP Laboratories Report No. HPL-2011-127, 2011.
- A. Verma, L. Cherkasova, R. Campbell: SLO-Driven Right-Sizing and Resource Provisioning of MapReduce Jobs. HP Laboratories Report No. HPL-2011-126, 2011.
- A. Verma, L. Cherkasova, R. Campbell: ARIA: Automatic Resource Inference and Allocation for MapReduce Environments. HP Laboratories Report No. HPL-2011-58, 2011.