SLO-Driven Right-Sizing and Resource Provisioning of MapReduce Jobs
Verma, Abhishek; Cherkasova, Ludmila; Campbell, Roy H.
Keyword(s): MapReduce; Hadoop; performance models; completion time prediction; resource allocation
Abstract: A growing number of MapReduce applications, e.g., personalized advertising, spam detection, and real-time event log analysis, need to be completed within a given time window. System administrators currently lack performance models and workload analysis tools for automated performance management of such MapReduce jobs. In this work, we outline a novel framework for SLO-driven resource provisioning and sizing of MapReduce jobs. First, we propose an automated profiling tool that extracts a compact job profile from past application run(s) or by executing the application on a smaller dataset. Then, by applying a linear regression technique, we derive scaling factors to accurately project the application performance when processing a larger dataset. The job profile (with scaling factors) forms the basis of a MapReduce performance model that computes lower and upper bounds on the job completion time. Finally, we provide a fast and efficient capacity planning model that, for a MapReduce job with timing requirements, generates a set of resource provisioning options. We validate the accuracy of our models by executing a set of realistic applications on a 66-node Hadoop cluster.
Additional Publication Information: Will appear in Proc. of the 5th Workshop on Large Scale Distributed Systems and Middleware (LADIS'2011), held in conjunction with VLDB'2011, Seattle, Washington, Sept. 2-3, 2011.
External Posting Date: August 21, 2011. Approved for External Publication
Internal Posting Date: August 21, 2011
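
Illustration: The abstract describes deriving scaling factors by linear regression so that a profile measured on small inputs can be projected to a larger dataset. The following is a minimal sketch of that idea only, not the authors' tool. The function names, the single-variable model (duration = intercept + slope * input size), and the sample measurements are all assumptions made for illustration; the report itself defines the actual profile metrics.

```python
def fit_scaling_factors(input_sizes, durations):
    """Ordinary least-squares fit of: duration = intercept + slope * input_size.

    Hypothetical helper: the (intercept, slope) pair plays the role of the
    "scaling factors" mentioned in the abstract.
    """
    n = len(input_sizes)
    if n < 2 or n != len(durations):
        raise ValueError("need at least two (size, duration) pairs of equal length")
    mean_x = sum(input_sizes) / n
    mean_y = sum(durations) / n
    sxx = sum((x - mean_x) ** 2 for x in input_sizes)
    if sxx == 0:
        raise ValueError("input sizes must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(input_sizes, durations))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def project_duration(intercept, slope, input_size):
    """Project a phase duration for a larger input using the fitted factors."""
    return intercept + slope * input_size


# Made-up profiling runs: input size in GB, measured phase duration in seconds.
sizes_gb = [1, 2, 4, 8]
durations_s = [12.0, 22.0, 42.0, 82.0]

intercept, slope = fit_scaling_factors(sizes_gb, durations_s)
projected = project_duration(intercept, slope, 64)  # projection for a 64 GB input
```

A projection like this is only as good as the assumption that the measured quantity grows linearly with input size over the projected range, so it is worth checking the fit against at least one larger run before relying on it.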