Automated Profiling and Resource Management of Pig Programs for Meeting Service Level Objectives
Zhang, Zhuoyao; Cherkasova, Ludmila; Verma, Abhishek; Loo, Boon Thau
Abstract: An increasing number of MapReduce applications associated with live business intelligence require completion time guarantees. In this paper, we consider the popular Pig framework, which provides a high-level SQL-like abstraction on top of the MapReduce engine for processing large data sets. Programs written in such frameworks are compiled into directed acyclic graphs (DAGs) of MapReduce jobs. There is a lack of performance models and analysis tools for automated performance management of such MapReduce jobs. We offer a performance modeling environment for Pig programs that automatically profiles jobs from past runs and aims to solve the following interrelated problems: (i) estimating the completion time of a Pig program as a function of allocated resources; (ii) estimating the amount of resources (the number of map and reduce slots) required for completing a Pig program within a given (soft) deadline. To solve these problems, we first optimize Pig program execution by enforcing the optimal schedule of its concurrent jobs. For DAGs with concurrent jobs, this optimization reduces the program completion time by 10%-27% in our experiments. Moreover, it eliminates possible non-determinism in the execution of concurrent jobs in the Pig program and therefore enables a more accurate performance model for Pig programs. We validate our approach using a 66-node Hadoop cluster and a diverse set of workloads: the PigMix benchmark, TPC-H queries, and customized queries mining a collection of HP Labs' web proxy logs. The proposed scheduling optimization leads to significant resource savings (20%-40% in our experiments) compared with the original, unoptimized solution, and the predicted program completion times are within 10% of the measured ones.
Additional Publication Information: NIP28: 28th International Conference on Digital Printing Technologies and Digital Fabrication (NIP 2012) [NIP 28]
External Posting Date: June 28, 2012 [Fulltext]. Approved for External Publication
Internal Posting Date: June 28, 2012 [Fulltext]