HP Labs Technical Reports
Click here for full text:
An Adaptive, Load Balancing Parallel Join Algorithm
Singh, Vineet; Amin, Minesh; Schneider, Donovan
Abstract: Many parallel join algorithms have been proposed in the last several years. However, most of these algorithms require that the amount of data to be joined is known in advance in order to choose the proper number of join processors. This is an unrealistic assumption because data sizes are typically unknown, and are notoriously hard to estimate. We present an adaptive, load-balancing parallel join algorithm called PJLH to address this problem. PJLH efficiently adapts itself to use additional processors if the amount of data is larger than expected. Furthermore, while adapting, it ensures a good load balancing of data across the processors. We have implemented and analyzed PJLH on a main memory database system implemented on a cluster of workstations. We show that PJLH is nearly as efficient as an optimal algorithm when the amount of data is known in advance. Furthermore, we show that PJLH efficiently adapts to use additional join processors when necessary, while maintaining a balanced load. This makes PJLH especially well-suited for processing multi-join queries where the cardinalities of intermediate relations are very difficult to estimate.
Back to Index