HP Labs Technical Reports

Click here for full text: Postscript PDF

LH*-Linear Hashing for Distributed Files

Neimat, Marie-Anne; Schneider, Donovan A.; Litwin, Witold



Abstract: We propose a data structure called LH*. LH* generalizes Linear Hashing to parallel or distributed RAM and disk files. An LH* file can be created from objects provided by any number of distributed and autonomous clients. It can grow gracefully, one bucket at a time, to virtually any number of servers, and to a size limited only by the capacity of the servers. The number of messages per random insertion is one in general, and three in the worst case. The number of messages per random retrieval of an object given its OID (primary key) is two in general, and four in the worst case. The load factor can be about constant, 65-95%, depending on the file parameters. The file can support parallel operations, e.g., hash joins and scans. Performing a parallel operation on a file of M buckets, costs at most 2M+1 messages, and between 1 and O(log sub2 M) rounds of messages. An LH* file can be much faster than a single site disk file, and/or can hold a much larger number of objects. It can be more efficient than any file with a centralized directory, or a static parallel or distributed hash file.

Back to Index

[Research] [News] [Tech Reports] [Palo Alto] [Bristol] [Japan] [Israel] [Site Map] [Home] [Hewlett-Packard]