Jump to content United States-English
HP.com Home Products and Services Support and Drivers Solutions How to Buy
» Contact HP

hp.com home


Research Ideas


printable versionprintable version
» 

HP Labs

» Research
» News and events
» Technical reports
» About HP Labs
» Careers @ HP Labs
» People
» Worldwide sites
» Downloads
Content starts here

This is a collection of research ideas I have for my spare time. They are typically unrelated to my day job, and progress is generally slow and sporadic.

lmbench
A micro-benchmark suite
search
An information retrieval package
Asynchronous File I/O

lmbench

lmbench is an ongoing project to develop a suite of microbenchmarks to analyze various important aspects of system performance.

lmbench version 1 was developed by Larry McVoy and provided latency and bandwidth measurements for many important aspects of system performance.

lmbench version 2 introduced a new timing harness which provided the means for lmbench to make accurate measurements in a platform independent fashion, and it was developed jointly by Carl Staelin and Larry McVoy. The timing harness analyzed the system clock to determine how long experiments needed to be repeated in order to minimize timing errors induced by clock resolution. In addition, some new benchmarks were introduced, and at least one benchmark (mhz) was rewritten from scratch.

lmbench version 3 introduced (yet another) new timing harness to measure system performance under scalable load, which is primarily relevent for multi-processor machines. In addition, lmbench version 3 introduced a number of new benchmarks which attempt to characterize various aspects of the hardware performance, such as the TLB size, cache line size, and memory subsystem parallelism.

cache

cache attempts to determine various attributes of the memory hierarchy, such as the number of caches, the size of each cache, the line size for each cache, the average memory latency for cache hits, and the available memory parallelism to each cache. It utilizes routines from line and par_mem to determine the cache line size and memory parallelism.

The biggest new challenge presented by cache is to determine the number and sizes of the various caches in the memory hierarchy. The obvious approach is to measure memory latency for a range of memory sizes. In an ideal world the graph would look like a series of plateaus separated by sharp cliffs. The cliff edges would indicate cache boundaries. Unfortunately, this is not an ideal world, so there are a number of hardware/OS combinations which produce memory graphs with gently sloping regions instead of sharp cliffs. This makes it very hard to determine the cache edge because in principle it could reside anywhere in the sloping region.

There are apparently a number of factors which can cause these sloping regions. One primary cause is the fact that larger caches are often direct mapped, or have a low degree of associativity, which makes them prone to page collisions. Some operating systems are sensitive to these issues and use page coloring algorithms to reduce cache conflicts. However, if page coloring is not used, then the average memory latency for dereferencing a pointer chain which should fit into cache but has conflicting pages can be significantly larger than a cache hit since it will include all the cache misses due to conflicting pages.

To reduce the impact of this problem I am investigating the use of genetic algorithms to select pages from a larger pool of pages with the aim of choosing a page set which minimizes cache collisions. Initial results indicate that this is a promising approach, and it dramatically improves cache's ability to precisely locate cache boundaries. However, there are still some hardware/OS combinations with sloped memory latency graphs, indicating that there might be additional complications.

File indexing and retrieval package

Initially developed as a simple demonstration of information retrieval techniques for CS236601 based on the description of the Google data structures in The anatomy of a large-scale hypertextual web search engine, I am slowly in the process of turning it into a more complete, high performance indexing and retrieval engine.

The initial implementation was able to index data very rapidly and as far as possible used the data structures from Brin without modification. It had a number of interesting features that improved the indexing performance. First of all, it used multiple threads to read data from disk to make better use of technologies such as multi-disk storage devices and disk scheduling. Secondly, it accumulated the forward index in a large memory buffer (128MB on a 512MB machine), and when the buffer was filled, it used qsort to create the fragmentary inverted index, which was then written to disk. The last stage was sequentially streaming the various inverted index fragments and merging them into a single inverted index. The output buffers were double-buffered so the system could continue streaming input data while sorting and flushing the (full) output buffer.

There are a few areas currently being developed:

Incremental updates
Add the ability to rapidly update the database by adding, removing, and updating documents
Multiple inverted indexes
It is not necessary, or even cost effective to continually modify a monolithic inverted index, particularly for very dynamic collections such as one's email. Instead it may make sense to have a few inverted indexes, creating new ones with new or updated content, and occassionally merge or update them using strategies similar to memory garbage collection.
Parsing
The demonstration code used a very simple, hand coded parser which is unspeakably primitive. A more general and flexible solution is desperately required.
Merging with Python
One solution is to create a very simple token-based interface which allows a few simple operations, and to push all the parsing, stemming, and other word manipulation operations into the high-level scripting language, such as python.
Threading
In order for the system to become "real", it must be made more robust, in particular it should be able to have multiple applications reading and writing the database in parallel (safely).

Asynchronous File I/O

Modify one or more applications (e.g., glimpseindex, find, squid) to utilize the asynchronous I/O and measure application performance improvements.

Last modified: $Date: 2001/11/25 07:33:12 $
Privacy statement Using this site means you accept its terms Feedback to HP Labs
© 2008 Hewlett-Packard Development Company, L.P.