Overview
Instruction-specific performance monitors in microprocessors have proved highly valuable for performance tuning (as shown by tools like Caliper) and, more recently, for compiler optimizations. They are useful mainly because they point directly to the problematic instructions in the user's code. Unfortunately, shared-memory multiprocessors lack this support: they do offer performance monitors, but cannot correlate the events those monitors record with individual instructions.
The goal of the SWIFT project is (1) to investigate how information on system events can be collected at the granularity of individual instructions, and (2) to design tools and optimizations using this feature.
In a recent study, we showed that knowing which instructions are involved in coherence traffic, together with judicious use of the Itanium instruction set, can dramatically decrease the number of coherence transactions in real-world applications on Pinnacles. Specifically, where prior runs show that a read request is often followed by a write request to the same line, we insert ld.bias instructions, which not only fetch the data but also request a private copy of the line on a cache miss. For more information, please contact Jeff Collard.
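The selection step described above can be illustrated with a small sketch. It is not the project's tool: the trace format, the function name find_bias_candidates, and the line_size, window, and threshold parameters are all assumptions made for illustration. It also treats every load as a sample, whereas a real profile would come from hardware monitors and consider only cache misses. Given a recorded trace of memory accesses, it reports the load instructions that are usually followed shortly by a store to the same cache line, which are the loads one might consider rewriting as ld.bias.

```python
from collections import defaultdict

def find_bias_candidates(trace, line_size=128, window=8, threshold=0.5):
    """Return the sorted PCs of loads often followed by a store to the same line.

    trace: list of (pc, op, addr) tuples, with op either "load" or "store".
    This trace format is hypothetical; it only stands in for profile data.
    """
    followed = defaultdict(int)  # per load PC: executions followed by a store
    total = defaultdict(int)     # per load PC: all executions seen

    for i, (pc, op, addr) in enumerate(trace):
        if op != "load":
            continue
        line = addr // line_size
        total[pc] += 1
        # Look at the next `window` accesses for a store to the same line.
        for _, next_op, next_addr in trace[i + 1 : i + 1 + window]:
            if next_op == "store" and next_addr // line_size == line:
                followed[pc] += 1
                break

    return sorted(pc for pc in total if followed[pc] / total[pc] >= threshold)
```

For example, in a trace where the load at PC 0x10 is always followed by a store to the line it just read, while the loads at 0x30 and 0x40 are not, only 0x10 is reported. The threshold trades missed opportunities against requesting private copies of lines that are never written.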
Publications
"The Architecture of the HP Superdome Shared-memory Multiprocessor," Gary Gostin, Jean-Francois Collard, Kirby Collins, accepted to the International Conference on Supercomputing (ICS'05), Boston, Massachusetts · June 20-22, 2005. » Download .pdf file (838KB).
"System-wide Perfomance Monitors and their Application to the Optimization of Coherent Memory Accesses," Jean-Francois Collard, Norm Jouppi, Sami Yehia · 2005. » Download .pdf file (1.02MB).
"Load Squared: Adding Logic Close to Memory to Reduce the Latency of Indirect Loads with High Miss Ratios," Sami Yehia, Jean-Francois Collard, Olivier Temam · 2005. » Download .pdf file (102KB). |