[Gc] mark phase and prefetch

Boehm, Hans hans.boehm at hp.com
Sun Sep 13 11:11:36 PDT 2009

> From: Aliaksey Kandratsenka
> I was benchmarking this using GCBench.c (modified only for 
> larger heap usage and thus larger running time). On this 
> benchmark my code outperforms both stock prefetch strategy 
> and the code with removed prefetch. With numbers being:
> a) patch - 8.89s
> b) no-prefetch - 9.34s
> c) stock - 9.75s
> Note that 6.8 doesn't seem to have proper support for 
> AMD64, which I use, because the lion's share of time is spent in 
> mutex_trylock, which is called from GC_malloc. Without this, the 
> speedup should be larger.
Another issue here is that modern x86 processors seem to have fairly sophisticated hardware prefetchers.  For a very regular benchmark like GCBench, objects seem to be laid out regularly enough that these prefetchers actually work quite well.  That is probably less true of more realistic benchmarks, though I suspect it often remains somewhat true even for linked data structures.  You probably also need to take that into account when trying to understand the performance results.
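
One rough way to see how much the layout matters is to walk the same linked list twice, once linked in address order and once linked in shuffled order, and compare the times.  The sketch below is only an illustration and is not taken from the collector or from GCBench; the names node, build_list, walk and time_walk are made up for this example, and the shuffle uses a small xorshift generator purely to keep it self-contained.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical list node; "value" stands in for an object's payload. */
typedef struct node {
    struct node *next;
    long value;
} node;

/* Small xorshift generator so the shuffle does not depend on RAND_MAX. */
static unsigned long long next_rand(unsigned long long *s)
{
    *s ^= *s << 13;
    *s ^= *s >> 7;
    *s ^= *s << 17;
    return *s;
}

/* Link the n nodes of pool into a single list and return its head.
   With shuffle == 0 the list follows address order (regular layout);
   otherwise the link order is a Fisher-Yates permutation (irregular).
   Returns NULL if n == 0 or the temporary index array cannot be allocated. */
node *build_list(node *pool, size_t n, int shuffle, unsigned long long seed)
{
    size_t *order = (n > 0) ? malloc(n * sizeof *order) : NULL;
    if (order == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++)
        order[i] = i;

    if (shuffle) {
        unsigned long long s = seed ? seed : 1;  /* xorshift needs nonzero state */
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = (size_t)(next_rand(&s) % (i + 1));
            size_t t = order[i];
            order[i] = order[j];
            order[j] = t;
        }
    }

    for (size_t i = 0; i < n; i++) {
        pool[order[i]].value = (long)order[i];
        pool[order[i]].next = (i + 1 < n) ? &pool[order[i + 1]] : NULL;
    }

    node *head = &pool[order[0]];
    free(order);
    return head;
}

/* Pointer-chasing traversal, similar in spirit to following links while marking. */
long walk(const node *p)
{
    long sum = 0;
    for (; p != NULL; p = p->next)
        sum += p->value;
    return sum;
}

/* CPU time in seconds for one traversal; the sum is stored through sum_out
   so the compiler cannot discard the walk. */
double time_walk(const node *head, long *sum_out)
{
    clock_t t0 = clock();
    long s = walk(head);
    clock_t t1 = clock();
    *sum_out = s;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

To try it, allocate a pool noticeably larger than the last-level cache, call build_list once with shuffle set to 0 and once with it set to 1, and compare the two time_walk results.  I would expect the address-order walk to be the faster one on a machine with a working hardware prefetcher, but the size of the gap depends on the processor, and I have not measured it here.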

