[Gc] allocator and false sharing
alexander.herz at mytum.de
Mon Aug 20 04:34:15 PDT 2012
While benchmarking all kinds of parallel code, I came across another
serious scalability problem with the Boehm allocator.
Executing this in 2 parallel threads:
struct counter : virtual gc {
    ...
    inline void up() {
        i.fetch_and_add(1); // this boils down to a locked add
    }
};
int loop(long v, counter* c) {
    if (v <= 0)
        ...
    v = v - 1;
    ...
}
// the following thread body is executed 2 times (in parallel)
counter* c=new (GC_MALLOC(sizeof(counter))) counter();
Then I execute the same code sequentially and look at the speedup.
I get a speedup of 0.6 (the parallel version is slower than the sequential one).
Looking at some profiling data, I see a lot of false sharing (different
data in the same cache line accessed by different threads).
If I replace GC_MALLOC(sizeof(counter)) with GC_MALLOC(64) (the cache line
size of my CPU), then I get the expected speedup of almost 2.0.
So apparently the Boehm allocator does not take care to avoid false
sharing (unlike tbb::scalable_malloc).
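
To illustrate the workaround, here is a minimal standalone sketch of the padding idea. It is not the original benchmark: it uses std::atomic and std::thread instead of TBB and the collector, and the names kCacheLine, padded_size, padded_counter and run_two_threads are hypothetical. The 64-byte line size is the value reported above for one CPU and is an assumption elsewhere.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Assumed cache-line size (64 bytes, as reported in the post for one CPU).
constexpr std::size_t kCacheLine = 64;

// Round an allocation request up to a whole number of cache lines.
constexpr std::size_t padded_size(std::size_t n) {
    return (n + kCacheLine - 1) / kCacheLine * kCacheLine;
}

// Hypothetical stand-in for the post's counter. alignas makes each
// instance start on a cache-line boundary and pads sizeof to a full line,
// so two instances never share a line.
struct alignas(kCacheLine) padded_counter {
    std::atomic<long> i{0};
    void up() { i.fetch_add(1); }  // atomic read-modify-write increment
};

static_assert(sizeof(padded_counter) == kCacheLine,
              "each counter should occupy exactly one cache line");

// Two threads, each incrementing its own counter. Returns the sum of both
// counters so the caller can check that no increment was lost.
long run_two_threads(long iterations) {
    assert(iterations >= 0);
    padded_counter counters[2];
    auto work = [iterations](padded_counter* c) {
        for (long v = iterations; v > 0; --v) c->up();
    };
    std::thread a(work, &counters[0]);
    std::thread b(work, &counters[1]);
    a.join();
    b.join();
    return counters[0].i.load() + counters[1].i.load();
}
```

With the collector, the analogous change would be to request padded_size(sizeof(counter)) bytes instead of sizeof(counter), which for a small counter gives the GC_MALLOC(64) call above. Note that padding only the size does not by itself guarantee that the returned block starts on a cache-line boundary; that depends on the allocator.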
Again it would be nice to be able to substitute a different allocator.