[Gc] allocator and false sharing

Alexander Herz alexander.herz at mytum.de
Mon Aug 20 04:34:15 PDT 2012


Hi,

While benchmarking various kinds of parallel code, I came across another 
serious scalability problem with the Boehm allocator:

executing this in 2 parallel threads:

struct counter : virtual gc
{
     inline counter()
     {
         i=0;
     }

     tbb::atomic<int> i; //member was unnamed; the constructor and up() both use i

     inline void up()
     {
         i.fetch_and_add(1); //this boils down to a locked add
     }
};

int loop(long v, counter* c)
{
     LOOP_LABEL:
     if (v <= 0)
     {
         return 0;
     }
     else
     {
         c->up();
     }
     v=v - 1;
     goto LOOP_LABEL;
}

//the following thread is executed 2 times (in parallel)
void thread_body()
{
     counter* c=new (GC_MALLOC(sizeof(counter))) counter();
     loop(5000000,c);
}

Then I execute the same code sequentially and look at the speedup.

I get a speedup of 0.6 (parallel version is slower than sequential).
Looking at some profiling data, I get a lot of false sharing (different 
data in the same cache line accessed by
different threads).
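To make the term concrete, here is a minimal sketch of how two separate objects can land in the same cache line. It assumes a 64-byte line; same_cache_line, padded_int and the constant kLine are hypothetical names used only for illustration and are not part of the Boehm GC or TBB.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache line size in bytes (64 on the CPU discussed here).
constexpr std::size_t kLine = 64;

// Returns true if both addresses fall into the same kLine-sized block,
// i.e. writes to them would contend for one cache line.
inline bool same_cache_line(const void* a, const void* b)
{
    const std::uintptr_t la = reinterpret_cast<std::uintptr_t>(a) / kLine;
    const std::uintptr_t lb = reinterpret_cast<std::uintptr_t>(b) / kLine;
    return la == lb;
}

// A value aligned (and therefore padded) to a full cache line, so that
// neighbouring array elements never share a line.
struct alignas(64) padded_int
{
    int value;
};
```

Two small counters allocated back to back behave like the adjacent ints of a plain array: same_cache_line reports true for them, so every locked add in one thread invalidates the line the other thread is using.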

If I replace GC_MALLOC(sizeof(counter)) with GC_MALLOC(64) (the cache line 
size of my CPU), then I get the expected speedup of almost 2.0.
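The workaround can be generalized by rounding every request up to a whole number of cache lines. This is only a sketch: padded_size and kCacheLine are hypothetical names, and 64 bytes is assumed as the line size.

```cpp
#include <cstddef>

// Assumed cache line size in bytes; adjust for the target CPU.
constexpr std::size_t kCacheLine = 64;

// Rounds n up to the next multiple of kCacheLine.
// A request of 0 bytes stays 0; exact multiples are left unchanged.
constexpr std::size_t padded_size(std::size_t n)
{
    return (n + kCacheLine - 1) / kCacheLine * kCacheLine;
}
```

The allocation in the thread would then read new (GC_MALLOC(padded_size(sizeof(counter)))) counter(). Note that this only pads the size. Whether the returned block also starts on a cache line boundary depends on the allocator's internals, so this is not a guarantee, only what was sufficient in the test above.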

So apparently the Boehm allocator does not take care to avoid false 
sharing (unlike tbb::scalable_malloc).

Again it would be nice to be able to substitute a different allocator.

Regards,
Alex




