[Gc] allocator and false sharing
Alexander Herz
alexander.herz at mytum.de
Mon Aug 20 04:34:15 PDT 2012
Hi,
While benchmarking all kinds of parallel code I came across another
serious scalability problem with the boehm allocator.
Executing this in 2 parallel threads:
struct counter : virtual gc
{
    inline counter()
    {
        i=0;
    }
    tbb::atomic<int> i;
    inline void up()
    {
        i.fetch_and_add(1); //this boils down to a locked add
    }
};
int loop(long v, counter* c)
{
LOOP_LABEL:
    if (v <= 0)
    {
        return 0;
    }
    else
    {
        c->up();
    }
    v=v - 1;
    goto LOOP_LABEL;
}
//following thread is executed 2 times (in parallel)
thread
{
    counter* c=new (GC_MALLOC(sizeof(counter))) counter();
    loop(5000000,c);
}
Then I execute the same code sequentially and look at the speedup.
I get a speedup of 0.6 (the parallel version is slower than the sequential one).
Looking at some profiling data, I get a lot of false sharing (different
data in the same cache line accessed by different threads).
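The effect can be reproduced without the collector. The following is only a
minimal sketch, not the benchmark from this mail: it uses std::atomic and
std::thread instead of tbb and GC_MALLOC, and the names packed_counters,
padded_counters and time_two_threads are made up for illustration. The 64 byte
line size is an assumption that matches my cpu.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>

// Two counters that will normally end up in the same cache line.
struct packed_counters
{
    std::atomic<int> a{0};
    std::atomic<int> b{0};
};

// The same two counters, each forced onto its own 64 byte line
// (64 is an assumed cache line size).
struct padded_counters
{
    alignas(64) std::atomic<int> a{0};
    alignas(64) std::atomic<int> b{0};
};

// Runs two threads, each doing `iters` locked adds on its own counter,
// and returns the elapsed wall-clock time in milliseconds.
template <typename Counters>
long long time_two_threads(Counters& c, int iters)
{
    auto work = [iters](std::atomic<int>& x) {
        for (int n = 0; n < iters; ++n)
            x.fetch_add(1); // locked add, like counter::up()
    };
    auto start = std::chrono::steady_clock::now();
    std::thread t1(work, std::ref(c.a));
    std::thread t2(work, std::ref(c.b));
    t1.join();
    t2.join();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

Calling time_two_threads with 5000000 iterations on a packed_counters and then
on a padded_counters object should show the difference on hardware where the
two packed counters share a line. The actual numbers depend on the machine.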
If I replace GC_MALLOC(sizeof(counter)) with GC_MALLOC(64) (cache line
size of my cpu) then I get the expected speedup of almost 2.0.
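Instead of hard-coding 64 at every call site, the request size can be rounded
up to a multiple of the line size. This is a sketch only: kCacheLine and
round_up_to_cache_line are hypothetical names, and 64 is an assumption about
the target cpu.

```cpp
#include <cstddef>

// Assumed cache line size; 64 bytes matches the cpu used here,
// other hardware may differ.
constexpr std::size_t kCacheLine = 64;

// Rounds n up to the next multiple of the cache line size.
constexpr std::size_t round_up_to_cache_line(std::size_t n)
{
    return (n + kCacheLine - 1) / kCacheLine * kCacheLine;
}
```

The thread body would then allocate with
new (GC_MALLOC(round_up_to_cache_line(sizeof(counter)))) counter();
Note that this only pads the request size. That two such objects never share a
line is what the measurement above suggests, not something I know the
collector to guarantee.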
So apparently the boehm allocator does not take care to avoid false
sharing (unlike tbb::scalable_malloc).
Again it would be nice to be able to substitute a different allocator.
Regards,
Alex