[Gc] allocator and false sharing
Alexander Herz
alexander.herz at mytum.de
Mon Aug 20 08:23:24 PDT 2012
Ok, never mind.
The thread library I'm using does some lazy initialization which
apparently induced the effect.
If I force it to initialize before taking the measurements, everything is fine.
Sorry,
Alex
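
The fix described in the message above (forcing the thread library to initialize before measuring) might look roughly like the sketch below. The thread library in question is not named in this thread, so this uses std::thread as a stand-in, and it assumes that starting one throwaway thread is enough to trigger the one-time setup. The names warm_up_threads and timed_ms are hypothetical helpers, not part of any library.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: start and join one throwaway thread so that any
// one-time setup in the threading layer happens outside the timed region.
// Assumption: creating a thread is what triggers the lazy initialization.
inline void warm_up_threads() {
    std::thread t([] {});
    t.join();
    assert(!t.joinable());  // the warm-up thread has finished
}

// Runs fn() once after the warm-up step and returns the elapsed
// wall-clock time in milliseconds.
template <typename Fn>
double timed_ms(Fn fn) {
    warm_up_threads();
    auto start = std::chrono::steady_clock::now();
    fn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Timing both the sequential and the parallel variant through the same helper keeps the initialization cost out of both numbers, so the speedup compares only the work itself.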
On 20.08.2012 14:11, Alexander Herz wrote:
> I just verified:
>
> GC_malloc inside thread_local_alloc.c is used and the results are
> still as bad.
>
> Alex
>
> On 20.08.2012 13:43, Bruce Hoult wrote:
>> I'd hope and expect that using the thread-local free list facility
>> would prevent that.
>>
>> Did you build with -DTHREAD_LOCAL_ALLOC and follow the other
>> instructions at:
>>
>> http://www.hpl.hp.com/personal/Hans_Boehm/gc/scale.html
>>
>> On Mon, Aug 20, 2012 at 11:34 PM, Alexander Herz
>> <alexander.herz at mytum.de> wrote:
>>> Hi,
>>>
>>> While benchmarking all kinds of parallel code I came across another serious
>>> scalability problem with the Boehm allocator:
>>>
>>> executing this in 2 parallel threads:
>>>
>>> struct counter : virtual gc
>>> {
>>>     inline counter()
>>>     {
>>>         i=0;
>>>     }
>>>
>>>     tbb::atomic<int> i;
>>>
>>>     inline void up()
>>>     {
>>>         i.fetch_and_add(1); //this boils down to a locked add
>>>     }
>>> };
>>>
>>> int loop(long v, counter* c)
>>> {
>>> LOOP_LABEL:
>>>     if (v <= 0)
>>>     {
>>>         return 0;
>>>     }
>>>     else
>>>     {
>>>         c->up();
>>>     }
>>>     v=v - 1;
>>>     goto LOOP_LABEL;
>>> }
>>>
>>> //following thread is executed 2 times (in parallel)
>>> thread
>>> {
>>>     counter* c=new (GC_MALLOC(sizeof(counter))) counter();
>>>     loop(5000000,c);
>>> }
>>>
>>> then I execute the same code sequentially and look at the speedup.
>>>
>>> I get a speedup of 0.6 (the parallel version is slower than the sequential one).
>>> Looking at some profiling data, I get a lot of false sharing (different data
>>> in the same cache line accessed by different threads).
>>>
>>> If I replace GC_MALLOC(sizeof(counter)) by GC_MALLOC(64) (the cache line size
>>> of my CPU) then I get the expected speedup of almost 2.0.
>>>
>>> So apparently the Boehm allocator does not take care to avoid false sharing
>>> (unlike tbb::scalable_malloc).
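
The padding experiment described above (giving each counter a full cache line) can also be expressed in the type itself. This is a minimal sketch under stated assumptions: it uses std::atomic rather than tbb::atomic, it places the counters on the stack instead of allocating them through the collector, and it assumes a 64-byte cache line as on the CPU mentioned above. PaddedCounter and run_two_counters are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Assumed cache line size; 64 bytes matches the CPU described above,
// but other hardware may differ.
constexpr std::size_t kCacheLine = 64;

// alignas rounds sizeof(PaddedCounter) up to a full cache line, so two
// adjacent counters in an array never share a line.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<int> i{0};
    void up() { i.fetch_add(1); }  // atomic read-modify-write
};

static_assert(sizeof(PaddedCounter) == kCacheLine,
              "one counter per cache line");

// Hypothetical driver: two threads, each incrementing its own counter.
// Returns the sum of both counters after the threads have joined.
int run_two_counters(int iterations) {
    PaddedCounter counters[2];
    auto work = [iterations](PaddedCounter* c) {
        for (int n = 0; n < iterations; ++n) c->up();
    };
    std::thread a(work, &counters[0]);
    std::thread b(work, &counters[1]);
    a.join();
    b.join();
    assert(counters[0].i.load() == iterations);
    return counters[0].i.load() + counters[1].i.load();
}
```

Padding in the type costs memory for every instance, so it suits a handful of hot, per-thread objects better than small objects allocated in bulk.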
>>>
>>> Again it would be nice to be able to substitute a different allocator.
>>>
>>> Regards,
>>> Alex
>>>
>>>
>>>
>>> _______________________________________________
>>> Gc mailing list
>>> Gc at linux.hpl.hp.com
>>> http://www.hpl.hp.com/hosted/linux/mail-archives/gc/