[Gc] GC_get_bytes_since_gc locks

Boehm, Hans hans.boehm at hp.com
Tue Aug 23 18:03:28 PDT 2011

Indeed, we could implement this with libatomic_ops now, and atomic_long later.  And I could perhaps be talked into that.  The problem is more code in the GC would have to change and, since updates to the counter here are time critical, there is some chance of slowing down the main part of the collector for a really niche feature.  Since the updates are all done with the lock held, they can be done with AO_load + AO_store, which probably doesn't really cost anything (except some ugliness in converting back and forth).  The C1x atomic_long does provide some slightly stronger guarantees, even with the weakest memory ordering guarantees, which may cost a few cycles on a few machines.

I really have two philosophical issues with ignoring the data races.  (The GC still has a few anyway, but I really don't want to add new ones.)

1) The argument made in the paper: It may eventually break with a new compiler, and you'll never figure out what happened.  The failures are admittedly unlikely with current compilers on platforms with suitable alignment, but we're working on fixing that :-)  Deadlocks at least exhibit nice fail-stop behavior.

2) In an ideal world, I think we should be relying heavily on data race detectors.  I think C++11 and C1x get us closer to making that practical, by allowing atomics to be used instead of unannotated racing accesses.  (The collector has one or two data races that are hard to recast properly as atomic operations.  Linux seqlocks have a similar issue, as does your code below.  We're thinking about those.)  Putting intentional races into your code makes it much harder to find the ones that are already breaking your code now.

The code below won't work reliably as C code (the compiler may recognize the two loads of *hiPtr as a common subexpression, or the hardware on e.g. ARM or PowerPC can reorder the loads).  The assembly version requires fences on some architectures.  But we could get around that here by using libatomic_ops.


> -----Original Message-----
> From: bruce.hoult at gmail.com [mailto:bruce.hoult at gmail.com] On Behalf Of
> Bruce Hoult
> Sent: Tuesday, August 23, 2011 5:37 PM
> To: Boehm, Hans
> Cc: Juan Jose Garcia-Ripoll; gc at linux.hpl.hp.com
> Subject: Re: [Gc] GC_get_bytes_since_gc locks
> On Wed, Aug 24, 2011 at 6:29 AM, Boehm, Hans <hans.boehm at hp.com> wrote:
> > In general, yes, it is necessary.  See for example my HotPar 2011
> paper:
> >  https://www.usenix.org/events/hotpar11/tech/final_files/Boehm.pdf
> While there can be data races if the counter requires multiple memory
> accesses to read, a lock is a very heavy-weight way to avoid them!
> This problem is common when reading things such a real time clock
> registers and is usually solved with a construct such as:
> // depends on big/little endian
> #define HIOFF 1
> #define LOOFF 0
> WORD *loPtr = (&counter)+LOOFF;
> WORD *hiPtr = (&counter)+HIOFF;
> WORD hi1, hi2, lo;
> do {
>   hi1 = *hiPtr;
>   lo = *loPtr;
>   hi2 = *hiPtr;
> } while (hi1 != hi2);
> return (hi1<<WORDSIZE) | lo;
> If the variable is in fact a counter then the loop is very unlikely to
> run more than once.
> This of course can be ruined by the kind of compiler optimizations
> described in your paper, in which case it can be implemented in
> assembler and put in the atomic ops library.
> Those super-paranoid about forward progress can switch to a locked
> version if it loops too many times.
> This won't deal with the kind of problem you describe where the
> variable is not aligned to at least a WORD boundary and the CPU
> silently does multiple reads for a WORD. I don't believe this is a
> practical problem on any reasonable compiler if the declaration of the
> counter variable is under your control and you refrain from bitfield
> notation or other packing pragmas.  In any case it can be easily
> detected and a compile or runtime error issued to prevent incorrect
> results.

More information about the Gc mailing list