Re[4]: [Gc] [libatomic_ops] bug with gcc/x86_64/CAS

Ivan Maidanski ivmai at mail.ru
Thu Feb 18 02:31:00 PST 2010


Hi!
"Boehm, Hans" <hans.boehm at hp.com> wrote:
> > From: gc-bounces at napali.hpl.hp.com 
> > [mailto:gc-bounces at napali.hpl.hp.com] On Behalf Of Ivan Maidanski
> > Sent: Wednesday, February 17, 2010 8:45 AM
> > To: gc at napali.hpl.hp.com
> > Cc: Andrew Haley
> > Subject: Re[2]: [Gc] [libatomic_ops] bug with gcc/x86_64/CAS
> > 
> > Hi!
> > Andrew Haley <aph at redhat.com> wrote:
> > > On 02/17/2010 12:17 PM, Patrick MARLIER wrote:
> > > 
> > > > I think I found a bug into libatomic_ops into 
> > > > AO_compare_and_swap_full function for gcc and x86_64 cpu.
> > > > 
> > > > **** Possible FIX 2: set RAX as earlyclobbered output 
> > **** AO_INLINE 
> > > > int AO_compare_and_swap_full(volatile AO_t *addr, AO_t old, AO_t 
> > > > new_val) { char result; __asm__ __volatile__("lock; 
> > cmpxchgq %4, %0; 
> > > > setz %1"
> > > > : "=m"(*addr), "=q"(result) , "=&a" (old)
> > > > : "m"(*addr), "r" (new_val), "0"(old) : "memory"); return (int) 
> > > > result; }
> > > 
> > > I think this asm is best, but it's pretty questionable to 
> > use an asm 
> > > at all, given that this is a built-in gcc function.
> > 
> > Andrew -
> > 1. could you explain why fix 1 is not so good as fix 2;
> If I understand correctly, fix 1 overconstrains the location of result, which might generate worse code.  It also seems to be a less accurate description of what's going on.  I would also go with fix 2.

Unfortunately, fix 2 issues a warning: "matching constraint does not allow a register".

So, I use the built-in function, but:
- only for GCC v4.2+;
- only for amd64 (because for x86 it complicates the compilation (requires -march on MinGW) or linkage (requires additional lib containing the function if -march=i486 isn't specified).

> > 2. thanks for noting that __sync_... built-in funcs exist in GCC but:
> > - from which GCC version they are supported?;
> > - they are supported for all targets (where applicable) or 
> > for Intel only?;
> > - it would be good if someone send me a draft (at least) code 
> > using them;
> > - as only bug fixes are accepted for gc72, the changes would 
> > go to the next major release (unless Hans says the opposite).
> I think it's fine to use __sync for certain platforms on which they are known to have worked correctly for a while.  Unfortunately, I don't think they are currently a very good generic solution to the problem.  Most importantly for this discussion, they don't always do what they claim.  For example, last I checked __sync_synchronize generated a no-op on X86.  It's unclear to me whether this is fixable, since a fix may significantly slow down a lot of working code.  Some of the Itanium primitives didn't have the right ordering constraints, which is probably fixable.
> 
> I think the right long term fix here is to go to the C++0x atomics.  At that point we can hopefully retire libatomic_ops altogether.
> 
> But if a particular primitive is known to do the right thing on a platform, as in this case, I don't see a good reason not to use it.  I think that's true for __sync_bool_compare_and_swap on X86.  (Unlike some other architectures, it's hard to get that one wrong on X86, since the machine instruction includes a full fence, which behaves as expected.)
> 
> In this case I'm fine with either fix 2 or __sync_bool_compare_and_swap for 7.2, but only in the X86-specific code.
> 
> Hans

I've also removed a check for AO_USE_PENTIUM4_INSTRS for gcc/msvc amd64 since AFAIK all x86_64/amd64 chips support SSE2.

Bye.


More information about the Gc mailing list