[Gc] [libatomic_ops] bug with gcc/x86_64/CAS

Patrick MARLIER patrick.marlier at unine.ch
Thu Feb 18 04:48:30 PST 2010

I think FIX 1 is better because it uses one register fewer.
The cmpxchg instruction may clobber the rAX register, so the compiler
needs to reload rAX afterwards.
With FIX 1, rAX itself holds "result", so no extra register is needed
for it.
On the other hand, __sync_bool_compare_and_swap is slightly more
efficient when the result feeds a branch: the compiler can jump on the
flags directly (jnz/jz) without the setz instruction. (The __sync_
built-ins were introduced in GCC 4.1, if I trust the online
documentation.)

Patrick Marlier.

PS: Note that in FIX 2, the matching constraint should read "2" (not 
"0"). (I think my previous message was not sent.) With "0", the input 
is tied to operand 0, which is the "=m" memory output -- I guess this 
is why you get the warning about a bad memory constraint.

**** Possible FIX 2: set RAX as earlyclobbered output ****
AO_INLINE int
AO_compare_and_swap_full(volatile AO_t *addr, AO_t old, AO_t new_val)
{
  char result;
  __asm__ __volatile__("lock; cmpxchgq %4, %0; setz %1"
                       : "=m"(*addr), "=q"(result), "=&a"(old)
                       : "m"(*addr), "r"(new_val), "2"(old) : "memory");
  return (int) result;
}

Ivan Maidanski wrote:
> Hi!
> "Boehm, Hans" <hans.boehm at hp.com> wrote:
>>> From: gc-bounces at napali.hpl.hp.com 
>>> [mailto:gc-bounces at napali.hpl.hp.com] On Behalf Of Ivan Maidanski
>>> Sent: Wednesday, February 17, 2010 8:45 AM
>>> To: gc at napali.hpl.hp.com
>>> Cc: Andrew Haley
>>> Subject: Re[2]: [Gc] [libatomic_ops] bug with gcc/x86_64/CAS
>>> Hi!
>>> Andrew Haley <aph at redhat.com> wrote:
>>>> On 02/17/2010 12:17 PM, Patrick MARLIER wrote:
>>>>> I think I found a bug into libatomic_ops into 
>>>>> AO_compare_and_swap_full function for gcc and x86_64 cpu.
>>>>> **** Possible FIX 2: set RAX as earlyclobbered output ****
>>>>> AO_INLINE int
>>>>> AO_compare_and_swap_full(volatile AO_t *addr, AO_t old, AO_t new_val)
>>>>> {
>>>>>   char result;
>>>>>   __asm__ __volatile__("lock; cmpxchgq %4, %0; setz %1"
>>>>>     : "=m"(*addr), "=q"(result), "=&a"(old)
>>>>>     : "m"(*addr), "r"(new_val), "0"(old) : "memory");
>>>>>   return (int) result;
>>>>> }
>>>> I think this asm is best, but it's pretty questionable to use an asm
>>>> at all, given that this is a built-in gcc function.
>>> Andrew -
>>> 1. could you explain why fix 1 is not so good as fix 2;
>> If I understand correctly, fix 1 overconstrains the location of result, which might generate worse code.  It also seems to be a less accurate description of what's going on.  I would also go with fix 2.
> Unfortunately, fix 2 issues a warning: "matching constraint does not allow a register".
> So, I use the built-in function, but:
> - only for GCC v4.2+;
> - only for amd64 (because for x86 it complicates the compilation (requires -march on MinGW) or linkage (requires additional lib containing the function if -march=i486 isn't specified).
>>> 2. thanks for noting that __sync_... built-in funcs exist in GCC but:
>>> - from which GCC version they are supported?;
>>> - they are supported for all targets (where applicable) or 
>>> for Intel only?;
>>> - it would be good if someone send me a draft (at least) code 
>>> using them;
>>> - as only bug fixes are accepted for gc72, the changes would 
>>> go to the next major release (unless Hans says the opposite).
>> I think it's fine to use __sync for certain platforms on which they are known to have worked correctly for a while.  Unfortunately, I don't think they are currently a very good generic solution to the problem.  Most importantly for this discussion, they don't always do what they claim.  For example, last I checked __sync_synchronize generated a no-op on X86.  It's unclear to me whether this is fixable, since a fix may significantly slow down a lot of working code.  Some of the Itanium primitives didn't have the right ordering constraints, which is probably fixable.
>> I think the right long term fix here is to go to the C++0x atomics.  At that point we can hopefully retire libatomic_ops altogether.
>> But if a particular primitive is known to do the right thing on a platform, as in this case, I don't see a good reason not to use it.  I think that's true for __sync_bool_compare_and_swap on X86.  (Unlike some other architectures, it's hard to get that one wrong on X86, since the machine instruction includes a full fence, which behaves as expected.)
>> In this case I'm fine with either fix 2 or __sync_bool_compare_and_swap for 7.2, but only in the X86-specific code.
>> Hans
> I've also removed a check for AO_USE_PENTIUM4_INSTRS for gcc/msvc amd64 since AFAIK all x86_64/amd64 chips support SSE2.
> Bye.
