[Gc] Re: test_stack on powerpc (power7)

Will Schmidt will_schmidt at vnet.ibm.com
Tue Jan 28 14:50:04 PST 2014


On Tue, 2014-01-28 at 16:43 -0500, Lennart Sorensen wrote:
> On Tue, Jan 28, 2014 at 03:22:35PM -0600, Will Schmidt wrote:
> > Hi All, 
> >   I've been looking at the test_stack test case failure as seen on
> > ppc64 / power7 based systems.    I don't have a fix, but believe I
> > understand where the problem is occurring.  
> > 
> > The simplest case I've been able to reproduce is with three threads.
> > As I've added debug output to the code, the problem has become harder
> > to nail down precisely, but this is what seems to be happening.
> > 
> > In the failure scenario:
> >   The list appears OK during run_one_test() before and after
> > AO_stack_pop() is called.  The thread is holding two entries in the t[i]
> > array, and the list still looks OK. The list is damaged after the
> > AO_stack_push() call is made.
> > 
> > Within AO_stack_push(),
> > [src/atomic_ops_stack.c:AO_stack_pop_explicit_aux_acquire()]
> > The malfunction seems to be triggered while one of the threads is
> > between the "first=AO_load(list);" and the
> > "AO_compare_and_swap_release(list,first,next);".  Either one or both of
> > the other threads will have removed and replaced multiple elements, such
> > that the compare and swap of list,first,next passes its check even though
> > the list entries, particularly the next pointer of first, have changed.
> > 
> > This is referenced in the comment at that location:
> >   /* Thus its next link cannot have changed out from under us, and we   */
> >   /* removed exactly one entry and preserved the rest of the list.      */
> >   /* Note that it is quite possible that an additional entry was        */
> >   /* inserted and removed while we were running; this is OK since the   */
> >   /* part of the list following first must have remained unchanged, and */
> >   /* first must again have been at the head of the list when the        */
> >   /* compare_and_swap succeeded.                                        */
> > 
> > which seems to be untrue in this case.
> > 
> > 
> > The powerpc AO_* functions seem to be OK.  We'd prefer the gcc atomic
> > builtins be used (http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html),
> > (that's what they are there for), but I don't think that change
> > would help in this case.
> > 
> > My recommendation is that the test be rewritten to handle the case where
> > first->next has changed underneath the current thread.
> > A shorter-term fix would probably be to disable the test_stack test for
> > power7 and newer processors until it can be fixed.
> 
> So does this mean the stack test is wrong?  I was worried that the actual
> AO_ functions were wrong on powerpc.

Yes, that's my feeling right now: the test is wrong.  To clarify,
it's obviously not wrong everywhere, but there is definitely a corner
case that we're able to hit on power7 and newer that we don't
(yet?) see on other platforms.
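
For illustration, the interleaving described in the quoted analysis (one
thread stalled between its load of the head and its compare and swap, while
another thread removes and re-inserts entries) can be replayed
deterministically in a single thread.  The following is only a minimal
sketch under stated assumptions: it uses C11 atomics and a hypothetical
node type with hypothetical push(), pop() and aba_demo() helpers, not the
libatomic_ops code or its AO_stack_aux mechanism, and it does not show that
this is exactly what happens on power7.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node type for illustration; not the libatomic_ops layout. */
struct node { struct node *next; int id; };

static _Atomic(struct node *) head;

/* Naive lock-free push: link the new node in front of the current head. */
static void push(struct node *n) {
    struct node *old = atomic_load(&head);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak(&head, &old, n));
}

/* Naive lock-free pop: no protection against reuse of the head node. */
static struct node *pop(void) {
    struct node *first = atomic_load(&head);
    while (first != NULL &&
           !atomic_compare_exchange_weak(&head, &first, first->next))
        ;  /* on failure, first has been reloaded; retry */
    return first;
}

/* Replays the interleaving step by step.  Returns 1 if the stale        */
/* compare and swap succeeded and left a privately held node on the list. */
static int aba_demo(void) {
    static struct node a = {NULL, 1}, b = {NULL, 2}, c = {NULL, 3};

    atomic_store(&head, NULL);
    push(&c); push(&b); push(&a);              /* list: a -> b -> c */

    /* "Thread 1" begins a pop: it loads first and first->next, then stalls. */
    struct node *first = atomic_load(&head);   /* a */
    struct node *next = first->next;           /* b */

    /* "Thread 2" runs in the meantime. */
    struct node *x = pop();                    /* removes a */
    struct node *y = pop();                    /* removes b, keeps it */
    push(x);                                   /* list: a -> c */
    assert(x == first && y == next);

    /* "Thread 1" resumes.  The head is a again, so the check passes,  */
    /* even though a's next pointer changed from b to c in the interim. */
    struct node *expected = first;
    int swapped = atomic_compare_exchange_strong(&head, &expected, next);

    /* The head is now b, which "thread 2" still believes it owns. */
    return swapped && atomic_load(&head) == y;
}
```

Calling aba_demo() from a small main() and checking that it returns 1
confirms that the compare and swap passes although first->next is no longer
what the stalled thread read, which is the situation the quoted source
comment assumes cannot occur.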



