[Gc] Re: test_stack on powerpc (power7)
will_schmidt at vnet.ibm.com
Tue Jan 28 14:50:04 PST 2014
On Tue, 2014-01-28 at 16:43 -0500, Lennart Sorensen wrote:
> On Tue, Jan 28, 2014 at 03:22:35PM -0600, Will Schmidt wrote:
> > Hi All,
> > I've been looking at the test_stack test case failure as seen on
> > ppc64 / power7 based systems. I don't have a fix, but believe I
> > understand where the problem is occurring.
> > The simplest case I've been able to duplicate is with three threads.
> > As I've added debug to the code, the problem gets harder to nail down
> > precisely, but this is what seems to be happening.
> > In the failure scenario:
> > The list appears OK during run_one_test() before and after
> > AO_stack_pop() is called. The thread is holding two entries in the t[i]
> > array, and the list still looks OK. The list is damaged after the
> > AO_stack_push() call is made.
> > Within AO_stack_push(),
> > [src/atomic_ops_stack.c:AO_stack_pop_explicit_aux_require()]
> > The malfunction seems to be triggered while one of the threads is
> > between the "first=AO_load(list);" and the
> > "AO_compare_and_swap_release(list,first,next);". Either one or both of
> > the other threads will have removed and replaced multiple elements, such
> > that the compare and swap of list,first,next will pass the check, but
> > the list entries, particularly the next pointer at first, has changed.
> > This is referenced in the comment at that location:
> > /* Thus its next link cannot have changed out from under us, and we */
> > /* removed exactly one entry and preserved the rest of the list. */
> > /* Note that it is quite possible that an additional entry was */
> > /* inserted and removed while we were running; this is OK since the */
> > /* part of the list following first must have remained unchanged, and */
> > /* first must again have been at the head of the list when the */
> > /* compare_and_swap succeeded. */
> > which seem to be untrue in this case.
> > The powerpc AO_* functions seem to be OK. We'd prefer the gcc atomic
> > builtins be used (http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html),
> > (thats what they are there for), but I don't think that change
> > would help in this case.
> > My recommendation is that the test be rewritten to handle the case where
> > first->next has changed underneath the current thread.
> > A shorter term fix would probably be to disable the test_stack test for
> > power7 and newer processors, until it can be fixed.
> So does this mean the stack test is wrong? I was worried that the actual
> AO_ functions were wrong on powerpc.
Yes, Thats my feeling right now, that the test is wrong. To clarify,
it's obviously not all over wrong, but there is definitely a corner
condition that we're able to hit on power7 and newer that we don't
(yet?) see on other platforms.
More information about the Gc