[Gc] [win32] Isn't this a race condition?

Dave Korn dave.korn.cygwin at gmail.com
Wed Apr 10 17:58:56 PDT 2013

    Hi list,

  In the (admittedly somewhat old) version of boehm-gc in the GCC source tree,
I see the following failure from thread_leak_test about 50% of the time (over
200 test runs):

> $ ./.libs/thread_leak_test.exe
> SuspendThread failed
> Aborted (core dumped)

  It's coming from GC_suspend in win32_threads.c; the relevant snippets are
shown below:

/* Suspend the given thread, if it's still active.      */
STATIC void GC_suspend(GC_thread t)
    [ ... snip ... ]
    DWORD exitCode;
    [ ... snip ... ]
    if (GetExitCodeThread(t -> handle, &exitCode) &&
        exitCode != STILL_ACTIVE) {
#     ifdef GC_PTHREADS
        t -> stack_base = 0; /* prevent stack from being pushed */
#     else
        /* this breaks pthread_join on Cygwin, which is guaranteed to  */
        /* only see user pthreads                                      */
#     endif
    [ ... snip ... ]
# else /* !MSWINCE */
    if (SuspendThread(t -> handle) == (DWORD)-1)
      ABORT("SuspendThread failed");
# endif /* !MSWINCE */
  t -> suspended = (unsigned char)TRUE;
    [ ... snip ... ]

  Judging by reading the code, it seemed likely to me that there was a race
condition where a thread could still be active at the time GetExitCodeThread
was called, but then have exited by the time SuspendThread was called,
resulting in that call failing.  I modified that clause to read:

	if (SuspendThread(thread_table[i].handle) == (DWORD)-1) {
	  if (GetExitCodeThread(thread_table[i].handle, &exitCode) &&
	      exitCode != STILL_ACTIVE) {
	    /* thread exited between the earlier check and SuspendThread */
	    ABORT("Race condition");
	  }
	  ABORT("SuspendThread failed");
	}

i.e., it now checks again whether the thread has exited when SuspendThread
fails.  With that change I occasionally see the "Race condition" message,
although not often (about 5% of tests); most often it still just shows the
"SuspendThread failed" message.  I suspect (but cannot prove) that during the
thread shutdown code in the Windows kernel, there is a stage at which the
thread stops being suspendable before the final exit code is stored.
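
  To make the three observed outcomes explicit, here is a minimal sketch of
the decision the modified clause makes.  It is not code from boehm-gc: the
names classify_suspend, suspend_outcome, outcome_name and the SKETCH_*
constants are hypothetical, and the results of SuspendThread and
GetExitCodeThread are passed in as plain values so the logic builds without
<windows.h>.  The last case only encodes the unproven guess above, namely that
a thread may refuse suspension while its exit code still reads STILL_ACTIVE.

```c
/* Stand-ins for Win32 values so the sketch builds on any platform.       */
/* STILL_ACTIVE is 259 in the Windows SDK; SuspendThread returns          */
/* (DWORD)-1 on failure.                                                   */
#define SKETCH_STILL_ACTIVE    259UL
#define SKETCH_SUSPEND_FAILED  0xFFFFFFFFUL

typedef enum {
    SUSPEND_OK,                  /* SuspendThread succeeded                */
    SUSPEND_LOST_RACE,           /* failed, and the thread has exited      */
    SUSPEND_FAILED_STILL_ACTIVE  /* failed, exit code not (yet) final      */
} suspend_outcome;

/* suspend_result: value returned by SuspendThread (hypothetical input).   */
/* exit_code_ok:   nonzero if GetExitCodeThread itself succeeded.          */
/* exit_code:      exit code it reported, meaningful only if exit_code_ok. */
static suspend_outcome classify_suspend(unsigned long suspend_result,
                                        int exit_code_ok,
                                        unsigned long exit_code)
{
    if (suspend_result != SKETCH_SUSPEND_FAILED)
        return SUSPEND_OK;
    if (exit_code_ok && exit_code != SKETCH_STILL_ACTIVE)
        return SUSPEND_LOST_RACE;
    return SUSPEND_FAILED_STILL_ACTIVE;
}

/* Maps each outcome to the message seen in the test runs. */
static const char *outcome_name(suspend_outcome o)
{
    switch (o) {
    case SUSPEND_OK:        return "suspended";
    case SUSPEND_LOST_RACE: return "Race condition";
    default:                return "SuspendThread failed";
    }
}
```

  In these terms, roughly 5% of runs land in SUSPEND_LOST_RACE, and the
remaining failures land in SUSPEND_FAILED_STILL_ACTIVE, which is the case the
original exit-code check cannot detect ahead of time.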

  In the HEAD version of bdwgc, analogous code still exists in
win32_threads.c/GC_suspend(), but I don't see it fail (in 200 tests).

  Does anyone know why it's no longer a problem?  I note that an awful lot has
changed in the rest of the test infrastructure; perhaps there's some more
locking somewhere (parallel marking?) that prevents the problem from arising.

