[Gc] [PATCH] Race condition when restarting threads

Ben Maurer bmaurer at ximian.com
Sun Jul 3 08:29:19 PDT 2005


In a Mono bug report, we noticed a very rare race in the GC when
restarting the world. GC_restart_handler states:

    /* Let the GC_suspend_handler() know that we got a SIG_THR_RESTART. */
    /* The lookup here is safe, since I'm doing this on behalf  */
    /* of a thread which holds the allocation lock in order	*/
    /* to stop the world.  Thus concurrent modification of the	*/
    /* data structure is impossible.				*/

However, this comment is not always true. When starting the world, the
thread that does the restarting does *not* wait for all threads to get
past the point where they need the structures used by the lookup before
it releases the GC_lock.

So the sequence of events looked something like:

      * T1, restarting the world, sends T2 the restart signal
      * T1 releases the GC_lock
      * T3 is a newborn thread and adds itself to the table
      * T2 gets the signal and sees a corrupt table because T3 is
        concurrently modifying it.
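
The sequence above can be modeled with plain pthreads. What follows is
only a sketch of one possible way to close the window (the restarting
thread waits for an acknowledgement from every restarted thread before
it drops the lock); it is not necessarily what the attached patch does.
Every identifier in it (model_lock, restart_sem, ack_sem, table_size,
run_restart_with_ack) is a hypothetical stand-in, not a libgc name, and
semaphores stand in for the real signal delivery.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>

#define N_WORKERS 4

/* All names here are hypothetical stand-ins, not libgc identifiers. */
static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER; /* models GC_lock */
static sem_t restart_sem[N_WORKERS]; /* models delivery of the restart signal */
static sem_t ack_sem;                /* assumed extra step: restart acks */
static int table_size;               /* models the thread table */
static int seen_size[N_WORKERS];     /* what each restarted thread observed */

/* Abort on any nonzero return code from a setup call. */
static void must(int rc) { if (rc != 0) abort(); }

/* sem_wait() may fail with EINTR; retry until it is not interrupted. */
static void wait_sem(sem_t *s) {
    while (sem_wait(s) == -1 && errno == EINTR) { }
}

/* T2: a suspended thread that does a table lookup when restarted. */
static void *worker(void *arg) {
    int id = *(int *)arg;
    wait_sem(&restart_sem[id]);   /* "suspended" until T1 restarts us */
    seen_size[id] = table_size;   /* the lookup, done without the lock */
    must(sem_post(&ack_sem));     /* tell T1 the lookup is finished */
    return NULL;
}

/* T3: a newborn thread that adds itself to the table under the lock. */
static void *newborn(void *arg) {
    (void)arg;
    must(pthread_mutex_lock(&model_lock));
    table_size++;
    must(pthread_mutex_unlock(&model_lock));
    return NULL;
}

/* T1: returns the number of workers that saw a modified table. */
int run_restart_with_ack(void) {
    pthread_t workers[N_WORKERS], t3;
    int ids[N_WORKERS], i, bad = 0;

    table_size = N_WORKERS;
    must(sem_init(&ack_sem, 0, 0));
    for (i = 0; i < N_WORKERS; i++) must(sem_init(&restart_sem[i], 0, 0));

    must(pthread_mutex_lock(&model_lock));      /* world is stopped */
    for (i = 0; i < N_WORKERS; i++) {
        ids[i] = i;
        seen_size[i] = -1;
        must(pthread_create(&workers[i], NULL, worker, &ids[i]));
    }
    must(pthread_create(&t3, NULL, newborn, NULL)); /* blocks on the lock */

    for (i = 0; i < N_WORKERS; i++) must(sem_post(&restart_sem[i]));
    for (i = 0; i < N_WORKERS; i++) wait_sem(&ack_sem); /* wait for lookups */
    must(pthread_mutex_unlock(&model_lock));    /* only now can T3 proceed */

    for (i = 0; i < N_WORKERS; i++) must(pthread_join(workers[i], NULL));
    must(pthread_join(t3, NULL));
    for (i = 0; i < N_WORKERS; i++)
        if (seen_size[i] != N_WORKERS) bad++;

    for (i = 0; i < N_WORKERS; i++) must(sem_destroy(&restart_sem[i]));
    must(sem_destroy(&ack_sem));
    return bad;
}
```

Calling run_restart_with_ack() from a main() and checking for a zero
return exercises the model. If the loop that waits on ack_sem is
removed, T3 can take the lock while a worker is still reading
table_size, which is the same window as in the sequence above (and, in
C terms, a data race).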

When we hit the race, the result was either a deadlock or a SIGSEGV.

The race was extremely rare. It took 1-2 hours to reproduce on an SMP
machine. With the attached patch, it has not segfaulted or hung for 21

-- Ben
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gc.patch
Type: text/x-patch
Size: 1309 bytes
Desc: not available
Url : https://napali.hpl.hp.com/pipermail/gc/attachments/20050703/e09d1119/gc.bin
