[Gc] [PATCH] Race condition when restarting threads
bmaurer at ximian.com
Sun Jul 3 08:29:19 PDT 2005
While investigating a Mono bug report, we noticed a very rare race in
the GC when restarting the world. The comment in GC_restart_handler states:
/* Let the GC_suspend_handler() know that we got a SIG_THR_RESTART. */
/* The lookup here is safe, since I'm doing this on behalf */
/* of a thread which holds the allocation lock in order */
/* to stop the world. Thus concurrent modification of the */
/* data structure is impossible. */
However, this comment is not always true. When starting the world, the
thread that does the restarting does *not* wait for all threads to get
past the point where they need the structures used by the lookup before
it releases the GC_lock.
So the sequence of events looked something like:
* T1 signals T2 to restart the world
* T1 releases the GC_lock
* T3 is a newborn thread and adds itself to the table
* T2 gets the signal and sees a corrupt table because T3 is
  concurrently modifying it.
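
The sequence above suggests one way to close the window: T1 keeps the
lock until every restarted thread has acknowledged that its lookup is
finished. The following is only a toy sketch of that handshake, not the
attached patch and not the collector's code. Every name in it
(alloc_lock, table, run_demo, and so on) is hypothetical, and a condition
variable stands in for the restart signal. A real signal handler cannot
take a pthread mutex, so real code would need an async-signal-safe
acknowledgement instead.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 8

/* Toy stand-in for the GC thread table. All names are hypothetical. */
static int table[MAX_THREADS];
static int table_len = 0;
/* Plays the role of GC_lock in this sketch. */
static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Restart handshake state, protected by hs_lock. */
static pthread_mutex_t hs_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t hs_cond = PTHREAD_COND_INITIALIZER;
static int restart_requested = 0;
static int lookups_done = 0;
static int lookup_result = -1;

/* T2: a stopped thread. Waits for the restart, looks itself up, acks. */
static void *stopped_thread(void *arg) {
    int id = *(int *)arg;
    int found = -1;

    pthread_mutex_lock(&hs_lock);
    while (!restart_requested)
        pthread_cond_wait(&hs_cond, &hs_lock);
    pthread_mutex_unlock(&hs_lock);

    /* Unlocked lookup: safe only while T1 still holds alloc_lock. */
    for (int i = 0; i < table_len; i++)
        if (table[i] == id)
            found = i;

    /* Acknowledge that the lookup is finished. */
    pthread_mutex_lock(&hs_lock);
    lookup_result = found;
    lookups_done++;
    pthread_cond_broadcast(&hs_cond);
    pthread_mutex_unlock(&hs_lock);
    return NULL;
}

/* T3: a newborn thread that adds itself to the table under alloc_lock. */
static void *newborn_thread(void *arg) {
    int id = *(int *)arg;
    pthread_mutex_lock(&alloc_lock);
    assert(table_len < MAX_THREADS);
    table[table_len] = id;
    table_len++;
    pthread_mutex_unlock(&alloc_lock);
    return NULL;
}

/* T1: restarts the world and waits for the ack before unlocking. */
int run_demo(void) {
    pthread_t t2, t3;
    int id2 = 2, id3 = 3;

    restart_requested = 0;
    lookups_done = 0;
    lookup_result = -1;
    table[0] = id2;              /* T2 is already registered */
    table_len = 1;

    pthread_mutex_lock(&alloc_lock);   /* world stopped, T1 holds the lock */
    pthread_create(&t2, NULL, stopped_thread, &id2);
    pthread_create(&t3, NULL, newborn_thread, &id3); /* blocks on alloc_lock */

    pthread_mutex_lock(&hs_lock);
    restart_requested = 1;             /* "signal" T2 to restart */
    pthread_cond_broadcast(&hs_cond);
    while (lookups_done < 1)           /* wait for T2's acknowledgement */
        pthread_cond_wait(&hs_cond, &hs_lock);
    pthread_mutex_unlock(&hs_lock);

    pthread_mutex_unlock(&alloc_lock); /* only now may T3 modify the table */
    pthread_join(t2, NULL);
    pthread_join(t3, NULL);
    return lookup_result;              /* index at which T2 found itself */
}
```

Calling run_demo() from a main() and linking with the pthread library
runs one round of the handshake. Moving the alloc_lock unlock above the
wait loop reintroduces the ordering described in the list above.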
When we hit the race, the result was either a deadlock or a SIGSEGV.
The race was extremely rare. It took 1-2 hours to reproduce on an SMP
machine. With the attached patch, it has not segfaulted or hung for 21
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 1309 bytes
Desc: not available
Url : http://napali.hpl.hp.com/pipermail/gc/attachments/20050703/e09d1119/gc.bin