[Gc] [PATCH] Race condition when restarting threads

Boehm, Hans hans.boehm at hp.com
Tue Jul 12 11:13:44 PDT 2005


I think I generally also prefer your solution.

But it seems to me that it has a very low probability memory ordering
issue.

If I see GC_world_is_stopped set because I'm trying to stop the
world for the next GC, but I still see the old value of GC_stop_count
(due to compiler or hardware memory reordering), I think it deadlocks.
My ugly version will just pause for 25msecs and then recover.

I think I'll go with your version for GC7, where I have a cleaner
way of enforcing the memory ordering, and keep my version for 6.6.
(This assumes we find no other problems.  This stuff is much too
subtle.)

Hans

> -----Original Message-----
> From: Ben Maurer [mailto:bmaurer at ximian.com] 
> Sent: Tuesday, July 12, 2005 8:09 AM
> To: Boehm, Hans
> Cc: mono-devel-list at lists.ximian.com; gc at napali.hpl.hp.com
> Subject: RE: [Gc] [PATCH] Race condition when restarting threads
> 
> 
> On Mon, 2005-07-11 at 14:15 -0700, Boehm, Hans wrote:
> > I've attached a different patch, which I think should solve the 
> > problem without additional synchronization and context switches, at 
> > least in the vast majority of cases.  (It should solve the 
> problem in 
> > all cases.  Additional context switches will be needed only if the 
> > sigsuspend wakes up early, I claim.)
> > 
> > Please let me know if you have any problems with this, or if this 
> > doesn't look right to you.  I tested only superficially.
> 
> I'll try to test this as soon as I get a chance.
> 
> One question, which is probably a case of me not having any 
> clue about the code:
> 
> > -    do {
> > -           me->stop_info.signal = 0;
> > -           sigsuspend(&suspend_handler_mask);        /* 
> Wait for signal */
> > -    } while (me->stop_info.signal != SIG_THR_RESTART);
> > +    /* We do not continue until we receive a SIG_THR_RESTART,  */
> > +    /* but we do not take that as authoritative.  (We may be   */
> > +    /* accidentally restarted by one of the user signals we    */
> > +    /* don't block.)  After we receive the signal, we use a    */
> > +    /* primitive and expensive mechanism to wait until it's    */
> > +    /* really safe to proceed.  Under normal circumstances,    */
> > +    /* this code should not be executed.                       */
> > +    sigsuspend(&suspend_handler_mask);        /* Wait for signal */
> > +    while (GC_world_is_stopped && GC_stop_count == my_stop_count) {
> > +        GC_brief_async_signal_safe_sleep();
> > +#       if DEBUG_THREADS
> > +         GC_err_printf0("Sleeping in signal handler");
> > +#       endif
> > +    }
> 
> Why can't you just say
> 
> do {
> 	sigsuspend (&suspend_handler_mask);
> } while (GC_world_is_stopped && GC_stop_count == my_stop_count);
> 
> -- Ben
> 
> 
> 



More information about the Gc mailing list