Re: [Gc]: Re: Win32: Deadlock if DEBUG_THREADS
ivmai at mail.ru
Thu Nov 5 22:26:42 PST 2009
"Boehm, Hans" <hans.boehm at hp.com> wrote:
> > From: Ivan Maidanski
> > Sent: Thursday, November 05, 2009 12:28 PM
> > To: gc at napali.hpl.hp.com
> > Subject: [Gc] Re: Win32: Deadlock if DEBUG_THREADS
> > Hi!
> > An hour ago I wrote:
> > > Today I ran into a deadlock on Win32 with DEBUG_THREADS defined.
> > > The simplest config to reproduce is (for MinGW, for VC++ is
> > similar):
> > > gcc -DALL_INTERIOR_POINTERS -DGC_THREADS -DDEBUG_THREADS -g
> > > [-DGC_ASSERTIONS] -I include -I libatomic_ops/src tests/test.c *.c
> > > -luser32
> > It's easier to reproduce it with -DPARALLEL_MARK (with or w/o
> > -DDONT_USE_SIGNALANDWAIT).
> > >
> > > The deadlock occurs at least in 1/5 runs. Not observed if
> > set GC_DISABLE_INCREMENTAL=1. Doesn't depend on
> > GC_USE_GETWRITEWATCH value (0/1).
> > >
> > > At present, I've failed to find out the place of the
> > problem (I observe 3 threads waiting for alloc lock in
> > GC_malloc, 2 threads are waiting for some system resource in
> > WriteFile(),...
> > Not in WriteFile(), in EnterCriticalSection() in GC_write().
> > > ... 1 thread is waiting for a system resource in
> > CreateThread() and the backtrace for the main thread is not visible).
> > The main thread is in GC_write acquiring GC_write_cs too.
> Is one of the threads waiting for GC_write_cs already holding it, e.g. is it running a collection?
I've got a 'nicer' backtrace - threads:
1. (main) - blocked in EnterCS in GC_write() called from GC_stopped_mark (the world is stopped), but GC_write_cs is not owned by any thread - this is strange, the rest looks ok;
2. a helper thread - waiting for LOCK in GC_notify_all_marker;
3. in GC_generic_malloc() called from GC_malloc() called from test.c;
4. blocked in EnterCS in GC_write() called from GC_CreateThread (after the thread is registered);
5. blocked in CreateThread called from GC_CreateThread.
> > > BTW. What's the purpose of GC_write_cs (beyond atomicity of
> > in-line writes)?
> I'm not sure it should be 100% necessary. I did find some old message from Romano Paolo Tenca that it helps with debugging in some configurations, perhaps for those atomicity reasons.
Ok. Let it be... The locking code is trivial here. (I've added comments and an assertion that GC_write is not called indirectly from GC_stop_world.)
> > And why do we need to acquire it in GC_stop_world?
> So that we don't stop a thread that's holding it, and then try to acquire it ourselves, which would of course lead to a deadlock.
Yes, I've understood this already.
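For readers of the archive, the ordering rule described above can be sketched in portable C. This is only an illustration under stated assumptions, not the collector's code: write_lock, world_stopped, locks_init, log_write, stop_world and start_world are hypothetical names, and a recursive pthread mutex stands in for the Win32 critical section (which likewise lets its owner re-enter it). The point is that the "collector" takes the write lock before stopping the world, so no stopped thread can be holding it.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-ins; none of these names come from the collector. */
static pthread_mutex_t write_lock;   /* plays the role of GC_write_cs */
static int world_stopped = 0;

/* Create the lock as recursive so its owner may re-enter it. */
static void locks_init(void) {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&write_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Serialized logging: returns 0 on success, -1 on a write error. */
static int log_write(const char *msg) {
    int rc;
    pthread_mutex_lock(&write_lock);
    rc = (fputs(msg, stderr) >= 0) ? 0 : -1;
    pthread_mutex_unlock(&write_lock);
    return rc;
}

/* Take the write lock BEFORE stopping anyone, so that no stopped
 * thread can be left holding it. */
static void stop_world(void) {
    pthread_mutex_lock(&write_lock);
    world_stopped = 1;   /* real code would suspend the other threads here */
}

/* Resume the world, then give the write lock back. */
static void start_world(void) {
    assert(world_stopped);
    world_stopped = 0;   /* real code would resume the other threads here */
    pthread_mutex_unlock(&write_lock);
}
```

With this ordering, a log_write() issued by the stopping thread while the world is stopped simply re-enters the lock it already owns. If stop_world() did not take the lock first, a thread suspended inside log_write() would keep it forever and the stopping thread would block on its next write.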
> > (Removing GC_write_cs doesn't solve anything...)
> Which seems odd, since it did seem to be involved in the deadlock.
> Clearly the trick here is to figure out who is holding which lock. I think that for GC_write_cs that should be easily determinable by what's on the stack. It's only acquired in a couple of places which, based on code inspection, always release it again.
It seems no thread is holding GC_write_cs, but the owner of the alloc lock is blocked in EnterCS().
> If this happens with assertions enabled, GC_lock_holder should tell you who holds the allocation lock, I think.
The situation reminds me of the one you fixed by adding UNPROTECT_THREAD to win32_threads.c. Could removing UNPROTECT_THREAD again result in a deadlock with GWW_VDB? Is there something more to explicitly unprotect in win32_threads?