[Gc] Attaching and dettaching existing threads

Boehm, Hans hans_boehm@hp.com
Tue, 14 Oct 2003 13:01:09 -0700


Thanks to everyone for pointing this out.  I added a warning about
TerminateThread() to README.win32.  I agree that there is no point in
trying to fix this.  I wasn't previously aware of the existence of this
call.

If one wanted to "fix" this, probably the only solution would be to
intercept TerminateThread and have it clean up the GC thread data
structure, as well as acquiring the GC lock, to avoid various races
with an in-progress GC.
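
The ordering such an interceptor would need can be sketched in portable C.
This is only an illustration under stated assumptions: `thread_table`,
`table_lock`, `register_thread`, `registered_count` and
`intercepted_terminate` are hypothetical names, not collector internals,
and a pthread mutex stands in for the GC lock.  The idea is to take the
lock, drop the table entry, and only then kill the thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16

/* Hypothetical stand-ins for the collector's thread table and lock. */
static unsigned long thread_table[MAX_THREADS]; /* 0 marks a free slot */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Record a thread id; returns 0 on success, -1 if the table is full. */
int register_thread(unsigned long id)
{
    int rc = -1;
    assert(id != 0); /* 0 is reserved for "free slot" in this sketch */
    pthread_mutex_lock(&table_lock);
    for (size_t i = 0; i < MAX_THREADS; i++) {
        if (thread_table[i] == 0) {
            thread_table[i] = id;
            rc = 0;
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return rc;
}

/* Count the entries a stop-the-world pass would have to walk. */
size_t registered_count(void)
{
    size_t n = 0;
    pthread_mutex_lock(&table_lock);
    for (size_t i = 0; i < MAX_THREADS; i++)
        if (thread_table[i] != 0)
            n++;
    pthread_mutex_unlock(&table_lock);
    return n;
}

/*
 * Wrapper an application would call instead of TerminateThread.
 * While the lock is held, no collection can be walking the table as
 * the entry disappears.  Returns 1 if the id was known, 0 otherwise.
 */
int intercepted_terminate(unsigned long id)
{
    int found = 0;
    pthread_mutex_lock(&table_lock);
    for (size_t i = 0; i < MAX_THREADS; i++) {
        if (thread_table[i] == id) {
            thread_table[i] = 0;
            found = 1;
            break;
        }
    }
    /* A real Win32 wrapper would call TerminateThread here, still
       under the lock, so the table and the thread state agree. */
    pthread_mutex_unlock(&table_lock);
    return found;
}
```

After `register_thread(11)` and `intercepted_terminate(11)`, a later pass
over the table no longer sees the dead thread.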

Hans

> -----Original Message-----
> From: David Bakin [mailto:d.bakin@comcast.net]
> Sent: Tuesday, October 14, 2003 4:28 AM
> To: Julian Hall; Boehm, Hans
> Cc: gc@napali.hpl.hp.com
> Subject: Re: [Gc] Attaching and dettaching existing threads
> 
> 
> (Hans: added you to this mail because code reading turned up a problem -
> the GC can break if TerminateThread is called - so maybe the docs should
> remind the user that TerminateThread is a bad idea in win32 and when
> using the collector.)
> 
> TerminateThread is a bad API and shouldn't be used.  See the MSDN
> documentation for an idea of the trouble it can get you in.  The only
> recommended way to stop a thread is to have it poll an event.(*)  Yes,
> this means that you're at the mercy of programming errors that give you
> an infinite loop.  But it is all you can do reliably.  (One example
> problem that MSDN mentions:  TerminateThread will kill a thread even if
> it is holding a mutex or critical section or other synchronization
> object, and the object will not be released.  This would include locks
> that you're not aware of because they're taken in the C/C++ runtime or
> the OS, like any lock that any memory manager you're using - including
> HeapAlloc - might take.  So if you TerminateThread and the target thread
> happens to be inside a malloc, that thread might be holding a memory
> manager lock and thus you'll deadlock every thread that subsequently
> allocates or frees heap memory.)
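
The polling approach described above can be sketched as follows.  This is
a portable analogue, not Win32 code: a C11 atomic flag stands in for the
event the worker polls, and pthreads stand in for Win32 threads.  The
names `stop_requested`, `work_lock`, `units_done`, `worker` and
`run_and_stop_worker` are all hypothetical.  The worker checks the flag
only at a point where it holds no lock, so stopping it cannot leave a
lock held.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state: a stop request and a lock the worker uses. */
static atomic_bool stop_requested = false;
static atomic_long units_done = 0;
static pthread_mutex_t work_lock = PTHREAD_MUTEX_INITIALIZER;

/* Worker: one unit of work per iteration; the flag is polled between
   iterations, when the lock is not held. */
static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_requested)) {
        pthread_mutex_lock(&work_lock);
        atomic_fetch_add(&units_done, 1);
        pthread_mutex_unlock(&work_lock);
    }
    return NULL;
}

/* Start a worker, wait until it has done some work, ask it to stop and
   join it.  Returns true if the lock is free afterwards. */
bool run_and_stop_worker(void)
{
    pthread_t t;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return false;

    while (atomic_load(&units_done) == 0)
        sched_yield();                    /* let the worker make progress */

    atomic_store(&stop_requested, true);  /* the "event" being signalled */
    int rc = pthread_join(t, NULL);       /* worker exits on its own */
    assert(rc == 0);
    (void)rc;

    /* The worker never exits while holding work_lock, so this succeeds. */
    if (pthread_mutex_trylock(&work_lock) != 0)
        return false;
    pthread_mutex_unlock(&work_lock);
    return true;
}
```

The cost is the one already noted: a worker stuck in a loop that never
reaches the poll will never stop.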
> 
> In the particular case of the collector we see from the previous email
> on this thread that the GC uses the thread notifications
> DLL_THREAD_ATTACH and DLL_THREAD_DETACH to track thread handles in a
> table.  So, this brief analysis is just from reading the code in
> 6.3alpha2: Suppose you're on Windows CE.  In that case GC_stop_world has
> a loop which says "for each thread in the table, SuspendThread(), and if
> it returns error (-1) wait 10ms and try again".  But in the case that
> you've nuked a thread its thread handle will be invalid, thus
> SuspendThread will return -1 (**), thus GC_stop_world will loop
> forever.
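
The shape of that retry loop can be modelled in a few lines of portable
C.  This is a simulation, not the collector's code: `fake_suspend` is a
hypothetical stand-in for SuspendThread that fails for handle 0, and the
`max_attempts` cap exists only so the demonstration terminates.  The loop
as described has no such cap, which is why a permanently failing handle
makes it spin.

```c
#include <assert.h>

#define SUSPEND_FAILED (-1L)

/* Hypothetical stand-in for SuspendThread: handle 0 plays the role of a
   handle that will never become suspendable. */
static long fake_suspend(int handle)
{
    return handle == 0 ? SUSPEND_FAILED : 0L;
}

/*
 * Retry until the suspend succeeds or the cap is reached.
 * Returns the number of attempts used, or -1 if every attempt failed.
 */
int suspend_with_retry(int handle, int max_attempts)
{
    assert(max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (fake_suspend(handle) != SUSPEND_FAILED)
            return attempt;
        /* The loop described above would wait about 10 ms here. */
    }
    return -1;
}
```

With a healthy handle the first attempt succeeds; with handle 0 every
attempt fails, and without the cap the loop would never exit.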
> 
> Alternatively, suppose you're not on Windows CE.  In that case you can
> have a race condition.  Suppose GC_stop_world gets called and starts
> suspending threads.  It suspends a thread you're about to nuke.  But
> you're still runnable (because it hasn't reached your spot in the table
> yet) and you might get lucky and get a context switch to you (***).  So
> you TerminateThread a thread already suspended, and very shortly you
> yourself are suspended by GC_stop_world.  Later GC_start_world starts
> resuming threads.  It resumes a terminated thread (without first
> checking GetExitCodeThread), gets an error return, and calls abort.
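
The interleaving above can be replayed as a small state machine.  This is
a model of the description only, not verified against Windows and not
taken from the collector: `sim_suspend`, `sim_resume`, `sim_terminate`
and `replay_race` are hypothetical, and thread 0 is the victim, thread 1
the thread that calls TerminateThread, and thread 2 the one running the
collection.

```c
#include <assert.h>
#include <stdbool.h>

enum state { RUNNING, SUSPENDED, TERMINATED };

#define NTHREADS 3
static enum state threads[NTHREADS];

/* Simulated suspend: fails on a thread that no longer exists. */
static bool sim_suspend(int i)
{
    assert(i >= 0 && i < NTHREADS);
    if (threads[i] == TERMINATED)
        return false;
    threads[i] = SUSPENDED;
    return true;
}

/* Simulated resume: only a suspended thread can be resumed. */
static bool sim_resume(int i)
{
    assert(i >= 0 && i < NTHREADS);
    if (threads[i] != SUSPENDED)
        return false;
    threads[i] = RUNNING;
    return true;
}

/* Simulated TerminateThread: works even on a suspended thread. */
static void sim_terminate(int i)
{
    assert(i >= 0 && i < NTHREADS);
    threads[i] = TERMINATED;
}

/* Replays the race; returns the index of the first thread whose resume
   fails, or -1 if every resume succeeds. */
int replay_race(void)
{
    for (int i = 0; i < NTHREADS; i++)
        threads[i] = RUNNING;

    sim_suspend(0);    /* stop-world reaches the victim first      */
    sim_terminate(0);  /* thread 1, still runnable, kills thread 0 */
    sim_suspend(1);    /* stop-world then reaches thread 1         */

    /* Start-world: resume everything that was suspended earlier. */
    for (int i = 0; i < 2; i++)
        if (!sim_resume(i))
            return i;  /* the point where the abort would happen */
    return -1;
}
```

In this model the resume of thread 0 is the one that fails, matching the
error return described above.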
> 
> Or there may be other problems; you can see how fragile things are if
> the GC is trying to track thread lifetimes and then you Terminate a
> thread out from under it.
> 
> (Therefore, IMO there's no point in trying to fix the collector to
> handle TerminateThread - TerminateThread is an abomination that can
> already cause terrible problems so what's one more?)
> 
> -- Dave
> 
> (*) Actually, I just thought of another possibility that I haven't tried
> and I haven't seen described anywhere.  So that probably means it
> doesn't work.  But if it does work it would mean you don't need to have
> your worker threads poll an event, and you could safely kill any thread.
> You SuspendThread the thread you want to kill, GetThreadContext to get
> its machine registers, set the IP to a routine that calls ExitThread
> (the only safe way to exit a thread) or just point IP to the real
> ExitThread routine, SetThreadContext, ResumeThread.
> 
> (**) That is, I suppose that is what SuspendThread will do given an
> invalid handle: return -1 and put an error code in GetLastError.  But I
> haven't tried it.
> 
> (***) Maybe the collector should
> SetThreadPriority(THREAD_PRIORITY_TIME_CRITICAL) during the short period
> of time it takes to suspend all other threads?  By the way, what happens
> if the thread that triggers the collection happens to be a low-priority
> thread relative to other threads in the process?  It never adjusts its
> own priority - so it can suspend high-priority threads and then run
> slowly as other threads in other processes are scheduled ahead of it.
> Maybe the collector isn't used in any situation where multiple threads
> are running and they have different priorities?
> 
> 
> ----- Original Message ----- 
> From: "Julian Hall" <jules@acris.co.uk>
> To: <gc@napali.hpl.hp.com>
> Sent: Tuesday, October 14, 2003 2:14 AM
> Subject: Re: [Gc] Attaching and dettaching existing threads
> 
> 
> > David Bakin wrote:
> >
> > >On Windows a DLL does not get notified (via DLL_THREAD_ATTACH) of
> > >threads existing before the DLL was loaded - except for the thread
> > >loading the DLL (which calls DLLMain with parameter
> > >DLL_PROCESS_ATTACH).  It also doesn't get notified for each thread
> > >when the process exits (only for the thread that is actually calling
> > >ExitProcess) or for threads that are TerminateThread'd.
> > >
> > This is actually a point I wanted to raise a while ago, but neglected
> > to do so.  I have a Win32 program that uses TerminateThread to cancel
> > a background operation if it is taking too long.  This generally
> > causes a crash in the collector, presumably because it is accessing
> > the thread's stack which no longer exists.  Is there any way of
> > avoiding this problem?