[Gc] Attaching and dettaching existing threads

David Bakin d.bakin@comcast.net
Tue, 14 Oct 2003 04:28:16 -0700


(Hans: I added you to this mail because reading the code turned up a
problem: the GC can break if TerminateThread is called.  So maybe the docs
should remind the user that TerminateThread is a bad idea on Win32 in
general, and especially when using the collector.)

TerminateThread is a bad API and shouldn't be used.  See the MSDN
documentation for an idea of the trouble it can get you into.  The only
recommended way to stop a thread is to have it poll an event.(*)  Yes, this
means that a programming error that puts the thread into an infinite loop
will leave you with a thread you can't stop.  But it is all you can do
reliably.  (One example problem that MSDN mentions:  TerminateThread will
kill a thread even if it is holding a mutex, critical section, or other
synchronization object, and that object will never be released.  This
includes locks that you're not aware of because they're taken inside the
C/C++ runtime or the OS, like any lock that any memory manager you're
using - including HeapAlloc - might take.  So if you call TerminateThread
while the target thread happens to be inside malloc, that thread might be
holding a memory manager lock, and you'll deadlock every thread that
subsequently allocates or frees heap memory.)

In the particular case of the collector, we see from the previous email in
this thread that the GC uses the thread notifications DLL_THREAD_ATTACH and
DLL_THREAD_DETACH to track thread handles in a table.  What follows is a
brief analysis based only on reading the code in 6.3alpha2.  Suppose you're
on Windows CE.  In that case GC_stop_world has a loop which says "for each
thread in the table, SuspendThread(), and if it returns an error (-1), wait
10ms and try again."  But if you've nuked a thread, its thread handle will
be invalid, so SuspendThread will return -1 (**), and GC_stop_world will
loop forever.
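
To make the failure mode concrete, here is a sketch of a retry loop that
cannot spin forever.  This is not the collector's code: suspend_bounded,
struct thread_ops and the outcome codes are hypothetical, and the OS calls
are passed in as callbacks so the policy can be read (and dry-run) without
real threads.  On Win32 one would presumably bind try_suspend to
"SuspendThread(h) != (DWORD)-1", is_alive to a GetExitCodeThread check
against STILL_ACTIVE (treating a failed call as "not alive"), and pause_ms
to Sleep.

```c
#include <stddef.h>

/* Hypothetical outcome codes for this sketch; not collector names. */
enum suspend_outcome { SUSPEND_OK, SUSPEND_THREAD_GONE, SUSPEND_GAVE_UP };

/* OS operations supplied by the caller (see the lead-in for the
   assumed Win32 bindings). Each predicate returns nonzero for "yes". */
struct thread_ops {
    int  (*try_suspend)(void *thread);
    int  (*is_alive)(void *thread);
    void (*pause_ms)(unsigned ms);
};

/* Retry a failed suspend, but stop if the thread has gone away or
   after max_tries attempts, instead of retrying unconditionally. */
enum suspend_outcome suspend_bounded(const struct thread_ops *ops,
                                     void *thread, int max_tries)
{
    int i;
    for (i = 0; i < max_tries; i++) {
        if (ops->try_suspend(thread))
            return SUSPEND_OK;
        if (!ops->is_alive(thread))
            return SUSPEND_THREAD_GONE;  /* e.g. killed by TerminateThread */
        ops->pause_ms(10);               /* the 10ms wait described above */
    }
    return SUSPEND_GAVE_UP;
}

/* Fake callbacks for a dry run without real threads. */
int  never_suspends(void *t) { (void)t; return 0; }
int  already_dead(void *t)   { (void)t; return 0; }
int  always_alive(void *t)   { (void)t; return 1; }
void no_pause(unsigned ms)   { (void)ms; }
```

With never_suspends and already_dead plugged in, the loop reports
SUSPEND_THREAD_GONE on the first pass; the unconditional retry described
above would never return in that situation.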

Alternatively, suppose you're not on Windows CE.  In that case you can have
a race condition.  Suppose GC_stop_world gets called and starts suspending
threads.  It suspends a thread you're about to nuke.  But your own thread is
still runnable (because GC_stop_world hasn't reached your entry in the table
yet), and you might get lucky and get a context switch to you (***).  So you
call TerminateThread on a thread that is already suspended, and very shortly
afterwards you yourself are suspended by GC_stop_world.  Later,
GC_start_world starts resuming threads.  It tries to resume the terminated
thread (without first checking GetExitCodeThread), gets an error return, and
calls abort.

There may be other problems too.  You can see how fragile things are if the
GC is trying to track thread lifetimes and then you terminate a thread out
from under it.

(Therefore, IMO there's no point in trying to fix the collector to handle
TerminateThread - TerminateThread is an abomination that can already cause
terrible problems so what's one more?)

-- Dave

(*) Actually, I just thought of another possibility that I haven't tried and
I haven't seen described anywhere.  So that probably means it doesn't work.
But if it does work it would mean you don't need to have your worker threads
poll an event, and you could safely kill any thread.  You SuspendThread the
thread you want to kill, GetThreadContext to get its machine registers, set
the IP to a routine that calls ExitThread (the only safe way to exit a
thread) or just point IP to the real ExitThread routine, SetThreadContext,
ResumeThread.

(**) That is, I suppose that is what SuspendThread will do given an invalid
handle: return -1 and put an error code in GetLastError.  But I haven't
tried it.

(***) Maybe the collector should
SetThreadPriority(THREAD_PRIORITY_TIME_CRITICAL) during the short period of
time it takes to suspend all other threads?  By the way, what happens if the
thread that triggers the collection happens to be a low-priority thread
relative to other threads in the process?  It never adjusts its own
priority - so it can suspend high-priority threads and then run slowly as
threads in other processes are scheduled ahead of it.  Maybe the
collector isn't used in any situation where multiple threads are running and
they have different priorities?


----- Original Message ----- 
From: "Julian Hall" <jules@acris.co.uk>
To: <gc@napali.hpl.hp.com>
Sent: Tuesday, October 14, 2003 2:14 AM
Subject: Re: [Gc] Attaching and dettaching existing threads


> David Bakin wrote:
>
> >On Windows a DLL does not get notified (via DLL_THREAD_ATTACH) of threads
> >existing before the DLL was loaded - except for the thread loading the
> >DLL (which calls DllMain with parameter DLL_PROCESS_ATTACH).  It also
> >doesn't get notified for each thread when the process exits (only for the
> >thread that is actually calling ExitProcess) or for threads that are
> >TerminateThread'd.
> >
> This is actually a point I wanted to raise a while ago, but neglected
> to do so.  I have a Win32 program that uses TerminateThread to cancel a
> background operation if it is taking too long.  This generally causes a
> crash in the collector, presumably because it is accessing the
> thread's stack which no longer exists.  Is there any way of avoiding
> this problem?