[Gc] FW: GC: Time for GC final release? (draft patch for cancellation)
Hans.Boehm at hp.com
Fri Aug 6 09:13:47 PDT 2010
On Wed, 4 Aug 2010, Ivan Maidanski wrote:
> Hello, Hans!
> Could you review my next version? (I haven't even compiled it - not
> enough time these days).
I know the problem ...
It generally looks very good to me, based on inspection; I have not
yet tested it. Thank you.
> Also, the Q: is it ok not to check for me!=0 in pthread_cancel()?
"Me" is not a good name for the variable, since it refers to the
thread being cancelled, not the thread cancelling it. I'd
suggest "target". I think you do need to check for null, and just
return ESRCH, or at least not access the flags field, if it is null.
pthread_exit looks fine as is.
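For illustration, a minimal sketch of the null-target check (the registry, the lookup helper, and the WRAP_ name here are hypothetical stand-ins, not the collector's actual code, which keeps a GC_thread structure per registered thread):

```c
#include <errno.h>
#include <pthread.h>

/* Stand-in for the collector's thread table: a fixed array is enough
 * to show the lookup-and-check shape.  All names are hypothetical. */
#define MAX_THREADS 16
static pthread_t registered[MAX_THREADS];
static int n_registered = 0;

static int lookup_thread(pthread_t id)  /* index, or -1 if unknown */
{
    int i;
    for (i = 0; i < n_registered; ++i)
        if (pthread_equal(registered[i], id))
            return i;
    return -1;
}

/* Sketch of an intercepted pthread_cancel: if the target is not a
 * registered thread, return ESRCH rather than touch its flags field. */
int WRAP_pthread_cancel(pthread_t target_id)
{
    if (lookup_thread(target_id) < 0)
        return ESRCH;           /* unknown target: don't dereference */
    /* ... set the target's flags, disable GC, etc. ... */
    return pthread_cancel(target_id);
}
```

The point of returning ESRCH early is that the wrapper never dereferences a thread structure it failed to find.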
> Q2: What are the conds of setting GC_INTERCEPT_PTHREAD_EXIT in gcconfig.h?
We probably shouldn't bother intercepting pthread_cancel with
NO_CANCEL_SAFE. Pthread_exit should probably still be intercepted.
My guess is that we should always intercept pthread_exit on Posix
systems. Since we seem to have issues on both Solaris and Linux,
it wouldn't surprise me if this problem were fairly pervasive.
I'm not sure we need the macro. I think pthread_exit needs to be
treated just like the other intercepted pthread calls.
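As a rough illustration of the interception pattern, the sketch below uses a counter as a stand-in for the state GC_disable()/GC_enable() adjust, and a TSD destructor as a stand-in for GC_thread_exit_proc; all names prefixed WRAP_ or marked as stubs are assumptions, not the collector's real code:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Stand-in for the collector's "don't start a GC" counter. */
static atomic_int dont_gc;

static pthread_key_t exit_key;  /* destructor plays GC_thread_exit_proc */

/* First point after pthread_exit() where this thread regains control,
 * hence the place where the GC can be reenabled. */
static void exit_proc(void *arg)
{
    (void)arg;
    atomic_fetch_sub(&dont_gc, 1);        /* GC_enable() */
}

/* Sketch of an intercepted pthread_exit. */
static void WRAP_pthread_exit(void *retval)
{
    atomic_fetch_add(&dont_gc, 1);        /* GC_disable(): no GC from here */
    pthread_exit(retval);                 /* never returns */
}

static void *worker(void *arg)
{
    (void)arg;
    pthread_setspecific(exit_key, (void *)1);  /* arm the destructor */
    WRAP_pthread_exit(NULL);                   /* exit with GC disabled */
}
```

After the worker is joined, the counter is back to zero: the GC was disabled only for the window between pthread_exit and the exit handler.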
> Thu, 29 Jul 2010 22:24:18 +0000 "Boehm, Hans" <hans.boehm at hp.com>:
>> Sorry for being so slow. I'm usually better with easier questions :-)
>>> -----Original Message-----
>>> From: Ivan Maidanski [mailto:ivmai at mail.ru]
>>> Sent: Tuesday, July 13, 2010 4:28 AM
>>> To: Boehm, Hans
>>> Cc: gc at linux.hpl.hp.com
>>> Subject: Re: [Gc] FW: GC: Time for GC final release? (draft patch
>>> for cancellation)
>>> This is a draft/incomplete (and NOT working) patch for case 1.
>>> To test it, use -D GC_INTERCEPT_PTHREAD_EXIT.
>>> It is not working because: I don't know how to do GC_enable for
>>> pthread_exit (since it is a no-return function). In
>>> GC_thread_exit_proc? Any ideas?
>> I think it has to be reenabled in GC_thread_exit_proc, right after calling GC_unregister_my_thread. We probably need to add a GC_DISABLED or EXITING flag to the flags field in the thread structure, so that GC_thread_exit_proc can tell whether it needs to reenable the GC.
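A minimal sketch of that flag protocol (DISABLED_GC, the struct, and the counter are made-up stand-ins; the real flags field would live in the collector's GC_thread structure, and the real calls are GC_disable()/GC_enable()):

```c
/* Hypothetical per-thread flag bit recording that GC was disabled on
 * this thread's behalf. */
#define DISABLED_GC 0x10

struct thread_stub { int flags; };

/* Stand-in counter for GC_disable()/GC_enable(). */
static int gc_dont_gc;

/* What the pthread_cancel/pthread_exit wrappers would do on entry ... */
static void disable_gc_for(struct thread_stub *me)
{
    ++gc_dont_gc;                 /* GC_disable() */
    me->flags |= DISABLED_GC;
}

/* ... and what GC_thread_exit_proc would do right after
 * GC_unregister_my_thread(): reenable only if this thread disabled. */
static void thread_exit_proc(struct thread_stub *me)
{
    if (me->flags & DISABLED_GC) {
        me->flags &= ~DISABLED_GC;
        --gc_dont_gc;             /* GC_enable() */
    }
}
```

Testing the flag before reenabling is what lets GC_thread_exit_proc run safely for threads that exit without ever having disabled the GC.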
>>> Q: Will pthread_cancel() interception really help us, since it is just
>>> a send-signal routine (I mean it does not wait for the thread exiting,
>> Yes. The problem is that if a GC is started between pthread_cancel and thread exit, GC_stop_world will block with the GC lock held, waiting for the exiting thread to respond to the signal, which it won't if it's also trying to grab the GC lock. If we prevent a GC from starting during that time, GC_stop_world() never gets called in this interval with a damaged thread around, and we should be OK.
>>> Another question: is it enough for pthread_cancel/exit (and also
>>> GC_start_routine) to use GC_disable, or we need something close to
>> After staring at this for a while, I think GC_disable() is sufficient. In the dlopen case, an ongoing mark process might break, because the root set is changing. Thus we need to make sure that no GC mark phase is in progress. I think in the exiting case that shouldn't matter. If I'm wrong, we'll see more deadlocks.
>>> Fri, 9 Jul 2010 01:19:06 +0000 "Boehm, Hans" <hans.boehm at hp.com>:
>>>> It seems to me that we really want two patches related to
>>> cancellation that aren't yet in the tree:
>>>> 1) We should deal with the fact that apparently on Solaris and
>>> probably on Linux we can't collect while a thread is exiting, since
>>> signals aren't handled properly. This currently gives rise to
>>> deadlocks. I think the only workaround is to also intercept
>>> pthread_cancel and pthread_exit and disable GC until the thread exit
>>> handler is called. That's ugly, because we risk growing the heap
>>> unnecessarily, and possibly repeatedly. But it seems that we don't
>>> really have an option in that the process is not in a fully functional
>>> state while a thread is exiting.
>>>> 2) ...
>>>> I was hoping to find some time to work on this this week. But so far,
>>> it looks like I failed.
>>>> These are both a bit frustrating, because I think they're really
>>>> problems in the underlying Posix layers that are likely to also affect
>>>> other things. And they don't seem to admit good solutions.
>>>>> -----Original Message-----
>>>>> From: Ivan Maidanski [mailto:ivmai at mail.ru] ...