[Gc] a problem with the collector with multi-threaded applications

Manuel Serrano Manuel.Serrano at sophia.inria.fr
Wed Apr 27 04:57:55 PDT 2005


Hello Hans,

Sorry it has taken me so long to answer your mail.

> This is X86/Linux?  Do you build with THREAD_LOCAL_ALLOC
> defined?
Yes it is.

> Can you look at the instruction that's failing?
It is dereferencing a null pointer.

> I replaced the GC_start/end_blocking interface in the 7.0 tree
> since I think there is a subtle race here.  If the GC happens
> just before the blocking system call, some callee-saves registers
> may not have been saved yet.  Thus the GC can miss them, and
> they could conceivably contain pointers from the caller.
> Saving the context in the caller (with getcontext or
> __builtin_unwind_init) should work around the problem.
> 
> But that seems extremely unlikely for 32-bit X86 code.  And
> the symptoms don't look like that's happening.  In fact, I have
> no idea what could be generating the SIGILL.  A disassembly of the
> code around 0xb7d8056c might suggest something.
I have actually removed all the calls to GC_start/end_blocking and my
problems seem to have disappeared completely. I'm running an HTTP server
that uses your GC permanently. Since I removed the calls to
GC_start/end_blocking, it has run like a charm (it used to crash
periodically). I really think that something is broken
with GC_start/end_blocking.

I have not looked at version 7.0 because, as far as I understand, it is
still a preliminary version, so I have to find a workaround for the
problem I'm currently facing with version 6.4. Maybe someone could
suggest something? Here is my problem:

I'm programming a multi-threaded application that uses long-running
system calls. From time to time, in a multi-threaded context, these
system calls are interrupted by the GC's SIGXCPU and SIGPWR
signals. For some system calls this is not a problem, because the program
can react to the interruption. For instance, when returning from
"accept" the program can test for EINTR and reissue the "accept"
when necessary. However, in some situations this programming scheme
is impossible. For instance, being interrupted is a real error when
serving data with the Linux "sendfile" system call: the
"sendfile" call cannot be "restarted". I'm wondering how such
applications should be programmed. What approach does the GC
recommend? Is it up to the application to explicitly ignore
SIGXCPU and SIGPWR when invoking functions such as "sendfile"? Is it
possible to do so?
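
The retry scheme described above for "accept" can be sketched as follows.
This is a minimal illustration; accept_retry is a hypothetical helper
name, not something provided by the GC or the C library.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: call accept() again for as long as it fails only
 * because a signal interrupted it (errno == EINTR). Any other failure is
 * returned to the caller unchanged, with errno left as accept() set it. */
int accept_retry(int listen_fd, struct sockaddr *addr, socklen_t *addrlen)
{
    int fd;

    /* If the caller wants the peer address, it must also pass a length. */
    assert(addr == NULL || addrlen != NULL);

    do {
        fd = accept(listen_fd, addr, addrlen);
    } while (fd == -1 && errno == EINTR);

    return fd;
}
```

This only covers calls that can safely be reissued from scratch after an
interruption.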

I would appreciate any advice or experience reports on this subject.
Many thanks in advance.

-- 
Manuel

