[Gc] Hang on Red Hat Enterprise Linux?

Kenneth C. Schalk ken at xorian.net
Mon Apr 26 16:50:06 PDT 2004


I'm working on getting some code (Vesta) that makes pretty significant
use of the garbage collector working on a machine running:

    Red Hat Enterprise Linux AS release 3 (Taroon Update 1)

This distribution differs significantly from the ones I've done most
of my testing on so far: it uses NPTL rather than LinuxThreads.

The problem I'm experiencing is that programs linked with the garbage
collector seem to periodically hang (go to sleep and not accumulate
any more CPU time).  When I attach to such a program with the
debugger, I don't see anything obviously wrong.  When I detach the
debugger, the program starts going again (accumulating more CPU time).
Sometimes the same program will hang more than once, and attaching and
detaching the debugger will "un-stick" it again.  This even happens
with gctest.
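
One way to check for the "not accumulating CPU time" symptom from
outside the process, without attaching a debugger, is to sample the
process's CPU counters twice.  The following is only a rough sketch,
assuming a Linux /proc filesystem with the current /proc/<pid>/stat
field layout; the function names cpu_ticks and looks_hung are made up
for illustration and are not part of the collector or any tool
mentioned here.

```python
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name (field 2) is parenthesised and may itself contain
    # spaces, so split on the last ')' before splitting the other fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def looks_hung(pid, interval=5.0):
    """True if the process accumulated no CPU time over `interval` seconds."""
    def sample():
        with open(f"/proc/{pid}/stat") as f:
            return cpu_ticks(f.read())

    before = sample()
    time.sleep(interval)
    return sample() == before
```

A process that is legitimately idle (blocked on I/O, for example)
looks the same to this check, so it is only meaningful for a
CPU-bound program such as gctest.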

I'm using version 6.2 of the garbage collector.  (I haven't gotten to
trying 6.3alpha5 yet, maybe tomorrow.)  The garbage collector
configuration I'm using is:

    -DATOMIC_UNCOLLECTABLE
    -DNO_SIGNALS
    -DNO_EXECUTE_PERMISSION
    -DSILENT
    -DALL_INTERIOR_POINTERS
    -DLARGE_CONFIG
    -DUSE_MMAP
    -DGC_LINUX_THREADS
    -D_REENTRANT
    -DTHREAD_LOCAL_ALLOC
    -DPARALLEL_MARK

I built gctest with the classic Makefile, putting all the above in the
"CFLAGS" definition line.

The hardware I'm seeing these problems on is a 4-way Xeon 2.80GHz
machine with HyperThreading enabled, which looks like an 8-CPU machine
under Linux.

I've found experimentally that the hang is less likely to occur with
gctest when I set the GC_MARKERS environment variable to a number
smaller than the default number of mark threads (8).  So far I haven't
seen the hang with GC_MARKERS set to 5 or less, but I have with 6-8.
(I haven't yet tested this with the application I'm trying to get
working.)
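
The GC_MARKERS experiment could be automated along the following
lines.  This is a hedged sketch rather than the procedure actually
used: the helper name sweep_markers, the 300-second timeout, and the
idea of treating a timeout as a hang are all assumptions made for
illustration.  It runs a command once per GC_MARKERS value and records
whether it exited normally or had to be killed.

```python
import os
import subprocess


def sweep_markers(cmd, counts, timeout=300.0):
    """Run `cmd` once per GC_MARKERS value; map each value to an outcome."""
    results = {}
    for n in counts:
        # Copy the current environment and override only GC_MARKERS.
        env = dict(os.environ, GC_MARKERS=str(n))
        try:
            proc = subprocess.run(
                cmd,
                env=env,
                timeout=timeout,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child when the timeout expires;
            # this sketch treats that as a suspected hang.
            results[n] = "timeout"
        else:
            results[n] = "ok" if proc.returncode == 0 else f"exit {proc.returncode}"
    return results
```

For example, sweep_markers(["./gctest"], range(1, 9)) would try 1
through 8 markers (the path to gctest is an assumption).  Since the
hang is intermittent, a single clean run per value proves little;
repeating the sweep several times would give a better picture.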

It seems pretty clear to me that just attaching and detaching the
debugger shouldn't have an effect on the program's execution, and I'm
a little suspicious that this could be a bug in either NPTL or the
Linux kernel.  It also seems like disabling PARALLEL_MARK would make
the problem go away, although I'd rather not do that.  If I find that
setting GC_MARKERS is an effective work-around I may just stick with
that, but I'd rather fix the problem, and it definitely seemed worth
bringing up on the list.

--Ken

P.S.  A little more version information, in case it turns out to be
relevant:

% uname -a
Linux mmdcs100 2.4.21-9.ELsmp #1 SMP Thu Jan 8 17:08:56 EST 2004 i686 i686 i386 GNU/Linux
% rpm -q glibc
glibc-2.3.2-95.6


