[Gc] Problem with GC on FreeBSD

Petter Urkedal urkedal at nbi.dk
Wed Apr 18 12:51:37 PDT 2012


Hi Ivan and Vitaly,

On 2012-04-18, Ivan Maidanski wrote:
> Hi Petter and Vitaly,
> 
> Wed, 18 Apr 2012 19:07:02 +0200 Petter Urkedal <urkedal at nbi.dk>:
> > On 2012-04-18, Vitaly Magerya wrote:
> > > Ivan Maidanski <ivmai at mail.ru> wrote:
> > > > 2. I guess USE_CUSTOM_SPECIFIC is defined for FreeBSD in
> > > > thread_local_alloc.h, if the bug is probably in GC_setspecific.
> > > > 3. Try to compile with explicit -D USE_PTHREAD_SPECIFIC (probably we could
> > > > use it permanently for FreeBSD but it would be good to find out what's wrong
> > > > in GC_setspecific)
> > > 
> > > This seems to help: disclaim_test now passes, and STklos works
> > > properly too, but only if I compile libgc with --enable-gc-debug;
> > > without that STklos segfaults in the middle of its test suite (even
> > > before threading tests); the crash is in libgc, here's the backtrace:
> > > 
> > > #0  0x0000000801184c3e in GC_clear_fl_marks (q=0x64636261 <Error reading address 0x64636261: Bad address>) at alloc.c:760
> > > #1  0x0000000801184eaf in GC_finish_collection () at alloc.c:879
> > > #2  0x0000000801184690 in GC_try_to_collect_inner (stop_func=0x801183f00 <GC_never_stop_func>) at alloc.c:472
> > > #3  0x000000080118595e in GC_collect_or_expand (needed_blocks=1, ignore_off_page=0, retry=0) at alloc.c:1239
> > > #4  0x0000000801185c71 in GC_allocobj (gran=4, kind=1) at alloc.c:1328
> > > #5  0x000000080118beec in GC_generic_malloc_inner (lb=48, k=1) at malloc.c:122
> > > #6  0x000000080118d441 in GC_generic_malloc_many (lb=48, k=1, result=0x8013a3bc0) at mallocx.c:425
> > > #7  0x0000000801196455 in GC_malloc (bytes=32) at thread_local_alloc.c:175
> > > #8  0x0000000000417aef in STk_clone_frame (f=0x903930) at env.c:400
> > > [... STklos stuff below ...]
> > > 
> > > The offending address (q=0x64636261) is GC_obj_kinds[0].ok_freelist[36];
> > > looks like corrupted data?
> > > 
> > > Moreover, if I build libgc with gc-assertions and gc-debug,
> > > disclaim_test fails again. More precisely, sometimes it segfaults,
> > > sometimes it fails with "Assertion failure: dbg_mlc.c:843", and
> > > other times it passes.
> > >  
> > > With --enable-gc-assertions and without --enable-gc-debug disclaim_test
> > > always segfaults:
> > > 
> > > #0  0x000000080086136f in GC_is_marked (p=0x30e00000001) at mark.c:233
> > > #1  0x000000080085642f in GC_check_fl_marks (q=0xa813c0 "\001") at alloc.c:739
> > > #2  0x000000080086be9c in GC_check_tls_for (p=0x842ac0) at thread_local_alloc.c:328
> > > #3  0x000000080086deee in GC_check_tls () at pthread_support.c:316
> > > #4  0x00000008008565ec in GC_finish_collection () at alloc.c:812
> > > #5  0x0000000800855e10 in GC_try_to_collect_inner (stop_func=0x8008554c0 <GC_never_stop_func>) at alloc.c:472
> > > #6  0x00000008008573ae in GC_collect_or_expand (needed_blocks=1, ignore_off_page=0, retry=0) at alloc.c:1239
> > > #7  0x00000008008576f1 in GC_allocobj (gran=2, kind=4) at alloc.c:1328
> > > #8  0x000000080085ebec in GC_generic_malloc_inner (lb=32, k=4) at malloc.c:122
> > > #9  0x00000008008607da in GC_generic_malloc_many (lb=32, k=4, result=0xb08d28) at mallocx.c:425
> > > #10 0x0000000800871203 in GC_finalized_malloc (client_lb=24, fclos=0x401690) at fnlz_mlc.c:152
> > > #11 0x0000000000400ecf in pair_new (car=0xb45da0, cdr=0x0) at tests/disclaim_test.c:114
> > > #12 0x000000000040125c in test (data=0x0) at tests/disclaim_test.c:178
> > > #13 0x000000080086de07 in GC_inner_start_routine (sb=0x7fffff5faf90, arg=0x841fc0) at pthread_start.c:56
> > > #14 0x00000008008678c6 in GC_call_with_stack_base (fn=0x80086ddc0 <GC_inner_start_routine>, arg=0x841fc0) at misc.c:1622
> > > #15 0x000000080086fc7c in GC_start_routine (arg=0x841fc0) at pthread_support.c:1613
> > > #16 0x0000000800aa4274 in pthread_getprio () from /lib/libthr.so.3
> > > #17 0x0000000000000000 in ?? ()
> > 
> > Hi Vitaly,
> > 
> > I'll focus on this part first, since it's most directly related to the
> > code I wrote.  It seems the thread local freelists are the problem,
> > maybe they are not initialized.  The thread local key should be working
> > now with -D USE_PTHREAD_SPECIFIC, right?  But I'm wondering whether
> > GC_init_thread_local gets called.  It may be worth trying to change
> > pthread_create to GC_pthread_create in disclaim_test.c (or insert a call
> > to GC_unregister_my_thread inside the thread call-back).  If it makes a
> > difference, this might be relevant for the above segfault, as well.

Correction: "GC_unregister_my_thread" should read "GC_register_my_thread".

> As Vitaly reported, disclaim_test had been single-threaded (before latest checkout) and crashed too, so GC_init_thread_local seems to be called (from GC_init). I asked Vitaly to test "release" branch (which has no "disclaim" functionality).

The test case was single-threaded due to the missing include you fixed
recently.  Nevertheless, he ran the test on a multi-threaded libgc,
which could expose such an issue.

I tried to trace how GC_init_thread_local gets called without the use of
GC_pthread_create or GC_register_my_thread.  The only path I can find is
from a redirected pthread_create in case GC_USE_LD_WRAP is enabled.  Am
I missing something?
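
To illustrate the concern, here is a minimal sketch in plain pthreads.
It is not libgc code: tl_freelists, register_thread and
thread_sees_freelists are made-up names, and register_thread only
stands in for whatever per-thread setup ends up calling
GC_init_thread_local.  The assumption being illustrated is that a thread
which skips that setup finds no thread-local structure behind the key.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-thread free-list structure. */
struct tl_freelists { void *fl[4]; };

static pthread_key_t tl_key;
static pthread_once_t tl_key_once = PTHREAD_ONCE_INIT;

/* Create the key exactly once, whichever thread gets here first. */
static void make_key(void)
{
    int rc = pthread_key_create(&tl_key, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Stand-in for per-thread registration: binds the structure to the key. */
static void register_thread(struct tl_freelists *tlfs)
{
    pthread_once(&tl_key_once, make_key);
    pthread_setspecific(tl_key, tlfs);
}

struct probe { int do_register; int saw_freelists; };

static void *thread_body(void *arg)
{
    struct probe *p = arg;
    struct tl_freelists tlfs = { { NULL, NULL, NULL, NULL } };

    pthread_once(&tl_key_once, make_key);
    if (p->do_register)
        register_thread(&tlfs);
    /* An unregistered thread gets NULL here; code that dereferences */
    /* the result without checking would crash at this point.        */
    p->saw_freelists = (pthread_getspecific(tl_key) != NULL);
    return NULL;
}

/* Returns 1 if the new thread saw its structure, 0 if not, -1 on error. */
static int thread_sees_freelists(int do_register)
{
    struct probe p = { do_register, 0 };
    pthread_t t;

    if (pthread_create(&t, NULL, thread_body, &p) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    return p.saw_freelists;
}
```

Calling thread_sees_freelists(1) yields 1 and thread_sees_freelists(0)
yields 0, since the value of a key in a fresh thread starts out NULL.
Whether libgc actually ends up in the second situation on FreeBSD is
exactly what I'm unsure about.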

Petter

