[Gc] GC crash on OSX 10.3.1 (7C103)

Andrew Begel abegel@eecs.berkeley.edu
Mon, 10 Nov 2003 22:41:44 -0800


I found the bug, but I'm not sure how to fix it.

With the following stack trace:

#0  GC_push_all (bottom=0xf07fe3e0 "", top=0xc0000000 <Address 0xc0000000 out of bounds>) at mark.c:1212
#1  0x08bf5468 in GC_push_all_stack (bottom=0xf07fe3e0 "", top=0xc0000000 <Address 0xc0000000 out of bounds>) at mark.c:1519
#2  0x08c0173c in GC_push_all_stacks () at darwin_stop_world.c:98
#3  0x08bf9b64 in GC_default_push_other_roots () at os_dep.c:2013
#4  0x08bf7354 in GC_push_roots (all=1, cold_gc_frame=0xf07fe670 "\177") at mark_rts.c:643
#5  0x08bf3508 in GC_mark_some (cold_gc_frame=0xf07fe670 "\177") at mark.c:326
#6  0x08be9320 in GC_stopped_mark (stop_func=0x8be84ac <GC_never_stop_func>) at alloc.c:515
#7  0x08be8de0 in GC_try_to_collect_inner (stop_func=0x8be84ac <GC_never_stop_func>) at alloc.c:362
#8  0x08bf82c4 in GC_init_inner () at misc.c:767
#9  0x08bf7cdc in GC_init () at misc.c:486

we have gotten ourselves into a situation where GC_push_all is called 
with a top that is less than the bottom. In GC_push_all() there's a 
line length = top - bottom, which goes negative, and it's this length, 
stored in the GC_mark_stack_top structure, that is very, very wrong.

So, we look in GC_push_all_stacks() (in darwin_stop_world.c:19), which 
iterates over all threads and calls GC_push_all_stack() with lo = the 
approximate stack pointer of the current thread (0xf07fe3e0) and hi = 
the end of the stack (p->stack_end). The first thread in the loop is 
apparently the main thread of the app (a Java VM thread in my case), 
so hi = GC_stackbottom, which is 0xc0000000.

Now, I know on Darwin the stack grows down. So shouldn't hi and lo be 
swapped here? Who wrote this code?

I'm also suspicious because I know that the thread calling GC_init() 
is *not* the main thread; it's just a pthread spawned by the Java VM 
process. If that were the case then hi would be 0x0, which is still 
very wrong.

Perhaps the GC_threads array could have been set up improperly?

Thanks for all the hints so far,

Andy

On Nov 10, 2003, at 8:45 PM, Boehm, Hans wrote:

> The mark stack consists of pairs (start address, descriptor).  The
> descriptor type is identified by the last two bits.  If they're zero,
> the descriptor is just a length in bytes.
>
> Each stack entry describes a memory region to trace.  The descriptor
> you're looking at (*mark_stack_top) is completely bogus, since the
> length is huge.
>
> Presumably some root segment was misidentified.  You might try calling
> GC_dump when GC_mark_from is first entered to try to confirm that the
> static root segments look OK.  Also check that GC_stackbottom (the base
> of the main application stack) is reasonable.  If those don't tell you
> anything, I would watch the mark stack location holding the bogus
> length and see how it gets there.
>
> Hans
>
>> -----Original Message-----
>> From: gc-admin@napali.hpl.hp.com [mailto:gc-admin@napali.hpl.hp.com]On
>> Behalf Of Andrew Begel
>> Sent: Monday, November 10, 2003 8:11 PM
>> To: 'gc@linux.hpl.hp.com'
>> Subject: [Gc] GC crash on OSX 10.3.1 (7C103)
>>
>>
>> I'm getting a consistent crash in the garbage collector GC_init()
>> routine when I try linking in the libgc.dylib from a Java application
>> (works fine when linked from other apps, even complex situations like
>> XEmacs loading my lib).
>>
>> I've got a bundle that loads a dylib that uses the garbage collector.
>> I have an init routine on the dylib to call GC_init() when the dylib
>> is loaded. This all occurs successfully; however, the garbage
>> collector crashes in GC_mark_from() (the 3rd time that it is called).
>> It crashes in both optimized and non-optimized libgc. Here's a stack
>> trace:
>>
>> #0  0x08bf4010 in GC_mark_from (mark_stack_top=0x6e00a8, mark_stack=0x6e00a8, mark_stack_limit=0x6e80a8) at mark.c:759
>> #1  0x08bf35a8 in GC_mark_some (cold_gc_frame=0xf07fe670 "?\177??") at mark.c:361
>> #2  0x08be9320 in GC_stopped_mark (stop_func=0x8be84ac <GC_never_stop_func>) at alloc.c:515
>> #3  0x08be8de0 in GC_try_to_collect_inner (stop_func=0x8be84ac <GC_never_stop_func>) at alloc.c:362
>> #4  0x08bf82c4 in GC_init_inner () at misc.c:767
>> #5  0x08bf7cdc in GC_init () at misc.c:486
>> #6  0x0afca578 in alloc_init() () at alloc.cc:36
>> #7  0x0afca4d8 in oft_init2 () at macosx.cc:7
>> #8  0x8fe09c18 in __dyld_call_image_init_routines ()
>> #9  0x8fe11880 in __dyld_link_in_need_modules ()
>> #10 0x8fe134e4 in __dyld__dyld_link_module ()
>> #11 0x9003f5a8 in NSLinkModule ()
>> #12 0x9487ff4c in JNI_CreateJavaVM_Impl ()
>> #13 0x948987a8 in JVM_LoadLibrary ()
>> #14 0x94742fbc in Java_java_lang_ClassLoader_00024NativeLibrary_load ()
>>
>> Program received signal EXC_BAD_ACCESS, Could not access memory.
>> 0x08bf4010 in GC_mark_from (mark_stack_top=0x6e00a8, mark_stack=0x6e00a8, mark_stack_limit=0x6e80a8) at mark.c:759
>> 759               deferred = *limit;
>> (gdb) p limit
>> $37 = (word *) 0xf0801180
>> (gdb) p current_p
>> $38 = (word *) 0xf0800f88
>> (gdb) p deferred
>> $39 = 25170432
>>
>>
>> I've tried stepping through GC_mark_from() to see what's wrong with
>> it, but I can't make much headway into the code. What is this code
>> supposed to be doing to the stack? How does it know when it is done?
>> Why are limit and current_p so far away from the mark_stack_top?
>>
>> At the beginning of this call to GC_mark_from() I printed out
>> mark_stack_top:
>>
>> 634       while ((((ptr_t)mark_stack_top - (ptr_t)mark_stack) | credit)
>> (gdb) p mark_stack_top
>> $4 = (mse *) 0x6e00a8
>> (gdb) n
>> 637         current_p = mark_stack_top -> mse_start;
>> (gdb) n
>> 638         descr = mark_stack_top -> mse_descr;
>> (gdb) p current_p
>> $5 = (word *) 0xf0800798
>> (gdb) p *mark_stack_top
>> $6 = {
>>    mse_start = 0xf0800798,
>>    mse_descr = 3481270376
>> }
>>
>> What's up at 0xf0800798? Looks like it is fairly close to the address
>> of my bus error.
>>
>> Any ideas on how to proceed in debugging this?
>>
>> Andrew