[Gc] GC crash on OSX 10.3.1 (7C103)

Andrew Begel abegel@eecs.berkeley.edu
Mon, 10 Nov 2003 22:49:08 -0800


And now I've narrowed the bug down to a logic failure. The 
GC_thr_init() function in pthread_support.c assumes it is running from 
the main thread, maybe even before other threads are created. However, 
when I link my library into a running Java image, it is certainly not 
the main thread, and I can't control how many other threads have been 
created already. And I can't even promise not to use the GC on those 
other threads because I don't have control over which thread Java will 
execute my code in.

This feels like a design impasse. Any ideas?
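
One possible direction, offered purely as a sketch and not as anything the collector does: instead of a one-time initialization that assumes it runs on the main thread, every entry point into the library could register the calling thread on first use. Everything below (the registry, register_current_thread, the near_base field) is hypothetical and only uses standard pthread calls.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64

/* Hypothetical registry entry; this is NOT the collector's thread record. */
struct thread_entry {
    pthread_t id;
    void *near_base;   /* address of a local in the thread's entry frame */
};

static struct thread_entry registry[MAX_THREADS];
static size_t registry_len = 0;
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

/* Register the calling thread unless it is already known.
 * Returns 1 if newly added, 0 if already present, -1 if the table is full. */
static int register_current_thread(void *near_base)
{
    pthread_t self = pthread_self();
    int result = 1;
    size_t i;

    assert(near_base != NULL);

    pthread_mutex_lock(&registry_lock);
    for (i = 0; i < registry_len; i++) {
        if (pthread_equal(registry[i].id, self)) {
            result = 0;            /* this thread was seen before */
            break;
        }
    }
    if (result == 1) {
        if (registry_len == MAX_THREADS) {
            result = -1;           /* no room left in this toy table */
        } else {
            registry[registry_len].id = self;
            registry[registry_len].near_base = near_base;
            registry_len++;
        }
    }
    pthread_mutex_unlock(&registry_lock);
    return result;
}
```

Each native entry point would call register_current_thread(&some_local) before touching the heap. The obvious gaps: threads would also have to be removed when they exit, and near_base only approximates the real stack base, so frames above the entry point would not be scanned.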

Andy


On Nov 10, 2003, at 10:41 PM, Andrew Begel wrote:

> I found the bug, but I'm not sure how to fix it.
>
> With the following stack trace:
>
> #0  GC_push_all (bottom=0xf07fe3e0 "", top=0xc0000000 <Address 
> 0xc0000000 out of bounds>) at mark.c:1212
> #1  0x08bf5468 in GC_push_all_stack (bottom=0xf07fe3e0 "", 
> top=0xc0000000 <Address 0xc0000000 out of bounds>) at mark.c:1519
> #2  0x08c0173c in GC_push_all_stacks () at darwin_stop_world.c:98
> #3  0x08bf9b64 in GC_default_push_other_roots () at os_dep.c:2013
> #4  0x08bf7354 in GC_push_roots (all=1, cold_gc_frame=0xf07fe670 
> "\177") at mark_rts.c:643
> #5  0x08bf3508 in GC_mark_some (cold_gc_frame=0xf07fe670 "\177") at 
> mark.c:326
> #6  0x08be9320 in GC_stopped_mark (stop_func=0x8be84ac 
> <GC_never_stop_func>) at alloc.c:515
> #7  0x08be8de0 in GC_try_to_collect_inner (stop_func=0x8be84ac 
> <GC_never_stop_func>) at alloc.c:362
> #8  0x08bf82c4 in GC_init_inner () at misc.c:767
> #9  0x08bf7cdc in GC_init () at misc.c:486
>
> we have gotten ourselves into a situation where GC_push_all is called
> with a top that is less than the bottom. In GC_push_all() there's a
> line length = top - bottom, which goes negative, and it's this length
> that is stored in the GC_mark_stack_top structure that is very, very
> wrong.
>
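
To make the arithmetic concrete: if the length is held in an unsigned machine word, as it appears to be here, then top - bottom does not literally go negative on a 32-bit machine; it wraps modulo 2^32. The sketch below reproduces this with the addresses from the gdb sessions in this thread. The helper name region_length32 is made up for illustration, and uint32_t stands in for a 32-bit word. Starting from the mse_start printed in the first message (0xf0800798), the wrapped result is exactly the mse_descr value seen there.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the collector: models
 * "length = top - bottom" with a 32-bit unsigned word. */
static uint32_t region_length32(uint32_t bottom, uint32_t top)
{
    return top - bottom;   /* wraps modulo 2^32 when top < bottom */
}

/* Self-check with values taken from the gdb output in this thread. */
static void check_wraparound(void)
{
    /* mse_start = 0xf0800798, top = GC_stackbottom = 0xc0000000 */
    assert(region_length32(0xf0800798u, 0xc0000000u) == 3481270376u);

    /* bottom/top from frame #0 of the GC_push_all trace */
    assert(region_length32(0xf07fe3e0u, 0xc0000000u) == 0xcf801c20u);
}
```

With the operands the other way around (bottom below top) the same subtraction gives a small, sensible length, so the huge descriptor points at a mismatched pair of bounds rather than at the subtraction itself.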
> So, we look in GC_push_all_stacks() (in darwin_stop_world.c:19), which
> iterates over all threads and calls GC_push_all_stack() with lo = the
> approximate stack pointer of the current thread (0xf07fe3e0) and hi =
> the end of the stack (p->stack_end). The first thread in the loop
> is apparently the main thread of the app (a Java VM thread in my
> case), so hi = GC_stackbottom, which is 0xc0000000.
>
> Now, I know on Darwin the stack grows down. So shouldn't hi and lo be 
> swapped here? Who wrote this code?
>
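
On the swap question, a sketch of the invariant may help. With a stack that grows down, the live part of a thread's stack runs from its current stack pointer (the low address) up to its base (the high address), so lo = sp and hi = stack_end is the right order as long as both values belong to the same thread. What the trace shows looks more like a stack pointer from one stack paired with the base of another. The record type and helpers below are hypothetical; only 0xf07fe3e0 and 0xc0000000 come from the trace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one thread's stack mapping. */
struct stack_bounds {
    uintptr_t lowest;     /* lowest mapped address of the stack */
    uintptr_t stack_end;  /* base of the stack: one past its highest address */
};

/* Returns 1 when sp lies inside this thread's own stack. */
static int sp_in_stack(const struct stack_bounds *t, uintptr_t sp)
{
    assert(t != NULL);
    return sp >= t->lowest && sp < t->stack_end;
}

/* Computes the range to scan for a downward-growing stack.
 * Returns 1 and fills lo/hi on success; returns 0 and leaves the
 * outputs untouched when sp does not belong to this thread. */
static int scan_range(const struct stack_bounds *t, uintptr_t sp,
                      uintptr_t *lo, uintptr_t *hi)
{
    assert(lo != NULL && hi != NULL);
    if (!sp_in_stack(t, sp))
        return 0;
    *lo = sp;             /* hot end: current stack pointer */
    *hi = t->stack_end;   /* cold end: base of the same stack */
    return 1;
}
```

For example, with an assumed main stack of [0xbf800000, 0xc0000000), scan_range() rejects sp = 0xf07fe3e0, while an assumed pthread stack of [0xf0780000, 0xf0800000) accepts it and yields lo < hi.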
> I'm also suspicious because I know the thread that is calling GC_init()
> is *not* the main thread; it's just a pthread spawned by the Java VM
> process. If that were the case then hi would be 0x0, which is still
> very wrong.
>
> Perhaps the GC_threads array could have been set up improperly?
>
> Thanks for all the hints so far,
>
> Andy
>
> On Nov 10, 2003, at 8:45 PM, Boehm, Hans wrote:
>
>> The mark stack consists of pairs (start address, descriptor).  The
>> descriptor type is identified by the last two bits.  If they're zero,
>> the descriptor is just a length in bytes.
>>
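
As a small illustration of that encoding: mask off the low two bits to get the tag, and when the tag is zero treat the whole word as a byte count. The names below are invented for this sketch and are not the collector's own macros or types. Applied to the entry printed in the gdb session further down (mse_start = 0xf0800798, mse_descr = 3481270376), it reports a plain length of more than 3 GB that would run off the end of a 32-bit address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the tag encoding described above. */
#define TAG_BITS   2u
#define TAG_MASK   ((1u << TAG_BITS) - 1u)
#define TAG_LENGTH 0u   /* tag 0: the descriptor is a length in bytes */

/* Simplified 32-bit stand-in for one mark stack entry. */
struct mark_entry {
    uint32_t start;   /* start address of the region to trace */
    uint32_t descr;   /* descriptor; low two bits are the tag */
};

static unsigned descr_tag(uint32_t descr)
{
    return descr & TAG_MASK;
}

/* Returns 1 and stores the byte length when descr is a plain length;
 * returns 0 for any other descriptor type. */
static int descr_length(uint32_t descr, uint32_t *len_out)
{
    assert(len_out != NULL);
    if (descr_tag(descr) != TAG_LENGTH)
        return 0;
    *len_out = descr;
    return 1;
}

/* Returns 1 when [start, start + length) would extend past the top
 * of a 32-bit address space, i.e. the entry cannot be a real region. */
static int region_overflows(const struct mark_entry *e)
{
    uint32_t len;
    if (!descr_length(e->descr, &len))
        return 0;
    return len > UINT32_MAX - e->start;
}
```

A check like region_overflows() is only a debugging aid for spotting a bogus entry; it says nothing about where the entry came from.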
>> Each stack entry describes a memory region to trace.  The descriptor
>> you're looking at (*mark_stack_top) is completely bogus, since the
>> length is huge.
>>
>> Presumably some root segment was misidentified.  You might try calling
>> GC_dump when GC_mark_from is first entered to try to confirm that the
>> static root segments look OK.  Also check that GC_stackbottom (the
>> base of the main application stack) is reasonable.  If those don't
>> tell you anything, I would watch the mark stack location holding the
>> bogus length and see how it gets there.
>>
>> Hans
>>
>>> -----Original Message-----
>>> From: gc-admin@napali.hpl.hp.com 
>>> [mailto:gc-admin@napali.hpl.hp.com]On
>>> Behalf Of Andrew Begel
>>> Sent: Monday, November 10, 2003 8:11 PM
>>> To: 'gc@linux.hpl.hp.com'
>>> Subject: [Gc] GC crash on OSX 10.3.1 (7C103)
>>>
>>>
>>> I'm getting a consistent crash in the garbage collector GC_init()
>>> routine when I try linking in the libgc.dylib from a Java application
>>> (works fine when linked from other apps, even complex situations like
>>> XEmacs loading my lib).
>>>
>>> I've got a bundle that loads a dylib that uses the garbage
>>> collector. I have an init routine on the dylib to call GC_init()
>>> when the dylib is loaded. This all occurs successfully; however, the
>>> garbage collector crashes in GC_mark_from() (the 3rd time that it is
>>> called). It crashes in both optimized and non-optimized libgc.
>>> Here's a stack trace:
>>>
>>> #0  0x08bf4010 in GC_mark_from (mark_stack_top=0x6e00a8,
>>> mark_stack=0x6e00a8, mark_stack_limit=0x6e80a8) at mark.c:759
>>> #1  0x08bf35a8 in GC_mark_some (cold_gc_frame=0xf07fe670 "?\177??")
>>> at mark.c:361
>>> #2  0x08be9320 in GC_stopped_mark (stop_func=0x8be84ac
>>> <GC_never_stop_func>) at alloc.c:515
>>> #3  0x08be8de0 in GC_try_to_collect_inner (stop_func=0x8be84ac
>>> <GC_never_stop_func>) at alloc.c:362
>>> #4  0x08bf82c4 in GC_init_inner () at misc.c:767
>>> #5  0x08bf7cdc in GC_init () at misc.c:486
>>> #6  0x0afca578 in alloc_init() () at alloc.cc:36
>>> #7  0x0afca4d8 in oft_init2 () at macosx.cc:7
>>> #8  0x8fe09c18 in __dyld_call_image_init_routines ()
>>> #9  0x8fe11880 in __dyld_link_in_need_modules ()
>>> #10 0x8fe134e4 in __dyld__dyld_link_module ()
>>> #11 0x9003f5a8 in NSLinkModule ()
>>> #12 0x9487ff4c in JNI_CreateJavaVM_Impl ()
>>> #13 0x948987a8 in JVM_LoadLibrary ()
>>> #14 0x94742fbc in
>>> Java_java_lang_ClassLoader_00024NativeLibrary_load ()
>>>
>>> Program received signal EXC_BAD_ACCESS, Could not access memory.
>>> 0x08bf4010 in GC_mark_from (mark_stack_top=0x6e00a8,
>>> mark_stack=0x6e00a8, mark_stack_limit=0x6e80a8) at mark.c:759
>>> 759               deferred = *limit;
>>> (gdb) p limit
>>> $37 = (word *) 0xf0801180
>>> (gdb) p current_p
>>> $38 = (word *) 0xf0800f88
>>> (gdb) p deferred
>>> $39 = 25170432
>>>
>>>
>>> I've tried stepping through GC_mark_from() to see what's wrong with
>>> it, but I can't make much headway into the code. What is this code
>>> supposed to be doing to the stack? How does it know when it is done?
>>> Why are limit and current_p so far away from mark_stack_top?
>>>
>>> At the beginning of this call to GC_mark_from() I printed out
>>> mark_stack_top:
>>>
>>> 634       while ((((ptr_t)mark_stack_top - (ptr_t)mark_stack) | credit)
>>> (gdb) p mark_stack_top
>>> $4 = (mse *) 0x6e00a8
>>> (gdb) n
>>> 637         current_p = mark_stack_top -> mse_start;
>>> (gdb) n
>>> 638         descr = mark_stack_top -> mse_descr;
>>> (gdb) p current_p
>>> $5 = (word *) 0xf0800798
>>> (gdb) p *mark_stack_top
>>> $6 = {
>>>    mse_start = 0xf0800798,
>>>    mse_descr = 3481270376
>>> }
>>>
>>> What's up at 0xf0800798? Looks like it is fairly close to the address
>>> of my bus error.
>>>
>>> Any ideas on how to proceed in debugging this?
>>>
>>> Andrew
>>>