[Gc] moving from 6.8->7.0 getting smashed objects

jim marshall jim.marshall at wbemsolutions.com
Mon Jul 23 20:33:01 PDT 2007



Boehm, Hans wrote:
> This is the only thread running at this point?  Is there another thread
> that could be running in the interim and corrupting the heap?
>   
At the point this happens we only have one thread executing.
> This is built with DBG_HDRS_ALL?
>   
I'll have to check; I suspect not, though.
> It looks to me like the GC_check_heap_block message should be
> impossible.  It should print the start of the user-visible object. I
> don't think an address ending in 000 is likely to qualify.  (That might
> also explain why you never see it being returned.)  Looking at the code,
> I also no longer believe this is being printed correctly.  I suspect
> that this is not the main problem, but I will see if I can generate a
> test case to reproduce this, and investigate.  Hopefully a patch is
> forthcoming ...
>
> I believe you are reading way too much into the precise point at which
> the message appears.  The heap is scanned for overwrite errors during
> each GC.  It is inconvenient to print anything there, so that's
> postponed until the next GC_print_all_errors call, which happens during
> some, but not all, allocations.
>   
OK - I presumed it checked during each allocation.
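
To illustrate why the message can show up far from the bad write, here is a toy model in C of deferred overwrite reporting. It is only a sketch and not the collector's code: every name in it (toy_malloc, toy_collect, toy_print_all_errors, toy_smashed, run_demo) is made up for illustration, the guard is a single byte after the payload, and unlike the real collector it reports pending errors on every allocation rather than only on some.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_OBJS    16
#define TOY_MAX_PAYLOAD 64
#define TOY_GUARD       0xA5u

/* One fixed-size slot: the payload is followed by a single guard byte. */
struct toy_obj {
    int in_use;
    size_t size;                              /* bytes the caller asked for */
    unsigned char data[TOY_MAX_PAYLOAD + 1];  /* payload plus guard byte */
};

static struct toy_obj toy_heap[TOY_MAX_OBJS];
static void *toy_smashed[TOY_MAX_OBJS];       /* found, but not yet printed */
static size_t toy_n_smashed;
static size_t toy_n_reported;

/* Print and clear whatever the last scan queued up. */
static void toy_print_all_errors(void)
{
    for (size_t i = 0; i < toy_n_smashed; i++) {
        printf("toy: smashed guard byte at %p\n", toy_smashed[i]);
        toy_n_reported++;
    }
    toy_n_smashed = 0;
}

/* Stand-in for a collection: scan every live slot, queue damage silently. */
static void toy_collect(void)
{
    for (size_t i = 0; i < TOY_MAX_OBJS; i++) {
        struct toy_obj *o = &toy_heap[i];
        if (o->in_use && o->data[o->size] != TOY_GUARD
            && toy_n_smashed < TOY_MAX_OBJS) {
            toy_smashed[toy_n_smashed++] = &o->data[o->size];
            o->data[o->size] = TOY_GUARD;     /* repair so it is queued once */
        }
    }
}

/* Allocation reports pending errors first, then hands out a free slot. */
static void *toy_malloc(size_t size)
{
    toy_print_all_errors();
    if (size > TOY_MAX_PAYLOAD)
        return NULL;
    for (size_t i = 0; i < TOY_MAX_OBJS; i++) {
        struct toy_obj *o = &toy_heap[i];
        if (!o->in_use) {
            o->in_use = 1;
            o->size = size;
            o->data[size] = TOY_GUARD;
            return o->data;
        }
    }
    return NULL;
}

/* Returns the number of overwrite reports printed by the end of the run. */
size_t run_demo(void)
{
    char *a = toy_malloc(8);
    assert(a != NULL);
    memset(a, 0, 9);              /* bug: one byte too many, hits the guard */

    toy_malloc(8);                /* nothing printed: no scan has run yet */
    assert(toy_n_reported == 0);

    toy_collect();                /* damage found and queued, still silent */
    assert(toy_n_reported == 0);

    toy_malloc(8);                /* the report finally appears here */
    return toy_n_reported;
}
```

Calling run_demo() from a main() prints one line, and only during the third allocation, even though the bad memset ran right after the first. In this model the allocation that prints the message has nothing to do with the write that caused it.
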
> You probably want to set a breakpoint in GC_check_heap_proc(), and look
> at GC_n_smashed afterwards.  The GC_smashed[] array should contain
> pointers to the locations that the GC thought were clobbered.  I suspect
> it's safe to invoke GC_check_heap_proc from a debugger to narrow down
> the point at which things go south.
>   
I will take a look at this.

Thanks
> Hans
>
>   
>> -----Original Message-----
>> From: jim marshall [mailto:jim.marshall at wbemsolutions.com] 
>> Sent: Thursday, July 19, 2007 10:15 PM
>> To: Boehm, Hans
>> Cc: gc at napali.hpl.hp.com
>> Subject: Re: [Gc] moving from 6.8->7.0 getting smashed objects
>>
>> Boehm, Hans wrote:
>>     
>>> I think the algorithm for detecting smashed objects has not changed.
>>> Lots of other things, including object placement, no doubt have.  It's
>>> quite conceivable that either:
>>>
>>> 1) Other changes cause the overwrite to be noticed in 7.0 but not 6.8
>>> 2) There is another bug in 7.0
>>>
>>> I would hope that it wouldn't take that long to debug this from the
>>> smashed object messages?  It should be fairly easy to determine where
>>> objects in the vicinity were allocated.  If the problem is repeatable
>>> enough, a watchpoint on the overwritten location might even work.
>>
>> I've not made much headway with this; perhaps someone could point me
>> in the right direction.
>>
>> Basically, what I have found is that at some point our program calls
>> GC_MALLOC (GC_debug_malloc, as we are using a debug build). This call
>> returns successfully (no smashed objects). However, the very next
>> allocation causes the GC to print the smashed object warning. Here is
>> a GDB session to give an example:
>>
>> wsi_malloc (pSize=78) at src/wsimemory.c:47
>> 47          void *mem = GC_MALLOC(pSize);
>> (gdb) s
>> GC_debug_malloc (lb=78, s=0x400535c9 "src/wsimemory.c", i=47) at dbg_mlc.c:457
>> 457     {
>> (gdb) n
>> 458         void * result = GC_malloc(lb + DEBUG_BYTES);
>> (gdb) 
>> 460         if (result == 0) {
>> (gdb) 
>> 458         void * result = GC_malloc(lb + DEBUG_BYTES);
>> (gdb) 
>> 460         if (result == 0) {
>> (gdb) 
>> 467         if (!GC_debugging_started) {
>> (gdb) 
>> 471         return (GC_store_debug_info(result, (word)lb, s, (word)i));
>> (gdb) print result
>> $1 = (void *) 0x809df70
>> (gdb) n
>> 472     }
>> (gdb)
>> wsi_malloc (pSize=78) at src/wsimemory.c:50
>> 50          memset(mem, 0, pSize);
>> (gdb) print mem
>> $2 = (void *) 0x809df80
>> (gdb) call GC_debug_malloc(64, "myfile.c", 1)
>> GC_check_heap_block: found smashed heap objects:
>> 0x80d1008 in object at 0x80d1000(<smashed>, appr. sz = 197)
>> $3 = (void *) 0x80a9f88
>> (gdb) 
>>
>>
>> You can see above that we call GC_debug_malloc and it returns. Then,
>> before executing line 50 (the memset call), I had GDB 'call
>> GC_debug_malloc' directly, and it detects the smashed object.
>>
>> You can see the smashed object is at 0x80d1000, which is consistent,
>> but I cannot find a place where this address is returned to my
>> application (I set breaks in all the GC allocate functions I could and
>> nothing returned seemed to be in that range). As an aside to the
>> previous sentence, our application makes a lot of allocation calls, so
>> while I set the breakpoints, I didn't enable them for a while. I tried
>> setting conditional breaks in GC_debug_malloc (and others) for when
>> result==0x80d1000, but they never got hit.
>>
>> Setting a watchpoint on those memory locations causes the program to
>> crawl (or hang - I couldn't determine which). My test machine is a
>> Celeron, and GDB does not use a hardware watchpoint when I use the
>> watch command on the memory address.
>>
>> Any ideas on what else I can do to help determine if this is in our
>> app (which is my suspicion) or an issue with GC 7.0?
>>
>> Thanks!
>> -Jim
>>
>>     
>

-- 
Jim Marshall
Sr. Staff Engineer
WBEM Solutions, Inc.
978-947-3607


