[Gc] moving from 6.8->7.0 getting smashed objects

Alec Orr Alec.Orr at wbemsolutions.com
Tue Aug 7 06:26:24 PDT 2007


Hans:

Thank you for your reply.  Jim and I will try this out later this week, and post 
here.

Thanks!
Alec

Hans Boehm wrote:

> Any progress on this?  Has anyone else seen similar issues?
> 
> I checked some fairly significant patches into the CVS tree yesterday,
> including:
> 
> A bug fix for printing smashed objects.  I think this was actually a 
> minor bug that affected only the quality of the output.
> 
> GC_malloc and GC_malloc_atomic should now force initialization, even
> with thread local allocation.  Thus the default should now behave
> similarly to 6.8.
> 
> Some improvements for REDIRECT_MALLOC.  It's now easier to build
> a Linux shared library that redefines malloc and can be LD_PRELOADed
> for use with previously linked applications.  This is still imperfect.
> Things like Mozilla and abiword don't work across all distributions.
> (In some cases, I suspect this requires some fairly ugly hacks to change,
> since they may squirrel away pointers in strange places.)
> 
> Hans
> 
> On Mon, 23 Jul 2007, jim marshall wrote:
> 
>>
>>
>> Boehm, Hans wrote:
>>
>>> This is the only thread running at this point?  Is there another thread
>>> that could be running in the interim and corrupting the heap?
>>>
>> At the point this happens we only have one thread executing.
>>
>>> This is built with DBG_HDRS_ALL?
>>>
>> I'll have to check; I suspect not, though.
>>
>>> It looks to me like the GC_check_heap_block message should be
>>> impossible.  It should print the start of the user-visible object. I
>>> don't think an address ending in 000 is likely to qualify.  (That might
>>> also explain why you never see it being returned.)  Looking at the code,
>>> I also no longer believe this is being printed correctly.  I suspect
>>> that this is not the main problem, but I will see if I can generate a
>>> test case to reproduce this, and investigate.  Hopefully a patch is
>>> forthcoming ...
>>>
>>> I believe you are reading way too much into the precise point at which
>>> the message appears.  The heap is scanned for overwrite errors during
>>> each GC.  It is inconvenient to print anything there, so that's
>>> postponed until the next GC_print_all_errors call, which happens during
>>> some, but not all, allocations.
>>>
>> OK - I presumed it checked during each allocation.
>>
>>> You probably want to set a breakpoint in GC_check_heap_proc(), and look
>>> at GC_n_smashed afterwards.  The GC_smashed[] array should contain
>>> pointers to the locations that the GC thought were clobbered.  I suspect
>>> it's safe to invoke GC_check_heap_proc from a debugger to narrow down
>>> the point at which things go south.
>>>
>> I will take a look at this.
>>
>> Thanks
>>
>>> Hans
>>>
>>>
>>>> -----Original Message-----
>>>> From: jim marshall [mailto:jim.marshall at wbemsolutions.com]
>>>> Sent: Thursday, July 19, 2007 10:15 PM
>>>> To: Boehm, Hans
>>>> Cc: gc at napali.hpl.hp.com
>>>> Subject: Re: [Gc] moving from 6.8->7.0 getting smashed objects
>>>>
>>>> Boehm, Hans wrote:
>>>>
>>>>> I think the algorithm for detecting smashed objects has not changed.
>>>>> Lots of other things, including object placement, no doubt have.  It's
>>>>> quite conceivable that either:
>>>>>
>>>>> 1) Other changes cause the overwrite to be noticed in 7.0 but not 6.8
>>>>> 2) There is another bug in 7.0
>>>>>
>>>>> I would hope that it wouldn't take that long to debug this from the
>>>>> smashed object messages?  It should be fairly easy to determine where
>>>>> objects in the vicinity were allocated.  If the problem is repeatable
>>>>> enough, a watchpoint on the overwritten location might even work.
>>>>>
>>>> I've not made much headway with this, perhaps someone could point me 
>>>> in the direction to go.
>>>>
>>>> Basically, what I have found is that at some point our program calls 
>>>> GC_MALLOC (GC_debug_malloc, as we are using a debug build). This 
>>>> call returns successfully (no smashed objects). However, the very 
>>>> next allocation causes the GC to print the smashed object 
>>>> warning. Here is a GDB session to give an example:
>>>>
>>>> wsi_malloc (pSize=78) at src/wsimemory.c:47
>>>> 47          void *mem = GC_MALLOC(pSize);
>>>> (gdb) s
>>>> GC_debug_malloc (lb=78, s=0x400535c9 "src/wsimemory.c", i=47) at dbg_mlc.c:457
>>>> 457     {
>>>> (gdb) n
>>>> 458         void * result = GC_malloc(lb + DEBUG_BYTES);
>>>> (gdb) 460         if (result == 0) {
>>>> (gdb) 458         void * result = GC_malloc(lb + DEBUG_BYTES);
>>>> (gdb) 460         if (result == 0) {
>>>> (gdb) 467         if (!GC_debugging_started) {
>>>> (gdb) 471         return (GC_store_debug_info(result, (word)lb, s, (word)i));
>>>> (gdb) print result
>>>> $1 = (void *) 0x809df70
>>>> (gdb) n
>>>> 472     }
>>>> (gdb)
>>>> wsi_malloc (pSize=78) at src/wsimemory.c:50
>>>> 50          memset(mem, 0, pSize);
>>>> (gdb) print mem
>>>> $2 = (void *) 0x809df80
>>>> (gdb) call GC_debug_malloc(64, "myfile.c", 1)
>>>> GC_check_heap_block: found smashed heap objects:
>>>> 0x80d1008 in object at 0x80d1000(<smashed>, appr. sz = 197)
>>>> $3 = (void *) 0x80a9f88
>>>> (gdb)
>>>> You can see above that the GC_debug_malloc call returns. Then, 
>>>> before executing line 50 (the memset call), I had GDB 'call 
>>>> GC_debug_malloc' directly, and it detected the smashed object.
>>>>
>>>> You can see the smashed object is at 0x80d1000, which is consistent, 
>>>> but I cannot find a place where this address is returned to my 
>>>> application (I set breaks in all the GC allocate functions I could 
>>>> and nothing returned seemed to be in that range). As an aside, our 
>>>> application makes a lot of allocation calls, so while I set the 
>>>> breakpoints, I didn't enable them for a while. I tried setting 
>>>> conditional breaks in GC_debug_malloc (and others) for when 
>>>> result==0x80d1000, but they never got hit.
>>>>
>>>> Setting a watchpoint on those memory locations causes the program to 
>>>> crawl (or hang; I couldn't determine which). My test machine is a 
>>>> Celeron, and GDB does not use a hardware watchpoint when I use the 
>>>> watch command on the memory address.
>>>>
>>>> Any ideas on what else I can do to help determine if this is in our 
>>>> app (which is my suspicion) or an issue with GC 7.0?
>>>>
>>>> Thanks!
>>>> -Jim
>>>>
>>>>
>>>
>>> _______________________________________________
>>> Gc mailing list
>>> Gc at linux.hpl.hp.com
>>> http://www.hpl.hp.com/hosted/linux/mail-archives/gc/
>>>
>>>
>>>
>>>
>>
>> -- 
>> Jim Marshall
>> Sr. Staff Engineer
>> WBEM Solutions, Inc.
>> 978-947-3607
>>
> 
> 
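To make the mechanism Hans describes concrete (the heap is scanned for overwrites at each GC, and the report is deferred to a later GC_print_all_errors call), here is a minimal, self-contained C sketch. It is not the collector's code: dbg_malloc, dbg_check_heap, dbg_print_all_errors, demo_overrun, GUARD_WORD and the header layout are all hypothetical, and plain malloc stands in for the GC allocator. It only illustrates why a smashed-object message can show up well after the write that caused it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* All names below are hypothetical; this is NOT the collector's code. */
#define GUARD_WORD  ((uintptr_t)0xDEADBEEFu)
#define MAX_BLOCKS  64
#define MAX_SMASHED 20

typedef struct {
    size_t    size;   /* size the caller asked for */
    uintptr_t guard;  /* start guard, placed right before the user data */
} dbg_header;

static dbg_header *blocks[MAX_BLOCKS];   /* every block handed out so far */
static size_t      n_blocks;
static void       *smashed[MAX_SMASHED]; /* user pointers found damaged */
static size_t      n_smashed;

/* Allocate header + payload + end guard; return the user-visible pointer. */
void *dbg_malloc(size_t size)
{
    if (n_blocks == MAX_BLOCKS) return NULL;
    dbg_header *h = malloc(sizeof *h + size + sizeof(uintptr_t));
    if (h == NULL) return NULL;
    h->size  = size;
    h->guard = GUARD_WORD;
    unsigned char *user = (unsigned char *)(h + 1);
    uintptr_t g = GUARD_WORD;
    memcpy(user + size, &g, sizeof g);   /* end guard may be unaligned */
    blocks[n_blocks++] = h;
    return user;
}

/* Scan all blocks and record damaged ones. Deliberately prints nothing. */
void dbg_check_heap(void)
{
    n_smashed = 0;
    for (size_t i = 0; i < n_blocks; i++) {
        dbg_header *h = blocks[i];
        unsigned char *user = (unsigned char *)(h + 1);
        int bad = (h->guard != GUARD_WORD);
        if (!bad) {                      /* size is only trusted if the start guard is intact */
            uintptr_t tail;
            memcpy(&tail, user + h->size, sizeof tail);
            bad = (tail != GUARD_WORD);
        }
        if (bad && n_smashed < MAX_SMASHED)
            smashed[n_smashed++] = user;
    }
}

/* Report what the last scan found; returns the number of reports. */
size_t dbg_print_all_errors(void)
{
    size_t n = n_smashed;
    for (size_t i = 0; i < n; i++)
        printf("found smashed object at %p\n", smashed[i]);
    n_smashed = 0;
    return n;
}

/* Overrun one block by a single byte, then scan and report separately. */
size_t demo_overrun(void)
{
    char *a = dbg_malloc(8);
    char *b = dbg_malloc(8);
    assert(a != NULL && b != NULL);
    memset(a, 'x', 9);              /* one byte too many: hits a's end guard */
    memset(b, 'x', 8);              /* stays in bounds */
    dbg_check_heap();               /* damage is detected here, silently */
    return dbg_print_all_errors();  /* and only reported here */
}
```

Calling demo_overrun() from a main function reports the one block whose end guard was clobbered. Nothing is freed, which keeps the sketch short but makes it unsuitable for anything beyond illustration.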
> 
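The gdb session in Jim's message shows GC_malloc returning 0x809df70 while the application receives 0x809df80, a difference of 0x10 bytes. Assuming that difference is the debug header in this particular build (an inference from the session, not a documented constant), the sketch below works out which addresses would pair with the reported 0x80d1000 if it were a block base versus a user-visible pointer. The helper names are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Offset observed in the session: 0x809df80 - 0x809df70.
   An assumption for this build only, not a documented constant. */
#define OBSERVED_HDR_BYTES ((uintptr_t)0x10)

/* User-visible pointer for a block whose base is `base`. */
uintptr_t user_from_base(uintptr_t base)
{
    return base + OBSERVED_HDR_BYTES;
}

/* Block base for a user-visible pointer `user`. */
uintptr_t base_from_user(uintptr_t user)
{
    return user - OBSERVED_HDR_BYTES;
}

/* Print both readings of an address taken from a smashed-object report. */
void print_candidates(uintptr_t reported)
{
    printf("if 0x%" PRIxPTR " is a block base, the user pointer is 0x%" PRIxPTR "\n",
           reported, user_from_base(reported));
    printf("if 0x%" PRIxPTR " is a user pointer, the block base is 0x%" PRIxPTR "\n",
           reported, base_from_user(reported));
}
```

If the offset holds, a conditional breakpoint on `result` inside GC_debug_malloc compares against the block base, so a user-visible pointer of 0x80d1000 would appear there as 0x80d0ff0, and a block based at 0x80d1000 would reach the application as 0x80d1010.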


-- 
Alec Orr
Staff Engineer
http://wbemsolutions.com
978-947-3609

