[Gc] moving from 6.8->7.0 getting smashed objects

Boehm, Hans hans.boehm at hp.com
Mon Jul 23 16:01:06 PDT 2007


This is the only thread running at this point?  Is there another thread
that could be running in the interim and corrupting the heap?

This is built with DBG_HDRS_ALL?

It looks to me like the GC_check_heap_block message should be
impossible.  It should print the start of the user-visible object. I
don't think an address ending in 000 is likely to qualify.  (That might
also explain why you never see it being returned.)  Looking at the code,
I also no longer believe this is being printed correctly.  I suspect
that this is not the main problem, but I will see if I can generate a
test case to reproduce this, and investigate.  Hopefully a patch is
forthcoming ...

I believe you are reading way too much into the precise point at which
the message appears.  The heap is scanned for overwrite errors during
each GC.  It is inconvenient to print anything there, so that's
postponed until the next GC_print_all_errors call, which happens during
some, but not all, allocations.

You probably want to set a breakpoint in GC_check_heap_proc(), and look
at GC_n_smashed afterwards.  The GC_smashed[] array should contain
pointers to the locations that the GC thought were clobbered.  I suspect
it's safe to invoke GC_check_heap_proc from a debugger to narrow down
the point at which things go south.

Hans

> -----Original Message-----
> From: jim marshall [mailto:jim.marshall at wbemsolutions.com] 
> Sent: Thursday, July 19, 2007 10:15 PM
> To: Boehm, Hans
> Cc: gc at napali.hpl.hp.com
> Subject: Re: [Gc] moving from 6.8->7.0 getting smashed objects
> 
> Boehm, Hans wrote:
> > I think the algorithm for detecting smashed objects has not changed.
> > Lots of other things, including object placement, no doubt have.  It's
> > quite conceivable that either:
> >
> > 1) Other changes cause the overwrite to be noticed in 7.0 but not 6.8
> > 2) There is another bug in 7.0
> >
> > I would hope that it wouldn't take that long to debug this from the
> > smashed object messages?  It should be fairly easy to determine where
> > objects in the vicinity were allocated.  If the problem is repeatable
> > enough, a watchpoint on the overwritten location might even work.
> >   
> I've not made much headway with this; perhaps someone could point me
> in the right direction.
> 
> Basically, what I have found is that at some point our program calls
> GC_MALLOC (GC_debug_malloc, as we are using a debug build). This call
> returns successfully (no smashed objects). However, the very next
> allocation causes the GC to spit out the smashed object warning. Here
> is a GDB session to give an example:
> 
> wsi_malloc (pSize=78) at src/wsimemory.c:47
> 47          void *mem = GC_MALLOC(pSize);
> (gdb) s
> GC_debug_malloc (lb=78, s=0x400535c9 "src/wsimemory.c", i=47) at dbg_mlc.c:457
> 457     {
> (gdb) n
> 458         void * result = GC_malloc(lb + DEBUG_BYTES);
> (gdb) 
> 460         if (result == 0) {
> (gdb) 
> 458         void * result = GC_malloc(lb + DEBUG_BYTES);
> (gdb) 
> 460         if (result == 0) {
> (gdb) 
> 467         if (!GC_debugging_started) {
> (gdb) 
> 471         return (GC_store_debug_info(result, (word)lb, s, (word)i));
> (gdb) print result
> $1 = (void *) 0x809df70
> (gdb) n
> 472     }
> (gdb)
> wsi_malloc (pSize=78) at src/wsimemory.c:50
> 50          memset(mem, 0, pSize);
> (gdb) print mem
> $2 = (void *) 0x809df80
> (gdb) call GC_debug_malloc(64, "myfile.c", 1)
> GC_check_heap_block: found smashed heap objects:
> 0x80d1008 in object at 0x80d1000(<smashed>, appr. sz = 197)
> $3 = (void *) 0x80a9f88
> (gdb) 
> 
> 
> You can see above that we call GC_debug_malloc and it returns. Then,
> before I executed line 50 (the memset call), I had GDB
> 'call GC_debug_malloc' directly, and it detected the smashed object.
> 
> You can see the smashed object is at 0x80d1000, which is consistent,
> but I cannot find a place where this address is returned to my
> application (I set breakpoints in all the GC allocation functions I
> could, and nothing returned seemed to be in that range). As an aside,
> our application makes a lot of allocation calls, so although I set the
> breakpoints, I left them disabled for a while. I tried setting
> conditional breakpoints in GC_debug_malloc (and others) for when
> result==0x80d1000, but they were never hit.
> 
> Setting a watchpoint on those memory locations causes the program to
> crawl (or hang; I couldn't determine which). My test machine is a
> Celeron, and GDB does not use a hardware watchpoint when I use the
> watch command on the memory address.
> 
> Any ideas on what else I can do to help determine whether this is in
> our app (which is my suspicion) or an issue with GC 7.0?
> 
> Thanks!
> -Jim
> 


