[Gc] Segfault in GC_mark_from in libgc 7.1 (released tarball)

Boehm, Hans hans.boehm at hp.com
Tue Aug 19 13:00:47 PDT 2008



> -----Original Message-----
> From: ktreichel at web.de [mailto:ktreichel at web.de]
> Sent: Tuesday, August 19, 2008 12:35 AM
> To: Boehm, Hans
> Cc: Bruce Hoult; gc at napali.hpl.hp.com
> Subject: RE: [Gc] Segfault in GC_mark_from in libgc 7.1
> (released tarball)
>
> Am Montag, den 18.08.2008, 20:18 +0000 schrieb Boehm, Hans:
> >
> > > -----Original Message-----
> > > From: gc-bounces at napali.hpl.hp.com
> > > [mailto:gc-bounces at napali.hpl.hp.com] On Behalf Of Klaus Treichel
> > > Sent: Sunday, August 17, 2008 4:23 AM
> > > To: Bruce Hoult
> > > Cc: gc at napali.hpl.hp.com
> > > Subject: Re: [Gc] Segfault in GC_mark_from in libgc 7.1 (released
> > > tarball)
> > >
> > > Am Mittwoch, den 13.08.2008, 09:39 +0200 schrieb Klaus Treichel:
> > > > Am Mittwoch, den 13.08.2008, 10:17 +1200 schrieb Bruce Hoult:
> > > > > 2008/8/13 Klaus Treichel <ktreichel at web.de>:
> > > > > > Hi,
> > > > > >
> > > > > > > what I have found out so far is:
> > > > > >
> > > > > > 1. limit is an inaccessible address
> > > > > > (gdb) print limit
> > > > > > $26 = 0xb55010 <Address 0xb55010 out of bounds>
> > > > > >
> > > > > > where 0xb54fff is accessible.
> > > > > >
> > > > > > > 2. limit is in the range between least_ha and greatest_ha so the
> > > > > > > check doesn't prevent the segfault.
> > That check should never be needed to prevent a segfault.
> > GC_least_plausible_heap_addr and GC_greatest_plausible_heap_addr are used to:
> >
> What are the lines like
>       if ((ptr_t)current >= least_ha && (ptr_t)current < greatest_ha) {
> for?
It's a quick plausibility check that statistically eliminates a large fraction of candidate pointers without going through the more expensive real pointer validation process.  In particular, small integers should always fail this check.  On 64-bit machines, almost all non-pointers should fail.

If a candidate pointer passes this test, and fails a later more precise one, it is tracked as a "near miss", and we avoid allocating memory that would later appear to be referenced by it.
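
As a rough illustration of the two-stage filtering described above, here is a minimal C sketch. It is not the collector's code: classify(), plausible(), in_section(), verdict_t and the two bounds are hypothetical names and values chosen for illustration. Only the section ranges are taken from the heap dump quoted in this thread. The real collector validates candidates through its block header lookup rather than a linear scan over sections.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the collector's list of heap sections.
 * The ranges are the ones printed in the heap dump in this thread. */
typedef struct { uintptr_t start, end; } section_t;

static const section_t sections[] = {
    {0x7ad000, 0x7bd000}, {0x7bd000, 0x7e3000}, {0x944000, 0x957000},
    {0x957000, 0x970000}, {0x980000, 0x9a1000}, {0xaee000, 0xb1a000},
    {0xb1a000, 0xb55000},
};
static const size_t n_sections = sizeof sections / sizeof sections[0];

/* Assumed bounds, for illustration only: the real values of
 * GC_least_plausible_heap_addr and GC_greatest_plausible_heap_addr
 * in the failing process are not shown in the thread. */
static const uintptr_t least_ha = 0x7ad000;
static const uintptr_t greatest_ha = 0xb56000;

typedef enum { IMPLAUSIBLE, NEAR_MISS, IN_HEAP } verdict_t;

/* Stage 1: cheap range test that rejects most non-pointers. */
static int plausible(uintptr_t p)
{
    return p >= least_ha && p < greatest_ha;
}

/* Stage 2: more expensive, precise test (a linear scan in this sketch). */
static int in_section(uintptr_t p)
{
    for (size_t i = 0; i < n_sections; i++)
        if (p >= sections[i].start && p < sections[i].end)
            return 1;
    return 0;
}

/* A candidate that passes stage 1 but fails stage 2 is a near miss,
 * which is what the blacklisting code wants to hear about. */
static verdict_t classify(uintptr_t p)
{
    if (!plausible(p))
        return IMPLAUSIBLE;
    return in_section(p) ? IN_HEAP : NEAR_MISS;
}
```

With these assumed bounds, a small integer such as 0x10 is rejected by the range test alone, 0xb50000 falls inside section 6, and 0xb55010 passes the range test but matches no section, so the sketch reports it as a near miss.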

>
> > a) Eliminate obviously implausible "candidate pointers" and hence speed up pointer validation.
> > b) Detect "near misses" in support of the blacklisting code.
> >
> > It would be interesting to know what GC_find_header(0xb54fff) is.  Does it
> > think this block is in the heap?  Or is this ostensibly part of the root
> > set?  If it's non-null, i.e. if the block is in the heap, what's
> > *GC_find_header(0xb54fff)?
>
> The block is in the GC heap.
Sort of. hb_flags = 4 means it's a free block.

>
> ***Heap sections:
> Total heap size: 958464
> Section 0 from 0x7ad000 to 0x7bd000 0/16 blacklisted
> Section 1 from 0x7bd000 to 0x7e3000 0/38 blacklisted
> Section 2 from 0x944000 to 0x957000 0/19 blacklisted
> Section 3 from 0x957000 to 0x970000 0/25 blacklisted
> Section 4 from 0x980000 to 0x9a1000 0/33 blacklisted
> Section 5 from 0xaee000 to 0xb1a000 0/44 blacklisted
> Section 6 from 0xb1a000 to 0xb55000 0/59 blacklisted
>
> (gdb) print *GC_find_header(0xb50000)
> $8 = {hb_next = 0xb18000, hb_prev = 0x0, hb_block = 0x7b2000,
>   hb_obj_kind = 2 '\002', hb_flags = 4 '\004', hb_last_reclaimed = 1,
>   hb_sz = 20480, hb_descr = 131088, hb_large_block = 1 '\001',
>   hb_map = 0x798550, hb_n_marks = 1, hb_marks = {1, 0, 0, 0, 1}}
>
> For locations >= 0xb51000 GC_find_header returns 0.
>
> Setting GC_no_dls to 1 before initializing the GC causes the
> segfault to disappear.
I'm concerned that this perturbs things just enough for the problem to disappear, without actually removing the cause.  Can you call GC_dump() or GC_print_static_roots() at this point, and see whether a block around that region is registered as part of the root set?  It might also be worth looking at /proc/<pid>/maps to see if there are any other mappings close to there, though it looks like there aren't.
>
> So it looks like the false object is pushed on the mark stack
> during marking the static roots.
Maybe.  Another possibility is a generic bug that sometimes causes the collector to fail to notice that the heap block is free, if it happens to find a false reference to it.  I'll stare at the code a bit to explore that possibility.  The hb_descr value here is suggestive, and has me worried.

Hans

>
> (gdb) print mark_stack_top[0]
> $11 = {mse_start = 0xb55358 <Address 0xb55358 out of bounds>,
>   mse_descr = 109752}
>
> (gdb) print mark_stack_top[-1]
> $12 = {mse_start = 0xb2f000 "P\205\230", mse_descr = 131088}
>
> Klaus
>


