[Gc] Re: [PATCH] Dealing with `.data.rel.ro'

Petter Urkedal urkedal at nbi.dk
Mon May 25 00:04:45 PDT 2009


On 2009-05-24, Boehm, Hans wrote:
> Thanks.
> 
> Unfortunately, I haven't been immediately able to reproduce this on a RedHat 5.1 machine with gcc 4.1.2.  Can others reproduce this?

My machine runs Gentoo stable with glibc-2.8_p20080602-r1, gcc-4.3.2-r3,
binutils-2.18-r3 (the -r* are from the distro).

> I assume that 0x7f9ba26eb9b8 (top -> mse_start) is a completely bogus address?

On some runs it's in a valid range; on others I get:

$1 = {mse_start = 0xd00000010 <Address 0xd00000010 out of bounds>,
mse_descr = 140571484270768}

> mse->descr appears to describe the length of an object (lsbs zero), but is way too big.  It looks like somehow a completely bogus entry worked itself onto the mark stack.  It would be good to check load_segs and n_load_segs in dyn_load.c and see whether that contains any bogus entries.  Each entry should describe either one or two valid address ranges.  (In the case of one, the second one should be all zeroes.   Other empty ranges can occur and are OK.)

It seems to me that the first two entries of load_segs are wrong:

(gdb) p n_load_segs
$4 = 8
(gdb) p/x load_segs
$6 = {{start = 0x600d68, end = 0x600d68, start2 = 0x601000, end2 = 0x601028}, {
    start = 0xd0000000a, end = 0x1c5871e50b973a16, start2 = 0x0, end2 = 0x0}, {
    start = 0x7f76208c5e58, end = 0x7f76208c5e58, start2 = 0x7f76208c7000, 
    end2 = 0x7f76208f9520}, {start = 0x7f76206929d8, end = 0x7f76206929d8, 
    start2 = 0x7f7620693000, end2 = 0x7f7620697150}, {start = 0x7f762047bcd0, 
    end = 0x7f762047bcd0, start2 = 0x7f762047c000, end2 = 0x7f762047c0a0}, {
    start = 0x7f762026f720, end = 0x7f762026f720, start2 = 0x7f7620273000, 
    end2 = 0x7f76202783f8}, {start = 0x7f761ff2ec68, end = 0x7f761ff2ec68, 
    start2 = 0x7f761ff2f000, end2 = 0x7f761ff2f298}, {start = 0x7f7620b15ba0, 
    end = 0x7f7620b15ba0, start2 = 0x7f7620b16000, end2 = 0x7f7620b16c68}, {start = 0x0, 
    end = 0x0, start2 = 0x0, end2 = 0x0} <repeats 2040 times>}

The values seem to be consistent between runs.
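
A rough sketch of the kind of sanity check described above, in case it is
useful for going through the table without reading the dump by eye.  The
struct only mirrors the field names gdb prints; the real declaration in
dyn_load.c may differ.  The helper names and the 2^47 address limit are my
own assumptions (x86-64 Linux user space), not anything taken from the
collector:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical mirror of the fields gdb prints for load_segs[];
   the real declaration lives in dyn_load.c and may differ. */
struct load_segment {
    uintptr_t start, end;    /* first address range */
    uintptr_t start2, end2;  /* optional second range, all zero if unused */
};

/* Assumption: x86-64 Linux, where user-space addresses lie below 2^47. */
#define USER_ADDR_LIMIT ((uintptr_t)1 << 47)

/* A range is plausible if it is ordered and ends inside user space.
   Empty ranges (lo == hi), including all-zero ones, are accepted. */
static int range_looks_sane(uintptr_t lo, uintptr_t hi)
{
    return lo <= hi && hi <= USER_ADDR_LIMIT;
}

/* An entry is plausible if both of its ranges are. */
static int seg_entry_looks_sane(const struct load_segment *s)
{
    return range_looks_sane(s->start, s->end)
        && range_looks_sane(s->start2, s->end2);
}

/* Count the entries among the first n that fail the check. */
static size_t count_suspicious(const struct load_segment *segs, size_t n)
{
    size_t i, bad = 0;
    for (i = 0; i < n; i++)
        if (!seg_entry_looks_sane(&segs[i]))
            bad++;
    return bad;
}
```

Applied to the dump above, this flags only the second entry (its end,
0x1c5871e50b973a16, is far outside user space).  It is a plausibility test
only and says nothing about whether an entry that passes is actually correct.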

> If those look OK, could you try explicitly undefining PT_GNU_RELRO before it is first tested?  I think that should essentially get rid of all the newly added code there.  It would be good to confirm that this patch is actually the cause of the problem.

Undefining PT_GNU_RELRO avoids the problem. 
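
For anyone who wants to repeat the experiment, it amounts to something like
the sketch below.  It assumes the #undef goes after the ELF headers are
included and before the first #ifdef PT_GNU_RELRO test in dyn_load.c; the
helper function is hypothetical and is there only to show that the guarded
code is no longer compiled:

```c
#include <elf.h>   /* defines PT_GNU_RELRO on glibc systems */

/* Experiment only: hide PT_GNU_RELRO so that any code guarded by
   #ifdef PT_GNU_RELRO is compiled out. */
#ifdef PT_GNU_RELRO
# undef PT_GNU_RELRO
#endif

/* Hypothetical helper: reports whether RELRO-specific code would be
   compiled at this point in the file. */
static int relro_handling_compiled_in(void)
{
#ifdef PT_GNU_RELRO
    return 1;
#else
    return 0;
#endif
}
```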

> If the problem is deterministic, another good plan of attack might be to remember the value of top at the failure point, and then rerun with a watchpoint there.

The failure is completely reproducible, but the values in *top change.
I'll take a better look at it later today.

Petter

