[Gc] GC 6.4 simplified leak detection breaks on SuSE Linux 9.3 i386 (glibc 2.3.4)

Matthias Andree matthias.andree at gmx.de
Mon May 16 19:16:14 PDT 2005


"Boehm, Hans" <hans.boehm at hp.com> writes:

> I'm sorry.

Never mind.

> Unfortunately I had missed that in your original reply.
> I'm having trouble reproducing the problem here.  But that's not
> surprising, since I don't have a box running Suse.

It's not SUSE-specific but a flaw in glibc 2.3.4 or 2.3.5. So you'd
need a box running GNU glibc 2.3.4 or 2.3.5 whose
sysdeps/i386/backtrace.c has not been patched back to the 2.3.3 state.

glibc 2.3.3 and older didn't exhibit this problem.

> I believe the "exclusion ranges overlap" issue is completely different,
> though I'm not sure without tracking it down.  Could you look at
> what the arguments to the failing call of GC_exclude_static_roots are,
> where it's being called from, and what the prior contents of
> GC_excl_table are?  GC_excl_table is a sorted table of address ranges.
> GC_exclude_static_roots should not be called on overlapping address
> ranges, and inserts those ranges into this table.

OK, here we go: before the first call, GC_excl_table_entries is 0 and the
table is empty. This is the backtrace:

#0  GC_exclude_static_roots (start=0x805b900, finish=0x8067994) at mark_rts.c:428
#1  0x0804c517 in GC_init_inner () at misc.c:650
#2  0x08048ff7 in GC_generic_malloc_inner (lb=101, k=1) at malloc.c:123
#3  0x0804914b in GC_generic_malloc (lb=101, k=1) at malloc.c:192
#4  0x08049315 in GC_malloc (lb=101) at malloc.c:297
#5  0x080528af in GC_debug_malloc (lb=42, s=0x8056b30 "unknown", i=0) at dbg_mlc.c:490
#6  0x08049386 in malloc (lb=42) at malloc.c:349
#7  0x08048d62 in main () at /home/emma/mywork/bogofilter/src/tests/leakmem.c:16

Between the exit of the first call and the entry to the second call, the table contents are:

$2 = {{e_start = 0x805b900 "", e_end = 0x8067994 ""}, {e_start = 0x0, e_end = 0x0} <repeats 63 times>}

2nd call:

#0  GC_exclude_static_roots (start=0x805b1e0, finish=0x805b320) at mark_rts.c:428
#1  0x0804c52c in GC_init_inner () at misc.c:651
#2  0x08048ff7 in GC_generic_malloc_inner (lb=101, k=1) at malloc.c:123
#3  0x0804914b in GC_generic_malloc (lb=101, k=1) at malloc.c:192
#4  0x08049315 in GC_malloc (lb=101) at malloc.c:297
#5  0x080528af in GC_debug_malloc (lb=42, s=0x8056b30 "unknown", i=0) at dbg_mlc.c:490
#6  0x08049386 in malloc (lb=42) at malloc.c:349
#7  0x08048d62 in main () at /home/emma/mywork/bogofilter/src/tests/leakmem.c:16

The table then is (and remains so until entry to the third call):

$4 = {{e_start = 0x805b1e0 "XXXXX", e_end = 0x805b320 "\004"}, {e_start = 0x805b900 "", e_end = 0x8067994 ""}, {e_start = 0x0,
    e_end = 0x0} <repeats 62 times>}

Then something tries to exclude the range from the first call a second
time. More information follows the trace.

#0  GC_exclude_static_roots (start=0x805b900, finish=0x8067994) at mark_rts.c:428
#1  0x0804c517 in GC_init_inner () at misc.c:650
#2  0x08048ff7 in GC_generic_malloc_inner (lb=661, k=1) at malloc.c:123
#3  0x0804914b in GC_generic_malloc (lb=661, k=1) at malloc.c:192
#4  0x08049315 in GC_malloc (lb=661) at malloc.c:297
#5  0x080528af in GC_debug_malloc (lb=602, s=0x8056b30 "unknown", i=0) at dbg_mlc.c:490
#6  0x080493a8 in calloc (n=602, lb=1) at malloc.c:359
#7  0xb7ff2b61 in _dl_rtld_di_serinfo () from /lib/ld-linux.so.2
#8  0xb7fef42e in ?? () from /lib/ld-linux.so.2
#9  0x00000000 in ?? ()
#10 0x90000001 in ?? ()
#11 0x00000000 in ?? ()
#12 0x00000001 in ?? ()
#13 0xb7fe33a5 in ?? ()
#14 0x08048dee in GC_alloc_reclaim_list (kind=0x8078fd8) at malloc.c:31
#15 0xb7ff1051 in _dl_rtld_di_serinfo () from /lib/ld-linux.so.2
#16 0xb7f9323e in _dl_open () from /lib/tls/libc.so.6
#17 0xb7ff7186 in _dl_rtld_di_serinfo () from /lib/ld-linux.so.2
#18 0xb7f92be0 in _dl_open () from /lib/tls/libc.so.6
#19 0xb7f94d4d in __libc_dlopen_mode () from /lib/tls/libc.so.6
#20 0xb7ff7186 in _dl_rtld_di_serinfo () from /lib/ld-linux.so.2
#21 0xb7f94bd5 in _dl_mcount_wrapper () from /lib/tls/libc.so.6
#22 0xb7f94cfb in __libc_dlopen_mode () from /lib/tls/libc.so.6
#23 0xb7f723ba in __nss_passwd_lookup () from /lib/tls/libc.so.6
#24 0xb7f72557 in backtrace () from /lib/tls/libc.so.6
#25 0x0804ddfd in GC_save_callers (info=0x8067974) at os_dep.c:4017
#26 0x08050a9f in GC_try_to_collect_inner (stop_func=0x80505b0 <GC_never_stop_func>) at alloc.c:363
#27 0x0804c6dd in GC_init_inner () at misc.c:782
#28 0x08048ff7 in GC_generic_malloc_inner (lb=101, k=1) at malloc.c:123
#29 0x0804914b in GC_generic_malloc (lb=101, k=1) at malloc.c:192
#30 0x08049315 in GC_malloc (lb=101) at malloc.c:297
#31 0x080528af in GC_debug_malloc (lb=42, s=0x8056b30 "unknown", i=0) at dbg_mlc.c:490
#32 0x08049386 in malloc (lb=42) at malloc.c:349
#33 0x08048d62 in main () at /home/emma/mywork/bogofilter/src/tests/leakmem.c:16

I don't know what it means when the code tries to exclude the same
range again. Threads? In case it matters, I'm using an Athlon XP 2500+
(model 10, stepping 0; i686, "Barton" core with 512 kB cache).

Anyway, you guessed it: if I issue "finish" in GDB, the process raises
SIGABRT, dumps core and dies here:
...
#4  0xb7ece2c1 in raise () from /lib/tls/libc.so.6
#5  0xb7ecfb75 in abort () from /lib/tls/libc.so.6
#6  0x0804c96c in GC_abort (msg=0x8056cd3 "exclusion ranges overlap") at misc.c:1074
#7  0x0804baf8 in GC_exclude_static_roots (start=0x805b900, finish=0x8067994) at mark_rts.c:436
...

Here is the additional information I promised: the tail of the strace
output, omitting syscalls that returned ENOENT:

brk(0)                                  = 0x8068000
brk(0x8078000)                          = 0x8078000
brk(0x8088000)                          = 0x8088000
brk(0x8098000)                          = 0x8098000
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=170125, ...}) = 0
old_mmap(NULL, 170125, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7fc0000
close(3)                                = 0
open("/lib/libgcc_s.so.1", O_RDONLY)    = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340\22"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=31852, ...}) = 0
write(2, "exclusion ranges overlap\n", 25) = 25
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
gettid()                                = 18769
tgkill(18769, 18769, SIGABRT)           = 0
--- SIGABRT (Aborted) @ 0 (0) ---
+++ killed by SIGABRT +++

This is confirmed by the glibc 2.3.4 source (from sysdeps/i386/backtrace.c):

...
static _Unwind_Reason_Code (*unwind_backtrace) (_Unwind_Trace_Fn, void *);
static _Unwind_Ptr (*unwind_getip) (struct _Unwind_Context *);
static _Unwind_Ptr (*unwind_getcfa) (struct _Unwind_Context *);
static _Unwind_Ptr (*unwind_getgr) (struct _Unwind_Context *, int);

static void
init (void)
{
  void *handle = __libc_dlopen ("libgcc_s.so.1");

  if (handle == NULL)
    return;

  unwind_backtrace = __libc_dlsym (handle, "_Unwind_Backtrace");
  unwind_getip = __libc_dlsym (handle, "_Unwind_GetIP");
  unwind_getcfa = __libc_dlsym (handle, "_Unwind_GetCFA");
  unwind_getgr = __libc_dlsym (handle, "_Unwind_GetGR");
  if (unwind_getip == NULL || unwind_getgr == NULL || unwind_getcfa == NULL)
    unwind_backtrace = NULL;
}
...

So libc.so calls dlopen() to load libgcc_s.so.1, and this fails as
just shown.

glibc 2.3.3 didn't dlopen libgcc_s. The corresponding ChangeLog entry
for 2.3.4 is:

2004-10-22  Jakub Jelinek  <jakub at redhat.com>

        * sysdeps/i386/Makefile (CFLAGS-backtrace.c): Add -fexceptions.
        * sysdeps/i386/backtrace.c: Include <bits/libc-lock.h>, <dlfcn.h>,
        <stdlib.h> and <unwind.h>.  Remove <bp-checks.h> include.
        (struct trace_arg): New type.
        (unwind_backtrace, unwind_getip, unwind_getcfa, unwind_getgr): New
        fn pointers resp. macros.
        (init, backtrace_helper): New functions.
        (__backtrace): Rewritten to use _Unwind_Backtrace first and fall
        back to frame pointer walking.

I've filed a bug against GNU libc, since this behavior contradicts the
documentation, which states that backtrace() does not call malloc() but
uses stack-allocated memory. As shown above, that is false.

See <http://sources.redhat.com/bugzilla/show_bug.cgi?id=956>. However,
several glibc bugs I have reported so far have been ignored, even critical
ones causing nondeterministic behavior, so I'm not expecting any help from
there.

> You probably also have to be careful to do a complete rebuild if
> you reconfigure the collector.  The size of some critical internal data
> structures depends on the configuration.

I've always built in a fresh directory or after "make clean", so that's
probably not it. BTW, it doesn't matter whether I use libgc.so (forced the
right version with ...) or libgc.a; the results are the same.

-- 
Matthias Andree

