[Gc] GC 6.4 simplified leak detection breaks on SuSE Linux 9.3 i386 (glibc 2.3.4)

Hans Boehm Hans.Boehm at hp.com
Mon May 16 23:00:47 PDT 2005


It looks like the GC initialization routine is being called recursively,
because it forces a GC (around line 782, misc.c) just before setting
the GC_is_initialized flag.  Unfortunately, that collection generates
a backtrace, which allocates, which invokes the GC, which restarts
the initialization.

Can you try commenting out the three calls to GC_save_callers in
alloc.c?  They're really only needed for GC debugging, I believe.
If that solves the problem, we can conditionally take them out.
(I'm pretty sure this will help.  I'm less sure this is the last
such problem.)

Hans

On Tue, 17 May 2005, Matthias Andree wrote:

> "Boehm, Hans" <hans.boehm at hp.com> writes:
>
> > I'm sorry.
>
> Never mind.
>
> > Unfortunately I had missed that in your original reply.
> > I'm having trouble reproducing the problem here.  But that's not
> > surprising, since I don't have a box running Suse.
>
> It's not SUSE specific, but a flaw in glibc 2.3.4 or 2.3.5. So you'd
> need a box running GNU glibc 2.3.4 or 2.3.5 that didn't have its
> sysdeps/i386/backtrace.c patched to the 2.3.3 state.
>
> glibc 2.3.3 and older didn't exhibit this problem.
>
> > I believe the "exclusion ranges overlap" issue is completely different,
> > though I'm not sure without tracking it down.  Could you look at
> > what the arguments to the failing call of GC_exclude_static_roots are,
> > where it's being called from, and what the prior contents of
> > GC_excl_table are?  GC_excl_table is a sorted table of address ranges.
> > GC_exclude_static_roots inserts ranges into this table, and should
> > not be called on overlapping address ranges.
>
> OK, here we go: Before 1st call, GC_excl_table_entries is 0, the table
> is empty and this is the backtrace:
>
> #0  GC_exclude_static_roots (start=0x805b900, finish=0x8067994) at mark_rts.c:428
> #1  0x0804c517 in GC_init_inner () at misc.c:650
> #2  0x08048ff7 in GC_generic_malloc_inner (lb=101, k=1) at malloc.c:123
> #3  0x0804914b in GC_generic_malloc (lb=101, k=1) at malloc.c:192
> #4  0x08049315 in GC_malloc (lb=101) at malloc.c:297
> #5  0x080528af in GC_debug_malloc (lb=42, s=0x8056b30 "unknown", i=0) at dbg_mlc.c:490
> #6  0x08049386 in malloc (lb=42) at malloc.c:349
> #7  0x08048d62 in main () at /home/emma/mywork/bogofilter/src/tests/leakmem.c:16
>
> after exit of 1st call until before entry of 2nd call, table contents are:
>
> $2 = {{e_start = 0x805b900 "", e_end = 0x8067994 ""}, {e_start = 0x0, e_end = 0x0} <repeats 63 times>}
>
> 2nd call:
>
> #0  GC_exclude_static_roots (start=0x805b1e0, finish=0x805b320) at mark_rts.c:428
> #1  0x0804c52c in GC_init_inner () at misc.c:651
> #2  0x08048ff7 in GC_generic_malloc_inner (lb=101, k=1) at malloc.c:123
> #3  0x0804914b in GC_generic_malloc (lb=101, k=1) at malloc.c:192
> #4  0x08049315 in GC_malloc (lb=101) at malloc.c:297
> #5  0x080528af in GC_debug_malloc (lb=42, s=0x8056b30 "unknown", i=0) at dbg_mlc.c:490
> #6  0x08049386 in malloc (lb=42) at malloc.c:349
> #7  0x08048d62 in main () at /home/emma/mywork/bogofilter/src/tests/leakmem.c:16
>
> Table then is (and remains until entry to 3rd call):
>
> $4 = {{e_start = 0x805b1e0 "XXXXX", e_end = 0x805b320 "\004"}, {e_start = 0x805b900 "", e_end = 0x8067994 ""}, {e_start = 0x0,
>     e_end = 0x0} <repeats 62 times>}
>
> Then we have something that is trying to re-exclude the range from the
> first call. More info below the trace.
>
> #0  GC_exclude_static_roots (start=0x805b900, finish=0x8067994) at mark_rts.c:428
> #1  0x0804c517 in GC_init_inner () at misc.c:650
> #2  0x08048ff7 in GC_generic_malloc_inner (lb=661, k=1) at malloc.c:123
> #3  0x0804914b in GC_generic_malloc (lb=661, k=1) at malloc.c:192
> #4  0x08049315 in GC_malloc (lb=661) at malloc.c:297
> #5  0x080528af in GC_debug_malloc (lb=602, s=0x8056b30 "unknown", i=0) at dbg_mlc.c:490
> #6  0x080493a8 in calloc (n=602, lb=1) at malloc.c:359
> #7  0xb7ff2b61 in _dl_rtld_di_serinfo () from /lib/ld-linux.so.2
> #8  0xb7fef42e in ?? () from /lib/ld-linux.so.2
> #9  0x00000000 in ?? ()
> #10 0x90000001 in ?? ()
> #11 0x00000000 in ?? ()
> #12 0x00000001 in ?? ()
> #13 0xb7fe33a5 in ?? ()
> #14 0x08048dee in GC_alloc_reclaim_list (kind=0x8078fd8) at malloc.c:31
> #15 0xb7ff1051 in _dl_rtld_di_serinfo () from /lib/ld-linux.so.2
> #16 0xb7f9323e in _dl_open () from /lib/tls/libc.so.6
> #17 0xb7ff7186 in _dl_rtld_di_serinfo () from /lib/ld-linux.so.2
> #18 0xb7f92be0 in _dl_open () from /lib/tls/libc.so.6
> #19 0xb7f94d4d in __libc_dlopen_mode () from /lib/tls/libc.so.6
> #20 0xb7ff7186 in _dl_rtld_di_serinfo () from /lib/ld-linux.so.2
> #21 0xb7f94bd5 in _dl_mcount_wrapper () from /lib/tls/libc.so.6
> #22 0xb7f94cfb in __libc_dlopen_mode () from /lib/tls/libc.so.6
> #23 0xb7f723ba in __nss_passwd_lookup () from /lib/tls/libc.so.6
> #24 0xb7f72557 in backtrace () from /lib/tls/libc.so.6
> #25 0x0804ddfd in GC_save_callers (info=0x8067974) at os_dep.c:4017
> #26 0x08050a9f in GC_try_to_collect_inner (stop_func=0x80505b0 <GC_never_stop_func>) at alloc.c:363
> #27 0x0804c6dd in GC_init_inner () at misc.c:782
> #28 0x08048ff7 in GC_generic_malloc_inner (lb=101, k=1) at malloc.c:123
> #29 0x0804914b in GC_generic_malloc (lb=101, k=1) at malloc.c:192
> #30 0x08049315 in GC_malloc (lb=101) at malloc.c:297
> #31 0x080528af in GC_debug_malloc (lb=42, s=0x8056b30 "unknown", i=0) at dbg_mlc.c:490
> #32 0x08049386 in malloc (lb=42) at malloc.c:349
> #33 0x08048d62 in main () at /home/emma/mywork/bogofilter/src/tests/leakmem.c:16
>
> I do not know or understand what it means if the code tries to exclude
> the same range again. Threads? I'm using an Athlon XP 2500+ model 10
> stepping 0 processor (i686, "Barton" core w/ 512 kB Cache) in case that
> matters.
>
> Anyway, you can guess what happens if I issue "finish" in GDB:
> right, it raises SIGABRT, drops core and dies here:
> ...
> #4  0xb7ece2c1 in raise () from /lib/tls/libc.so.6
> #5  0xb7ecfb75 in abort () from /lib/tls/libc.so.6
> #6  0x0804c96c in GC_abort (msg=0x8056cd3 "exclusion ranges overlap") at misc.c:1074
> #7  0x0804baf8 in GC_exclude_static_roots (start=0x805b900, finish=0x8067994) at mark_rts.c:436
> ...
>
> The additional information I promised: the tail of the strace output,
> omitting syscalls that returned ENOENT:
>
> brk(0)                                  = 0x8068000
> brk(0x8078000)                          = 0x8078000
> brk(0x8088000)                          = 0x8088000
> brk(0x8098000)                          = 0x8098000
> open("/etc/ld.so.cache", O_RDONLY)      = 3
> fstat64(3, {st_mode=S_IFREG|0644, st_size=170125, ...}) = 0
> old_mmap(NULL, 170125, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7fc0000
> close(3)                                = 0
> open("/lib/libgcc_s.so.1", O_RDONLY)    = 3
> read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340\22"..., 512) = 512
> fstat64(3, {st_mode=S_IFREG|0755, st_size=31852, ...}) = 0
> write(2, "exclusion ranges overlap\n", 25) = 25
> rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
> gettid()                                = 18769
> tgkill(18769, 18769, SIGABRT)           = 0
> --- SIGABRT (Aborted) @ 0 (0) ---
> +++ killed by SIGABRT +++
>
> And confirmed by the glibc 2.3.4 source (this is from sysdeps/i386/backtrace.c):
>
> ...
> static _Unwind_Reason_Code (*unwind_backtrace) (_Unwind_Trace_Fn, void *);
> static _Unwind_Ptr (*unwind_getip) (struct _Unwind_Context *);
> static _Unwind_Ptr (*unwind_getcfa) (struct _Unwind_Context *);
> static _Unwind_Ptr (*unwind_getgr) (struct _Unwind_Context *, int);
>
> static void
> init (void)
> {
>   void *handle = __libc_dlopen ("libgcc_s.so.1");
>
>   if (handle == NULL)
>     return;
>
>   unwind_backtrace = __libc_dlsym (handle, "_Unwind_Backtrace");
>   unwind_getip = __libc_dlsym (handle, "_Unwind_GetIP");
>   unwind_getcfa = __libc_dlsym (handle, "_Unwind_GetCFA");
>   unwind_getgr = __libc_dlsym (handle, "_Unwind_GetGR");
>   if (unwind_getip == NULL || unwind_getgr == NULL || unwind_getcfa == NULL)
>     unwind_backtrace = NULL;
> }
> ...
>
> So libc.so runs dlopen() to read libgcc_s.so.1 - and this fails as just
> shown.
>
> glibc 2.3.3 didn't dlopen libgcc_s. The corresponding ChangeLog entry
> for 2.3.4 is:
>
> 2004-10-22  Jakub Jelinek  <jakub at redhat.com>
>
>         * sysdeps/i386/Makefile (CFLAGS-backtrace.c): Add -fexceptions.
>         * sysdeps/i386/backtrace.c: Include <bits/libc-lock.h>, <dlfcn.h>,
>         <stdlib.h> and <unwind.h>.  Remove <bp-checks.h> include.
>         (struct trace_arg): New type.
>         (unwind_backtrace, unwind_getip, unwind_getcfa, unwind_getgr): New
>         fn pointers resp. macros.
>         (init, backtrace_helper): New functions.
>         (__backtrace): Rewritten to use _Unwind_Backtrace first and fall
>         back to frame pointer walking.
>
> I've filed a bug against GNU libc since this libc behavior contradicts
> the documentation, which states that backtrace() does not call malloc()
> but uses stack-allocated memory. This is false, as shown above.
>
> See <http://sources.redhat.com/bugzilla/show_bug.cgi?id=956>, but as
> several glibc bugs I have reported so far have been ignored even if
> critical and causing nondeterministic behavior, I'm not expecting any
> help from there.
>
> > You probably also have to be careful to do a complete rebuild if
> > you reconfigure the collector.  The sizes of some critical internal
> > data structures depend on the configuration.
>
> I've always built in a fresh directory or after "make clean", so that's
> probably not it. BTW, it doesn't matter whether I use libgc.so (with the
> right version forced) or libgc.a; the results are the same.
>
> --
> Matthias Andree
>

