[Gc] a problem with the collector with multi-threaded applications

Boehm, Hans hans.boehm at hp.com
Mon Apr 18 16:32:24 PDT 2005


This is X86/Linux?  Do you build with THREAD_LOCAL_ALLOC
defined?

Can you look at the instruction that's failing?

I replaced the GC_start/end_blocking interface in the 7.0 tree
since I think there is a subtle race here.  If the GC happens
just before the blocking system call, some callee-saves registers
may not have been saved yet.  Thus the GC can miss them, and
they could conceivably contain pointers from the caller.
Saving the context in the caller (with getcontext or
__builtin_unwind_init) should work around the problem.
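
A minimal sketch of that workaround, assuming glibc on X86/Linux, might
look like the following.  The demo_* names are hypothetical stand-ins
(not part of the collector) so that the fragment builds without libgc;
real code would call the collector's blocking hooks at those points.
The idea is only that the caller's frame holds a copy of the register
contents for as long as the thread is blocked.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <ucontext.h>
#include <unistd.h>

/* Hypothetical stand-ins for the collector's blocking hooks, so the
   sketch builds without libgc.  They only count nesting depth. */
static int demo_blocking_depth = 0;
static void demo_start_blocking(void) { demo_blocking_depth++; }
static void demo_end_blocking(void)   { demo_blocking_depth--; }

/* Hypothetical wrapper: copy the registers into this frame, then block. */
static ssize_t demo_blocking_read(int fd, void *buf, size_t len)
{
    ucontext_t saved;   /* lives in this frame while the thread is blocked */
    ssize_t n;

    assert(buf != NULL);

    /* On glibc/X86, getcontext() stores the current register contents,
       including the callee-saves registers, into `saved`.  With GCC,
       __builtin_unwind_init() is the other option mentioned above. */
    if (getcontext(&saved) != 0)
        return -1;

    demo_start_blocking();
    n = read(fd, buf, len);     /* the blocking system call */
    demo_end_blocking();
    return n;
}
```

The same pattern would apply to accept(); read() is used here only so
that the fragment can be exercised with a pipe.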

But that seems extremely unlikely for 32-bit X86 code.  And
the symptoms don't look like that's happening.  In fact, I have
no idea what could be generating the SIGILL.  A disassembly of the
code around 0xb7d8056c might suggest something.

Hans

> -----Original Message-----
> From: gc-bounces at napali.hpl.hp.com 
> [mailto:gc-bounces at napali.hpl.hp.com] On Behalf Of Manuel Serrano
> Sent: Wednesday, April 13, 2005 5:33 AM
> To: gc at napali.hpl.hp.com
> Subject: [Gc] a problem with the collector with 
> multi-threaded applications
> 
> 
> Hello Everybody,
> 
> I'm struggling with a bug in a multi-threaded application and I'm
> now totally out of ideas on how to fight it. Thus I'm coming to you
> for some advice or ideas.
> 
> My application is an HTTP server. It is programmed in Scheme, and
> the runtime system uses version 6.4 of the GC. It runs under
> Linux (2.6.11). I'm using gcc 3.3.5 to compile the C code.
> 
> From time to time, the bug shows up. It is always in the same piece
> of code:
> 
>    GC_start_blocking();
>    new_s = (int)accept( fd, &sin, (socklen_t *)&len );
>    GC_end_blocking();
>    
> 
> 
> -----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----
> Program received signal SIGILL, Illegal instruction.
> [Switching to Thread 16384 (LWP 23569)]
> 0xb7d8056c in GC_end_blocking () at pthread_support.c:1007
> 1007        LOCK();   /* This will block if the world is stopped.       */
> (gdb) bt
> #0  0xb7d8056c in GC_end_blocking () at pthread_support.c:1007
> #1  0xb7f9295a in socket_accept (serv=0x83ae190, bufferedp=1 '\001', errp=1)
>     at Clib/csocket.c:457
> #2  0x0805a5df in BGl_handlezd2connectionzd2zzmainz00 (
>     BgL_acceptzd2poolzd2_2=0x83be1c0, BgL_replyzd2poolzd2_3=0x83be1a0,
>     BgL_sz00_4=0x83ae190, BgL_nz00_5=585) at main.c:976
> #3  0x0805a46c in BGl_zc3anonymousza31697za3mainze2scmza356ze3z61zzmainz00 (
>     BgL_envz00_596=0x83be180) at main.c:858
> #4  0xb7f1951a in BGl_zc3exitza31393ze3z83zz__errorz00 (
>     BgL_thunkz00_2583=0x83be180) at objs/obj_s/Llib/error.c:709
> #5  0xb7f19364 in BGl_withzd2exceptionzd2handlerz00zz__errorz00 (
>     BgL_handlerz00_9=0x83c4408, BgL_thunkz00_10=0x83be180)
>     at objs/obj_s/Llib/error.c:661
> #6  0x0805a305 in BGl_zc3exitza31686za3mainze2scmza355ze3z61zzmainz00 (
>     BgL_sz00_669=0x83ae190, BgL_rpz00_668=0x83be1a0, BgL_apz00_667=0x83be1c0)
>     at main.c:775
> #7  0x0805a212 in BGl_zc3anonymousza31684za3mainze2scmza346ze3z61zzmainz00 (
>     BgL_envz00_593=0x80c8a00) at main.c:730
> #8  0xb7f1951a in BGl_zc3exitza31393ze3z83zz__errorz00 (
>     BgL_thunkz00_2583=0x80c8a00) at objs/obj_s/Llib/error.c:709
> #9  0xb7f19364 in BGl_withzd2exceptionzd2handlerz00zz__errorz00 (
>     BgL_handlerz00_9=0x80c89e0, BgL_thunkz00_10=0x80c8a00)
>     at objs/obj_s/Llib/error.c:661
> #10 0x0805a13a in BGl_mainz00zzmainz00 (BgL_argsz00_1=0x8144feb) at main.c:668
> #11 0x08059f8f in bigloo_main (BgL_argvz00_750=0x8144feb) at main.c:563
> #12 0xb7f89bdf in _bigloo_main (argc=2, argv=0xbfffcf94, envp=0xbfffcfa0,
>     bigloo_main=0x8059f68 <bigloo_main>) at Clib/cmain.c:167
> #13 0x08059ee6 in main (argc=2, argv=0xbfffcf94, env=0xbfffcfa0) at main.c:519
> (gdb) info thread
>   22 Thread 327701 (LWP 23605)  0xb7ce7fa4 in __pthread_sigsuspend ()
>    from /lib/libpthread.so.0
>   21 Thread 311316 (LWP 23604)  0xb7ce7fa4 in __pthread_sigsuspend ()
>    from /lib/libpthread.so.0
>   20 Thread 294931 (LWP 23601)  0xb7ce7fa4 in __pthread_sigsuspend ()
>    from /lib/libpthread.so.0
>   19 Thread 278546 (LWP 23600)  0xb7ce7fa4 in __pthread_sigsuspend ()
>    from /lib/libpthread.so.0
>   18 Thread 262161 (LWP 23599)  0xb7ce7fa4 in __pthread_sigsuspend ()
>    from /lib/libpthread.so.0
>   17 Thread 245776 (LWP 23597)  0xb7ce7fa4 in __pthread_sigsuspend ()
>    from /lib/libpthread.so.0
>   16 Thread 229391 (LWP 23596)  0xb7ce7fa4 in __pthread_sigsuspend ()
>    from /lib/libpthread.so.0
>   15 Thread 213006 (LWP 23595)  0xb7ce7fa4 in __pthread_sigsuspend ()
>    from /lib/libpthread.so.0
>   14 Thread 196621 (LWP 23594)  0xb7ce7fa4 in __pthread_sigsuspend ()
>    from /lib/libpthread.so.0
>   13 Thread 180236 (LWP 23593)  0xb7ce7fa4 in __pthread_sigsuspend ()
>    from /lib/libpthread.so.0
>   12 Thread 163851 (LWP 23592)  0xb7ce7fa4 in __pthread_sigsuspend ()
>    from /lib/libpthread.so.0
>   11 Thread 147466 (LWP 23591)  0xb7ce7fa4 in __pthread_sigsuspend ()
>    from /lib/libpthread.so.0
>   10 Thread 131081 (LWP 23590)  0xb7ce7fa4 in __pthread_sigsuspend ()
>    from /lib/libpthread.so.0
>   9 Thread 114696 (LWP 23589)  0xb7ce7fa4 in __pthread_sigsuspend ()
>    from /lib/libpthread.so.0
>   8 Thread 98311 (LWP 23588)  0xb7ce7fa4 in __pthread_sigsuspend ()
>    from /lib/libpthread.so.0
>   7 Thread 81926 (LWP 23587)  0xb7ce7fa4 in __pthread_sigsuspend ()
>    from /lib/libpthread.so.0
>   6 Thread 65541 (LWP 23586)  0xb7ce7fa4 in __pthread_sigsuspend ()
>    from /lib/libpthread.so.0
>   5 Thread 49156 (LWP 23585)  0xb7ce7fa4 in __pthread_sigsuspend ()
>    from /lib/libpthread.so.0
>   4 Thread 32771 (LWP 23584)  0xb7ce7fa4 in __pthread_sigsuspend ()
>    from /lib/libpthread.so.0
>   3 Thread 16386 (LWP 23583)  0xb7ce7fa4 in __pthread_sigsuspend ()
>    from /lib/libpthread.so.0
>   2 Thread 32769 (LWP 23582)  0xb7c7580a in poll () from /lib/libc.so.6
> * 1 Thread 16384 (LWP 23569)  0xb7d8056c in GC_end_blocking ()
>     at pthread_support.c:1007
> -----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----
> 
> At this point, I have no idea in which direction I should look for
> the error. Does anybody here have a suggestion on how I should
> handle this problem? Many thanks in advance.
> 
> -- 
> Manuel


