[Gc] Re: GC 6.4 vs Irix w/ threads

Dan Bonachea bonachea at cs.berkeley.edu
Sun Apr 17 08:30:01 PDT 2005


At 08:04 AM 4/12/2005, Hans Boehm wrote:
>I think this patch helps for the Irix 64-bit case.  This just seems
>to be a case of generating a SIGBUS where a SIGSEGV was expected.
>I tested only very superficially.
>
>I have no idea whether this helps the MPI problem.  Unfortunately,
>I also don't have a way to test.

Thanks Hans - with the addition of this patch, both the 32-bit and 64-bit
IRIX gctest runs appear to work properly.
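
For context on the failure mode this patch addresses: the collector discovers
memory bounds (in GC_find_limit, among other places) by installing temporary
fault handlers and deliberately touching pages until one faults. On IRIX the
fault can arrive as SIGBUS rather than SIGSEGV, so both have to be caught.
Here's a rough sketch of that probe-and-fault idiom - not the collector's
actual code; find_limit, probe_fault, and the page-size handling are
simplified for illustration:

#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

static sigjmp_buf probe_env;

/* Any access fault during the probe just unwinds back to the probe loop. */
static void probe_fault(int sig)
{
    siglongjmp(probe_env, 1);
}

/* Walk page by page from p until an access faults; the last readable
   address approximates the limit.  up selects the probe direction. */
static char *find_limit(char *p, int up)
{
    size_t pgsz = (size_t)sysconf(_SC_PAGESIZE);
    struct sigaction act, old_segv, old_bus;
    char * volatile probe = p;      /* volatile so it survives siglongjmp */

    act.sa_handler = probe_fault;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    sigaction(SIGSEGV, &act, &old_segv);
    sigaction(SIGBUS,  &act, &old_bus);  /* the case the patch adds on IRIX */

    if (sigsetjmp(probe_env, 1) == 0) {
        for (;;) {
            (void)*(volatile char *)probe;   /* faults once we run off */
            probe = up ? probe + pgsz : probe - pgsz;
        }
    }

    sigaction(SIGSEGV, &old_segv, 0);        /* restore prior handlers */
    sigaction(SIGBUS,  &old_bus, 0);
    return up ? probe - pgsz : probe + pgsz;
}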

However, I'm still having trouble with programs that mix the GC and MPI on
IRIX. When collection is enabled (GC_dont_gc == 0), every MPI program using
the GC crashes with:

MPI: MPI_COMM_WORLD rank 0 has terminated without calling MPI_Finalize()
MPI: aborting job
MPI: Received signal 10

and the kernel logs entries like the following to /var/adm/SYSLOG:

Apr 17 06:47:39 4A:lou unix: |$(0xb6b)WARNING: 
/hw/module/001c21/node/cpubus/0/b: Uncached Partial Read Error on MSPEC 
access, physaddr 0x844dcc3f0, process [arrayCopyTest] pid 1324873
Apr 17 06:47:39 5A:lou unix: |$(0xb5c)NOTICE: 
/hw/module/001c21/node/cpubus/0/b: User Data Bus error in Mspec space at 
physical address 0x844dcc3f0 /hw/module/001c21/node/memory/dimm_bank/1 (EPC 
0x1025ecc0)
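
(Signal 10 on IRIX is SIGBUS, which squares with the bus errors in the
syslog above.) One theory from my earlier mail below is that libmpi.so
installs its own SIGBUS handler that then fights with the collector's
temporary fault handlers. A quick diagnostic to check whether MPI_Init
touches the SIGBUS disposition - untested on lou, and just a sketch:

#include <mpi.h>
#include <signal.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    struct sigaction before, after;

    /* Passing a NULL new action makes sigaction a read-only query. */
    sigaction(SIGBUS, NULL, &before);
    MPI_Init(&argc, &argv);
    sigaction(SIGBUS, NULL, &after);

    if (before.sa_handler != after.sa_handler)
        printf("MPI_Init installed its own SIGBUS handler\n");
    else
        printf("SIGBUS handler unchanged by MPI_Init\n");

    MPI_Finalize();
    return 0;
}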

Here's a crash stack:

 >  0 GC_mark_from(mark_stack_top = 0x10808040, mark_stack = 0x10808000, 
mark_stack_limit = 0x10810000) 
["/home/ece/bonachea/Ti/src/runtime/gc/mark.c":769, 0x10389674]
    1 GC_mark_some(cold_gc_frame = 0x7fff2b38 = "") 
["/home/ece/bonachea/Ti/src/runtime/gc/mark.c":361, 0x103889d0]
    2 GC_stopped_mark(stop_func = 0x103843a0) 
["/home/ece/bonachea/Ti/src/runtime/gc/alloc.c":519, 0x10385268]
    3 GC_try_to_collect_inner(stop_func = 0x103843a0) 
["/home/ece/bonachea/Ti/src/runtime/gc/alloc.c":366, 0x10384c90]
    4 GC_init_inner() ["/home/ece/bonachea/Ti/src/runtime/gc/misc.c":782, 
0x103832f4]
    5 GC_generic_malloc_inner(lb = 1, k = 1) 
["/home/ece/bonachea/Ti/src/runtime/gc/malloc.c":123, 0x1038c954]
    6 GC_generic_malloc(lb = 1, k = 1) 
["/home/ece/bonachea/Ti/src/runtime/gc/malloc.c":192, 0x1038cc14]
    7 GC_malloc(lb = 1) ["/home/ece/bonachea/Ti/src/runtime/gc/malloc.c":297, 
0x1038d2d8]
    8 real_main(argc = 1, argv = 0x7fff2ec4, envp = 0x7fff2ecc) 
["/home/ece/bonachea/Ti/src/runtime/backend/mpi-cluster-smp/main.c":887, 
0x1036b3e4]
    9 main(argc = 1, argv = 0x7fff2ec4, envp = 0x7fff2ecc) 
["/home/ece/bonachea/Ti/src/runtime/backend/mpi-cluster-smp/main.c":864, 
0x1036b2b4]
    10 __start() 
["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M4/csu/crt1text.s":177, 
0x1004e3d

Non-MPI programs, and MPI programs with collection disabled (GC_dont_gc == 1),
all appear to work properly.
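
For reference, disabling collection this way just sets the collector's
GC_dont_gc flag before initialization. A minimal standalone equivalent of a
TI_NOGC run - assuming TI_NOGC maps straight onto this flag, as the
GC_dont_gc values above suggest - looks like:

#include <gc.h>

int main(void)
{
    GC_dont_gc = 1;              /* never collect; the heap just grows */
    GC_INIT();

    /* Allocation still goes through the collector, but the marker
       (GC_mark_from above) never runs - which matches why these runs
       survive while collecting ones crash. */
    char *p = (char *)GC_MALLOC(64);
    return p == NULL;
}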

>I have no idea whether this helps the MPI problem.  Unfortunately,
>I also don't have a way to test.

Hans - here are instructions for reproducing the problem on lou:

Compile with the following command:

$ ~bonachea/.tc-dist/dist-debug/bin/tcbuild -v --keep --backend mpi-cluster-uniprocess ~bonachea/.tc-dist/arrayCopyTest.ti

This will build a Titanium/MPI program called arrayCopyTest that links in
the GC. Run the program with:

$ mpirun -np 2 ./arrayCopyTest

You can disable collection by setting the environment variable TI_NOGC before
running. If you want to try a new GC library, you can link it in by copying
and pasting the final link line from the tcbuild output and replacing
-lgc-uniproc with the path to your own libgc.a.

Dan


>Hans
>
>--- os_dep.c.orig       Tue Apr 12 04:22:53 2005
>+++ os_dep.c    Tue Apr 12 06:32:59 2005
>@@ -698,7 +698,7 @@
>  #   if defined(SUNOS5SIGS) || defined(IRIX5) || defined(OSF1) \
>      || defined(HURD) || defined(NETBSD)
>         static struct sigaction old_segv_act;
>-#      if defined(_sigargs) /* !Irix6.x */ || defined(HPUX) \
>+#      if defined(IRIX5) || defined(HPUX) \
>         || defined(HURD) || defined(NETBSD)
>             static struct sigaction old_bus_act;
>  #      endif
>@@ -731,9 +731,11 @@
>                 /* and setting a handler at the same time.              */
>                 (void) sigaction(SIGSEGV, 0, &old_segv_act);
>                 (void) sigaction(SIGSEGV, &act, 0);
>+               (void) sigaction(SIGBUS, 0, &old_bus_act);
>+               (void) sigaction(SIGBUS, &act, 0);
>  #        else
>                 (void) sigaction(SIGSEGV, &act, &old_segv_act);
>-#              if defined(IRIX5) && defined(_sigargs) /* Irix 5.x, not 6.x */ \
>+#              if defined(IRIX5) \
>                    || defined(HPUX) || defined(HURD) || defined(NETBSD)
>                     /* Under Irix 5.x or HP/UX, we may get SIGBUS.      */
>                     /* Pthreads doesn't exist under Irix 5.x, so we     */
>@@ -772,7 +774,7 @@
>  #       if defined(SUNOS5SIGS) || defined(IRIX5) \
>            || defined(OSF1) || defined(HURD) || defined(NETBSD)
>           (void) sigaction(SIGSEGV, &old_segv_act, 0);
>-#        if defined(IRIX5) && defined(_sigargs) /* Irix 5.x, not 6.x */ \
>+#        if defined(IRIX5) \
>              || defined(HPUX) || defined(HURD) || defined(NETBSD)
>               (void) sigaction(SIGBUS, &old_bus_act, 0);
>  #        endif
>
>On Tue, 12 Apr 2005, Dan Bonachea wrote:
>
> > At 04:39 AM 4/11/2005, you wrote:
> > >GC6.4 apparently no longer worked on Irix with threads.  Apparently
> > >a bug in aix_irix_threads.c was no longer hidden by a very lenient
> > >pthread_attr_getdetachstate.
> > >
> > >The following patch should solve the problem.
> >
> > Hi Hans - Thanks for the patch.
> >
> > 32-bit gctest seems to be working now, although I'm still seeing some bus
> > errors using the IRIX GC in Titanium applications on MPI-based backends,
> > so I believe there are some other IRIX-GC issues remaining - I suspect the
> > problem is the shared libraries which MPI loads (eg /usr/lib32/libmpi.so),
> > but I don't have proof of that yet. Do you have any small GC correctness
> > tests that test the use of MPI and/or shared libraries that allocate
> > non-trivial memory?
> >
> > In any case, 64-bit GC still seems to be completely broken on lou, both
> > with and without pthreads. If you try configuring 6.4, including your
> > patch below with:
> >    setenv CC "/usr/bin/cc -64"
> > then gctest should give you a bus error in GC_find_limit :
> >
> > #0  0x000000001001559c in GC_find_limit (p=0xfffffffab50 "", up=1) at
> > os_dep.c:811
> > #1  0x000000001001561c in GC_get_stack_base () at os_dep.c:1038
> > #2  0x000000001000fdfc in GC_init_inner () at misc.c:676
> > #3  0x000000001001d9c4 in GC_generic_malloc_inner (lb=7, k=1) at malloc.c:123
> > #4  0x000000001001dc4c in GC_generic_malloc (lb=7, k=1) at malloc.c:192
> > #5  0x000000001001dfa8 in GC_malloc (lb=7) at malloc.c:297
> > #6  0x000000001000af44 in run_one_test () at test.c:1218
> > #7  0x000000001000bdac in main () at test.c:1517
> >
> > This is the signal handler problem I originally reported that apparently
> > still remains. It's also possible this signal issue is the same problem
> > MPI is having - perhaps it registers some SIGBUS handlers for its own uses
> > (eg parallel job shutdown) that interfere with the GC_find_limit scan. I
> > think perhaps we need a more robust way to find the stack base on IRIX...
> >
> > Dan
> >
> > PS - lou lacks gdb, but I have it installed here:
> > ~bonachea/bin/gnu/bin/gdb
> > (note you'll need to link the static libgc.a to use gdb)
> >
> > >Hans
> > >
> > >--- aix_irix_threads.c.orig     Sat Apr  9 20:37:22 2005
> > >+++ aix_irix_threads.c  Sat Apr  9 20:38:17 2005
> > >@@ -580,7 +580,11 @@
> > >      si -> start_routine = start_routine;
> > >      si -> arg = arg;
> > >
> > >-    pthread_attr_getdetachstate(attr, &detachstate);
> > >+    if (NULL == attr) {
> > >+       detachstate = PTHREAD_CREATE_JOINABLE;
> > >+    } else {
> > >+        pthread_attr_getdetachstate(attr, &detachstate);
> > >+    }
> > >      if (PTHREAD_CREATE_DETACHED == detachstate) my_flags |= DETACHED;
> > >      si -> flags = my_flags;
> > >      result = pthread_create(new_thread, attr, GC_start_routine, si);
> >
> >


