[Gc] Re: Desperately needing GC 7.1

Boehm, Hans hans.boehm at hp.com
Tue Jan 29 11:37:42 PST 2008


Andreas -

It sounds like, in light of your insight, maybe the right thing to do here for now is to actually check in a patch that forces the GC_INIT() call for Darwin, i.e. treats it like AIX or Cygwin.  If we do that, and disable parallel marking for now, does everything work reliably?
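
As a rough sketch of the ordering constraint behind that suggestion (all names are hypothetical stand-ins, not the collector's real internals): an explicit init registers the dyld callbacks first, and only afterwards does anything that may stop the world. The SKETCH_FORCE_INIT switch is likewise made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are hypothetical stand-ins, not the collector's internals. */
static bool sketch_dyld_ready = false;   /* set once image callbacks are registered */
static bool sketch_initialized = false;
static int  sketch_unsafe_stops = 0;     /* stops attempted before callbacks existed */

/* Stand-in for registering the dyld add/remove-image callbacks. */
static void sketch_init_dyld(void) { sketch_dyld_ready = true; }

/* Stand-in for stopping the world; counts stops that happen too early. */
static void sketch_stop_world(void) {
    if (!sketch_dyld_ready) sketch_unsafe_stops++;
}

/* Explicit initialization: callbacks first, then anything that may stop the world. */
static void sketch_init(void) {
    if (sketch_initialized) return;      /* idempotent, safe to call twice */
    sketch_init_dyld();
    assert(sketch_dyld_ready);
    sketch_initialized = true;
    sketch_stop_world();                 /* e.g. an initial collection */
}

/* Force an explicit init on Darwin; elsewhere the macro is a no-op. */
#if defined(__APPLE__) || defined(SKETCH_FORCE_INIT)
#  define SKETCH_INIT() sketch_init()
#else
#  define SKETCH_INIT() ((void)0)
#endif
```

A client would call SKETCH_INIT() as the first statement in main(), before creating any threads. The sketch only shows the ordering; it does not model the timing race itself.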

I think that both of these should be fixed eventually.  But it may be a good idea to get 7.1 out in the meantime.
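
For the parallel-marking concern quoted below, a minimal sketch of what an eventual fix might look like, assuming a hypothetical per-thread record with an is_marker flag (the real thread bookkeeping differs): the stop-the-world loop skips both the calling thread and the collector's own marker threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical thread record; the real collector's bookkeeping differs. */
struct sketch_thread {
    int  id;
    bool is_marker;   /* created by the collector for parallel marking */
    bool suspended;
};

/* Suspend every thread except the caller and the marker threads.
   Returns the number of threads suspended. */
static size_t sketch_suspend_mutators(struct sketch_thread *t, size_t n,
                                      int self_id) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (t[i].id == self_id || t[i].is_marker)
            continue;             /* never stop ourselves or a marker */
        t[i].suspended = true;
        count++;
    }
    return count;
}
```

Here "suspend" is just a flag; on Darwin the real code would go through the Mach thread APIs, which this sketch deliberately leaves out.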

Thanks.

Hans

> -----Original Message-----
> From: Andreas Tobler [mailto:andreast-list at fgznet.ch]
> Sent: Monday, January 14, 2008 1:39 PM
> To: Boehm, Hans
> Cc: gc at napali.hpl.hp.com
> Subject: Re: [Gc] Re: Desperately needing GC 7.1
>
> Hi Hans,
>
> Hans Boehm wrote:
>
> > I'm not sure we're on the right track here:
>
> And I'm not sure if I'm really on the wrong track.
> The GC_INIT in test.c was not meant to go into CVS.
> It was only meant to give me a starting point for finding
> the real issue.
>
> > On DARWIN, GC_INIT() expands to just GC_init(), and that should be
> > happening implicitly, even without the call.  I suspect that the
> > explicit GC_INIT() call just alters the timing enough that we don't
> > observe the race.
>
> Agreed, but without calling GC_init in the test case I never
> reach GC_init_dyld in gdb.
> A comment in dyn_load.c says:
>
> /* The _dyld_* functions have an internal lock so no _dyld functions
>     can be called while the world is stopped without the risk of a deadlock.
>     Because of this we MUST set up callbacks BEFORE we ever stop the world.
>     This should be called BEFORE any thread is created and WITHOUT the
>     allocation lock held. */
>
> Now the trace looks like this:
> Entering GC_mprotect_thread_notify
> ^C
> Program received signal SIGINT, Interrupt.
> 0x943ce9d8 in mach_msg_trap ()
> (gdb) bt
> #0  0x943ce9d8 in mach_msg_trap ()
> #1  0x943d5900 in mach_msg ()
> #2  0x00060fe4 in GC_mprotect_thread_notify (id=1) at ../bdwgc/os_dep.c:3547
> #3  0x0006743c in GC_stop_world () at ../bdwgc/darwin_stop_world.c:514
> #4  0x0005240c in GC_stopped_mark (stop_func=0x520b0 <GC_never_stop_func>) at ../bdwgc/alloc.c:468
> #5  0x000531cc in GC_try_to_collect_inner (stop_func=0x520b0 <GC_never_stop_func>) at ../bdwgc/alloc.c:356
> #6  0x0005fedc in GC_init_inner () at ../bdwgc/misc.c:730
> #7  0x0005ffa8 in GC_enable_incremental () at ../bdwgc/misc.c:788
> #8  0x00003d14 in main () at ../bdwgc/tests/test.c:1614
>
>
> In frame 6 we jump to GC_try_to_collect_inner which calls
> GC_stopped_mark and this one calls GC_stop_world before ever
> having called GC_init_dyld.
>
> Putting the GC_init_dyld code before misc.c:730 helps a
> little bit. But not reliably.
>
> Only if I add a usleep(1); after GC_init_dyld(); do I get a more
> reliable result on my G4.
>
> Strange.....
>
> The attached diff is also not meant for inclusion. It should
> show where I am.
>
> >
> > Last I looked at the code, it still seemed to me that there was a
> > problem with parallel mark in that the thread stopping code seemed to
> > inadvertently also stop the marker threads (as created by
> > start_mark_threads()), which I believe could have bad results, such as
> > the ones we're observing.  I suspect this does not happen on
> > uniprocessors since, without explicit instructions to the contrary via
> > an environment variable, the collector will not create any separate
> > marker threads in this case.  But it would explain problems on dual
> > core machines.  I put a FIXME comment in roughly the right place, I think.
>
> That's a different problem. Or not ... I'm investigating still.
>
> Andreas
>


