[Gc] Re: Desperately needing GC 7.1
hans.boehm at hp.com
Tue Jan 29 11:37:42 PST 2008
In light of your insight, it sounds like the right thing to do for now may be to check in a patch that forces the GC_INIT() call on Darwin, i.e. treats it like AIX or Cygwin. If we do that, and disable parallel marking for now, does everything work reliably?
I think that both of these should be fixed eventually. But it may be a good idea to get 7.1 out in the meantime.
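To make the ordering constraint from the dyn_load.c comment quoted below concrete, here is a toy model in C. It is only a sketch under stated assumptions: none of these functions exist in the collector, and the names register_loader_callbacks, stop_world_checked, start_world and collector_init are hypothetical stand-ins. It shows just the invariant that the loader callbacks must be registered before the world is ever stopped, and that an explicit init call made first satisfies it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state only; this is not the collector's real bookkeeping. */
static bool callbacks_registered = false;
static bool world_stopped = false;

/* Hypothetical stand-in for setting up the dyld callbacks.
   Must never run while the world is stopped. */
void register_loader_callbacks(void)
{
    assert(!world_stopped);
    callbacks_registered = true;
}

/* Hypothetical stop-the-world entry point that refuses to run
   before the callbacks exist. Returns 0 on success, -1 otherwise. */
int stop_world_checked(void)
{
    if (!callbacks_registered)
        return -1;
    world_stopped = true;
    return 0;
}

void start_world(void)
{
    world_stopped = false;
}

/* Hypothetical explicit initialization: callbacks first, then the
   first (simulated) collection. Returns 0 on success. */
int collector_init(void)
{
    register_loader_callbacks();
    if (stop_world_checked() != 0)
        return -1;
    start_world();
    return 0;
}
```

In this model, calling stop_world_checked() before collector_init() fails, which corresponds to the case in the trace below where GC_stop_world is reached before GC_init_dyld.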
> -----Original Message-----
> From: Andreas Tobler [mailto:andreast-list at fgznet.ch]
> Sent: Monday, January 14, 2008 1:39 PM
> To: Boehm, Hans
> Cc: gc at napali.hpl.hp.com
> Subject: Re: [Gc] Re: Desperately needing GC 7.1
> Hi Hans,
> Hans Boehm wrote:
> > I'm not sure we're on the right track here:
> And I'm not sure if I'm really on the wrong track.
> The GC_INIT in test.c was not meant for going into cvs.
> It should only help me to get a starting point where to look
> for the real issue.
> > On DARWIN, GC_INIT() expands to just GC_init(), and that should be
> > happening implicitly, even without the call. I suspect that the
> > explicit GC_INIT() call just alters the timing enough that we don't
> > observe the race.
> Agreed, but, without calling GC_init in the test case I never
> reach GC_init_dyld in gdb.
> A notice in the dyn_load.c says:
> /* The _dyld_* functions have an internal lock so no _dyld functions
> can be called while the world is stopped without the risk
> of a deadlock.
> Because of this we MUST setup callbacks BEFORE we ever
> stop the world.
> This should be called BEFORE any thread in created and WITHOUT the
> allocation lock held. */
> Now the trace looks like this:
> Entering GC_mprotect_thread_notify
> Program received signal SIGINT, Interrupt.
> 0x943ce9d8 in mach_msg_trap ()
> (gdb) bt
> #0 0x943ce9d8 in mach_msg_trap ()
> #1 0x943d5900 in mach_msg ()
> #2 0x00060fe4 in GC_mprotect_thread_notify (id=1) at
> #3 0x0006743c in GC_stop_world () at ../bdwgc/darwin_stop_world.c:514
> #4 0x0005240c in GC_stopped_mark (stop_func=0x520b0 <GC_never_stop_func>) at ../bdwgc/alloc.c:468
> #5 0x000531cc in GC_try_to_collect_inner (stop_func=0x520b0 <GC_never_stop_func>) at ../bdwgc/alloc.c:356
> #6 0x0005fedc in GC_init_inner () at ../bdwgc/misc.c:730
> #7 0x0005ffa8 in GC_enable_incremental () at ../bdwgc/misc.c:788
> #8 0x00003d14 in main () at ../bdwgc/tests/test.c:1614
> In frame 6 we jump to GC_try_to_collect_inner which calls
> GC_stopped_mark and this one calls GC_stop_world before ever
> having called GC_init_dyld.
> Putting the GC_init_dyld code before misc.c:730 helps a
> little bit. But not reliably.
> Only if I add a usleep(1); after GC_init_dyld(); I get a more
> reliable result on my G4.
> The attached diff is also not meant for inclusion. It should
> show where I am.
> > Last I looked at the code, it still seemed to me that there was a
> > problem with parallel mark in that the thread stopping code seemed to
> > inadvertently also stop the marker threads (as created by
> > start_mark_threads()), which I believe could have bad results, such as
> > the ones we're observing. I suspect this does not happen on
> > uniprocessors since, without explicit instructions to the contrary via
> > an environment variable, the collector will not create any separate
> > marker threads in this case. But it would explain problems on dual
> > core machines. I put a FIXME comment in roughly the right place, I think.
> That's a different problem. Or not ... I'm investigating still.
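The suspected parallel-mark problem quoted above (the stop-the-world code also suspending the marker threads) can be illustrated with a small, self-contained C sketch. This is not the code in darwin_stop_world.c: the thread_entry type, its is_marker flag and the function stop_world_skipping_markers are hypothetical, and the "suspend" step is just a flag, not a real Mach call. It only shows the intended behaviour, namely that a stop-world loop skips both the calling thread and any thread known to be a marker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical thread table entry; not the collector's real structure. */
typedef struct {
    int  id;
    bool is_marker;   /* assumed flag: thread was created for parallel marking */
    bool suspended;   /* simulated suspension state */
} thread_entry;

/* Marks every thread as suspended except the caller and the marker
   threads. Returns the number of threads suspended. */
size_t stop_world_skipping_markers(thread_entry *threads, size_t n, int self_id)
{
    size_t stopped = 0;

    assert(threads != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (threads[i].id == self_id)
            continue;               /* never suspend ourselves */
        if (threads[i].is_marker)
            continue;               /* markers must keep running */
        threads[i].suspended = true;
        stopped++;
    }
    return stopped;
}
```

With a table of four threads where one is the caller and one is a marker, the function suspends the remaining two and leaves the marker running.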