Re[2]: [Gc] hangs on malloc

Ivan Maidanski ivmai at mail.ru
Wed Aug 4 22:14:25 PDT 2010


Hello!

My guess is that nekovm installs its own signal handler for SIG_SUSPEND, so GC_suspend_handler is never called.
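
To illustrate why that would hang, here is a minimal single-threaded sketch of the acknowledgement handshake. It is not the libgc code: suspend_handler, foreign_handler, ack_sem, handshake_init, set_handler and request_suspend are names made up for this example, raise() stands in for pthread_kill(), and sem_trywait() stands in for the blocking sem_wait() so that a missing acknowledgement is reported instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>
#include <signal.h>
#include <string.h>

/* Hypothetical stand-in for the semaphore the collector waits on. */
static sem_t ack_sem;

/* Hypothetical stand-in for GC_suspend_handler: acknowledge the request.
   sem_post() is async-signal-safe, so it may be called from a handler. */
static void suspend_handler(int sig) { (void)sig; sem_post(&ack_sem); }

/* Hypothetical stand-in for a handler the host installs afterwards.
   It knows nothing about the semaphore and never posts it. */
static void foreign_handler(int sig) { (void)sig; }

/* Initialize the semaphore to 0, as described in the thread. */
static int handshake_init(void) { return sem_init(&ack_sem, 0, 0); }

/* Install handler h for signal sig; returns 0 on success, -1 on error. */
static int set_handler(int sig, void (*h)(int))
{
    struct sigaction act;
    memset(&act, 0, sizeof act);
    act.sa_handler = h;
    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_RESTART;
    return sigaction(sig, &act, NULL);
}

/* Send sig to the calling thread and check for the acknowledgement.
   Returns 1 if acknowledged, 0 if not, -1 on error. */
static int request_suspend(int sig)
{
    if (raise(sig) != 0)
        return -1;
    if (sem_trywait(&ack_sem) == 0)
        return 1;
    /* EAGAIN: nobody posted. A blocking sem_wait() would hang here. */
    return errno == EAGAIN ? 0 : -1;
}
```

With suspend_handler installed, request_suspend(SIGUSR1) returns 1. After set_handler(SIGUSR1, foreign_handler) it returns 0, which corresponds to the case where the real sem_wait() never returns.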
Try using some other value for SIG_SUSPEND, e.g. SIGLOST.
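
One way to check the guess is to ask the system which handler is currently installed for the signal. This is a minimal sketch, assuming it can be called from inside the process after both the collector and the host have been initialized. install_handler, handler_is_installed, collector_handler and host_handler are hypothetical names used only for illustration; comparing against the collector's real handler assumes you can get its address, which may not be possible if it is not exported.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical stand-ins for the collector's handler and the host's. */
static void collector_handler(int sig) { (void)sig; }
static void host_handler(int sig) { (void)sig; }

/* Install handler h for signal sig; returns 0 on success, -1 on error. */
static int install_handler(int sig, void (*h)(int))
{
    struct sigaction act;
    memset(&act, 0, sizeof act);
    act.sa_handler = h;
    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_RESTART;
    return sigaction(sig, &act, NULL);
}

/* Query (without changing) the disposition of sig.
   Returns 1 if `expected` is the installed handler, 0 if something
   else is installed, -1 if the query itself fails. */
static int handler_is_installed(int sig, void (*expected)(int))
{
    struct sigaction cur;
    if (sigaction(sig, NULL, &cur) != 0)
        return -1;
    /* A three-argument (SA_SIGINFO) handler lives in sa_sigaction,
       so it cannot be the plain handler we are looking for. */
    if (cur.sa_flags & SA_SIGINFO)
        return 0;
    return cur.sa_handler == expected;
}
```

If the check reports a different handler for the signal used as SIG_SUSPEND, something else in the process has taken over that signal after the collector installed its own.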

Thu, 29 Jul 2010 21:28:20 +0000 "Boehm, Hans" <hans.boehm at hp.com>:

> If you have gdb available, "info threads" and "thread <n>" followed by "w".
> 
> Hans
> 
> > -----Original Message-----
> > From: Ian Martins [mailto:gc at linux.hpl.hp.com]
> > Sent: Thursday, July 29, 2010 2:23 PM
> > To: Boehm, Hans
> > Subject: RE: [Gc] hangs on malloc
> > 
> > yes, the signal is sent.  pthread_kill returns 0.  I'm not sure how to
> > see the other thread's stack trace.
> > 
> > 
> > On Thu, 29 Jul 2010 20:19 +0000, "Boehm, Hans" <hans.boehm at hp.com>
> > wrote:
> > > Since n_live_threads is 1, GC_stop_world() must have sent a signal
> > > (using pthread_kill) to another thread when GC_suspend_all() was
> > > called.  Is that thread still around?  If so, what's it doing?  What
> > > does its stack trace look like?
> > >
> > > Hans
> > >
> > > > -----Original Message-----
> > > > From: Ian Martins [mailto:gc at linux.hpl.hp.com]
> > > > Sent: Thursday, July 29, 2010 9:32 AM
> > > > To: Boehm, Hans
> > > > Subject: RE: [Gc] hangs on malloc
> > > >
> > > > yes, it's linux.   it gets to line 485 of pthread_stop_world.c,
> > > > which is a call to sem_wait().
> > > >
> > > > i = 0
> > > > n_live_threads = 1
> > > >
> > > > the semaphore's value is 0.  it's initialized to 0 and never
> > > > changes.  it looks like the only sem_post is in
> > > > GC_suspend_handler_inner, which is never called.
> > > >
> > > >
> > > > On Thu, 29 Jul 2010 00:13 +0000, "Boehm, Hans" <hans.boehm at hp.com>
> > > > wrote:
> > > > > This is on a Linux machine?
> > > > >
> > > > > You may have to look inside GC_stop_world (pthread_stop_world.c),
> > > > > to see what it's waiting for.  It has to wait for all other
> > > > > threads to acknowledge receipt of a signal.  If one of them is
> > > > > somehow prevented from doing so, this may happen.  There's a
> > > > > known issue with thread cancellation that may also result in this
> > > > > symptom, which we're still hoping to address shortly.  (My
> > > > > reading of the Posix standard is that exiting threads should be
> > > > > able to respond to signals.  But empirically it looks like that
> > > > > isn't always the case.  They need the GC lock to exit, which the
> > > > > GC doesn't let go until they respond to a signal.  But there seem
> > > > > to be enough other problems with cancellation that people rarely
> > > > > run into this.)
> > > > >
> > > > > Hans
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: gc-bounces at linux.hpl.hp.com [mailto:gc-bounces at linux.hpl.hp.com]
> > > > > > On Behalf Of Ian Martins
> > > > > > Sent: Wednesday, July 28, 2010 11:23 AM
> > > > > > To: gc at linux.hpl.hp.com
> > > > > > Subject: [Gc] hangs on malloc
> > > > > >
> > > > > > Hello.  I'm having trouble with nekovm (http://nekovm.org), which
> > > > > > uses libgc.  There is no trouble when it runs directly, but it
> > > > > > hangs when it runs as an apache2 module.  I traced the place
> > > > > > where it hangs to a GC_generic_malloc call.  It hangs at the same
> > > > > > place every time.  I've got the same behavior on two machines
> > > > > > (different hardware, same os: ubuntu, 32 bit, latest).  I tried
> > > > > > some different versions of libgc (6.4, 6.8, 7.1, 7.2alpha4) but
> > > > > > got the same result.  This is the stack when it hangs (using ver
> > > > > > 7.2alpha4):
> > > > > >
> > > > > > First call at the top:
> > > > > >                GC_generic_malloc(18, 1)
> > > > > > malloc.c:166.  GC_generic_malloc_inner(18, 1)
> > > > > > malloc.c:126.  GC_allocobj(18, 1)
> > > > > > alloc.c:1299.  GC_collect_or_expand(1, FALSE, FALSE)
> > > > > > alloc.c:1212.  GC_try_to_collect_inner()
> > > > > > alloc.c:436.   GC_stopped_mark()
> > > > > > alloc.c:584.   STOP_WORLD()
> > > > > >
> > > > > > It doesn't return from STOP_WORLD.  Stays there for days.
> > > > > > Please let me know what additional info may help, or of anything
> > > > > > I can try.
> > > > > >
> > > > > > Thanks,
> > > > > > Ian


