[Gc] Questions about threads with GC 6.8

Boehm, Hans hans.boehm at hp.com
Fri Jul 6 16:31:24 PDT 2007


 

> -----Original Message-----
> From: gc-bounces at napali.hpl.hp.com 
> [mailto:gc-bounces at napali.hpl.hp.com] On Behalf Of jim marshall
> Sent: Tuesday, July 03, 2007 9:02 PM
> To: gc at napali.hpl.hp.com
> Subject: Re: [Gc] Questions about threads with GC 6.8
> 
> jim marshall wrote:
> > First let me say I appreciate everyone's help thus far in my journey to
> > optimize our app using the GC! The results have been very good so far.
> >
> > Let me just outline our process. Our program is a server which
> > dynamically loads shared objects from 3rd parties. Our core
> > application uses the GC, and all the interfaces the shared objects use
> > to communicate with us will use the GC (as they make up-calls to
> > allocate objects they have to return to us). However, there is nothing
> > preventing these shared objects from allocating memory on their own
> > using the normal CRT malloc/free (we are very careful not to mix this
> > memory with our memory). It is also possible for the shared objects to
> > create their own threads to do background work.
> >
> > Currently our server is mostly single threaded; by that I mean we
> > start a few 'services' in threads, which run in the background to
> > monitor stuff but mostly sit idle (they all use GC memory). We
> > have one secondary thread which listens for requests from a client and
> > then loads the shared object required. Right now this is the only
> > thread that does this (i.e. only one client can talk to us at a time).
> > We will be changing this behavior in the near future to allow multiple
> > clients to communicate with us, each using their own thread.
> >
> > So I have a few questions about some flags and documentation I have 
> > been reading.
> >
> > 1 - http://www.hpl.hp.com/personal/Hans_Boehm/gc/simple_example.html
> > This page shows using the following configure options for threaded
> > apps: "--enable-threads=posix --enable-thread-local-alloc
> > --enable-parallel-mark". It is unclear to me if these options are for
> > the 7.0 alpha or also apply to 6.8 (the GC compiles and seems to work
> > with these options in 6.8). However, the real question is: if this is
> > valid for 6.8, should these options (sans posix threads) be enabled on
> > Windows? If so, what are the defines that need to be added to the
> > makefile?
For Linux, these should all work on 6.8 and 7.0.  Thread-local-alloc
(-DTHREAD_LOCAL_ALLOC) is now the default on more platforms if you use
the configure script.  For 6.8, you need to use different calls for
thread-local allocation (see gc_local_alloc.h).  For 7.0, you just call
GC_malloc.

Parallel-mark is not supported on Windows.  Thread local allocation is
supported on Windows only with 7.0.

> >
> > 2 - On that same page it suggests using:
> >> #define GC_REDIRECT_TO_LOCAL
> >> #include "gc_local_alloc.h"
> > Does this make sense for our application?
On Linux, with 6.8 and threads, probably.  It reduces locking overhead
appreciably.
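
For illustration, the redirect from question 2 might be wired into a
client file roughly as follows.  This is a minimal sketch, not code from
the collector: USE_BOEHM_GC, NODE_ALLOC, node, push and sum are names
made up for the example, and the fallback branch exists only so the file
still builds where the collector is not installed.  It assumes that
GC_THREADS is defined before the collector headers and that, in 6.x,
gc_local_alloc.h includes gc.h and remaps GC_MALLOC to the thread-local
allocator when GC_REDIRECT_TO_LOCAL is defined.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_BOEHM_GC               /* hypothetical build flag for this sketch */
#  define GC_THREADS              /* assumed to be required before the headers */
#  define GC_REDIRECT_TO_LOCAL    /* 6.x: GC_MALLOC uses the thread-local path */
#  include "gc_local_alloc.h"     /* assumed to include gc.h itself */
#  define NODE_ALLOC(n) GC_MALLOC(n)
#else
#  include <stdlib.h>
   /* Fallback without the collector: zeroed memory, never freed here. */
#  define NODE_ALLOC(n) calloc(1, (n))
#endif

/* Hypothetical list node, used only to have something to allocate. */
typedef struct node {
    struct node *next;
    int value;
} node;

/* Prepend a value; the allocation goes through whichever allocator
 * NODE_ALLOC was bound to above. */
static node *push(node *head, int value) {
    node *n = NODE_ALLOC(sizeof *n);
    assert(n != NULL);
    n->value = value;
    n->next = head;
    return n;
}

/* Sum all values in the list. */
static int sum(const node *head) {
    int total = 0;
    for (; head != NULL; head = head->next)
        total += head->value;
    return total;
}
```

Per the answer to question 1, with 7.0 the two defines and the extra
header should not be needed, since plain GC_malloc is enough there.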
> >
> > 3 - REGISTER_LIBRARIES_EARLY
> > What is this option and what does it do (I looked at mark_rts.c but it
> > doesn't seem to do that much)? Previously I sent an email about a
> > potential deadlock; when we compiled the Windows GC with this defined,
> > that hang seems to have gone away. But we don't really understand what
> > it does. The weird thing is that private/gcconfig.h says this
> > option should be avoided on Win32, so I am very confused by it fixing
> > our problem.
It causes the library data area registration code to be run before
threads are stopped.  IIRC, this is needed on Linux because the call to
iterate over dynamic libraries requires a loader lock that is not
otherwise accessible, and hence can't be acquired by the collector
before stopping the world.  I suspect the reason for the Windows comment
is that there was nothing to prevent a new library from being loaded
between the time VirtualQuery is called to look at the address space,
and the world is stopped for tracing.  If so, any pointers in static
data associated with that library would be missed.  This may or may not
be an issue for your application.
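
The suspected Windows race can be pictured with a toy model.  This is
only a sketch of the reasoning above, not collector code: process_t,
register_libraries, load_library and the two collect_* functions are
hypothetical names, and each "library" is reduced to a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LIBS 8

/* Toy process state; every name in this block is hypothetical. */
typedef struct {
    bool loaded[MAX_LIBS];      /* libraries currently mapped */
    bool registered[MAX_LIBS];  /* libraries whose static data will be scanned */
} process_t;

/* Snapshot the loaded set, as a walk over the address space would. */
static void register_libraries(process_t *p) {
    for (size_t i = 0; i < MAX_LIBS; i++)
        p->registered[i] = p->loaded[i];
}

/* A still-running thread loads another library. */
static void load_library(process_t *p, size_t id) {
    assert(id < MAX_LIBS);
    p->loaded[id] = true;
}

/* Number of loaded libraries whose static data the trace would skip. */
static size_t missed_libraries(const process_t *p) {
    size_t n = 0;
    for (size_t i = 0; i < MAX_LIBS; i++)
        if (p->loaded[i] && !p->registered[i])
            n++;
    return n;
}

/* Early registration: the snapshot is taken while threads still run,
 * so a load can slip in before the world is stopped. */
static size_t collect_register_early(process_t *p, size_t racing_load) {
    register_libraries(p);
    load_library(p, racing_load);  /* the race window */
    /* world is stopped here; the trace uses the stale snapshot */
    return missed_libraries(p);
}

/* Late registration: the same load happens before the stop, and the
 * snapshot is taken with the world stopped, so nothing can intervene. */
static size_t collect_register_late(process_t *p, size_t racing_load) {
    load_library(p, racing_load);
    /* world is stopped here */
    register_libraries(p);
    return missed_libraries(p);
}
```

In this model the early ordering misses the racing library and the late
ordering does not, which matches the trade-off described: registering
early avoids taking loader or heap locks with the world stopped, at the
price of a window in which a new library can go unnoticed.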

> Probably should wait until this week is over as a lot of 
> companies go on vacation for July 4th, but I'll try again.
> 
> I don't really need a response for questions 1 & 2, but #3 is 
> kind of important to us so we can try and understand what is 
> happening. 
> Basically, without the above option we end up with one thread
> (which uses malloc/free directly) owning the CRT heap lock, and
> the GC then stops the world and gets stuck waiting for the CRT
> heap lock (as a side note: this only happens on Windows). Here
> is a stack trace; it is just one example, as the hang is
> completely timing-based:
> 
> 1) This is the thread that owns the _heap_lock. wcsftime is
> a CRT function which is allocating memory for the formatted
> date string; it uses malloc_crt():
> _heap_alloc_base(
> _heap_alloc_dbg(
> _nh_malloc_dbg(
> _malloc_dbg(
> wcsftime(
> cimPrintMessageVA(
> cimPrintMessage(
> WSICMPI_trace(
> logDebugStatus(
> logDebugMessage(
> postMessage(
> __invokeServerElement(
> 
> 2) This is the GC thread that initiates the world stop. It's
> waiting for the heap lock, but since the world is stopped and
> the first thread owns the _heap_lock, the GC is basically
> deadlocked here.
> NTDLL! 7c90104b()
> _nh_malloc_dbg(unsigned int 8, int 0, int 1, const char * 0x00000000, int 0) line 242 + 7 bytes
> malloc(unsigned int 8) line 130 + 21 bytes
> GC_add_current_malloc_heap() line 1265 + 8 bytes
> GC_is_heap_base(char * 0x00010000) line 1307
> GC_register_dynamic_libraries() line 896 + 30 bytes
> >
> > Again thanks for your perseverance and help!
> > -Jim
> >
Thanks.  This does indeed look like a bug.  My immediate take on this is
that we should be calling GC_add_current_malloc_heap before stopping the
world, and not as part of GC_register_dynamic_libraries.  Otherwise we
end up calling the system malloc with the world stopped, as seen here.
And it's not too surprising that that might hang.
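
To make the ordering argument concrete, here is a toy model of the two
sequences.  It is a sketch under stated assumptions, not a patch: world_t
and all function names are hypothetical, threads are reduced to integer
ids, and "a running owner eventually releases the lock" is modelled by
simply clearing the owner.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model; every name in this block is hypothetical. */
enum { NO_OWNER = -1, GC_THREAD = 0 };

typedef struct {
    int  heap_lock_owner;  /* id of the thread holding the simulated heap lock */
    bool world_stopped;    /* true once the other threads are suspended */
} world_t;

/* Acquire the heap lock for the GC thread.  A running owner will finish
 * and release, so the acquire succeeds.  A suspended owner never can, so
 * the real call would block forever; that is reported as false. */
static bool gc_take_heap_lock(world_t *w) {
    if (w->heap_lock_owner != NO_OWNER && w->heap_lock_owner != GC_THREAD) {
        if (w->world_stopped)
            return false;                 /* owner is frozen: deadlock */
        w->heap_lock_owner = NO_OWNER;    /* running owner releases */
    }
    w->heap_lock_owner = GC_THREAD;
    return true;
}

static void gc_release_heap_lock(world_t *w) {
    assert(w->heap_lock_owner == GC_THREAD);
    w->heap_lock_owner = NO_OWNER;
}

/* Ordering seen in the stack trace: stop the world, then call malloc. */
static bool collect_malloc_after_stop(world_t *w) {
    w->world_stopped = true;
    if (!gc_take_heap_lock(w))
        return false;
    gc_release_heap_lock(w);
    return true;
}

/* Suggested ordering: do the malloc-dependent step, then stop the world. */
static bool collect_malloc_before_stop(world_t *w) {
    if (!gc_take_heap_lock(w))
        return false;
    gc_release_heap_lock(w);
    w->world_stopped = true;
    return true;
}
```

Starting from a state where thread 1 holds the lock, the first ordering
reports a deadlock and the second completes; with no owner, both
complete, which is consistent with the hang being timing-dependent.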

I don't know if I'm going to find time to fix this that soon.  If
someone wants to try this, and send a patch, that would be great.

Hans
> 
> --
> Jim Marshall
> Sr. Staff Engineer
> WBEM Solutions, Inc.
> 978-947-3607
> 
> 


