[Gc] Abuse of collector...

Talbot, George Gtalbot at locuspharma.com
Wed May 6 12:58:45 PDT 2009

Hi all,

I've integrated the collector into my program and it works pretty well.  However, my usage pattern at startup is probably not ideal for the collector.  My program accepts incoming connections from 96 machines and builds a tree-like data structure that represents the data present on all of them.  This startup process takes several minutes, and most of that time is spent in the collector, as the data structure is about 400MB of pointer-containing data and about 60MB of pointer-free data.  I'm careful to allocate any larger pieces of data using the GC_malloc_ignore_off_page() and GC_malloc_atomic_ignore_off_page() variants.  My program runs on a multi-core 64-bit box and ends up with a 900MB heap when startup is done.

The program is several years old, and I've moved to GC as I can't afford to rewrite it in Java and I've been having memory usage issues (probably some leaks).

I'm using the 7.1 version of the collector (6.8 doesn't appear to work nearly as well for me), and it's running in parallel.

I would assume that once this rather murderous startup is over, during which the data structure is continuously modified by many threads and many allocations occur, the "generational" features of the collector will kick in and the collection cost will go down.

Right now on my box, the collection cost is on the order of 3500ms/collection using four threads.  Once the system is up and the data structure is built, it's quite responsive, but I'm a bit worried that I'll get occasional 3-4s pauses when it gets around to another collection.


1)  Does the time spent sound sane compared with the experience others have had?
2)  Is there a way to spend less time in the allocator during the initial startup?
3)  Is it reasonable to believe that, in the parallel collector, generational features will save me from super-long collections if my data structure is relatively constant after the startup?  (i.e. no more than, say, 5% of it changes every couple of hours.)

Sorry if these are "newbie"-style questions.

As an experiential note:  This program is a C++ program that I've converted from using boost::shared_ptr<> and the standard STL allocators to use the features of gc_cpp.h and gc_allocator.h for the STL collections.  Improvements I've been able to make are:

o I've been able to get rid of many of the locks in my program by replacing them with a "sample...mutate...compare_and_swap...repeat_on_contention" loop.
o The program uses about half the memory of its predecessor, as I no longer have to cache certain data structures.
o It is much simpler.
o Once the startup delay passes, it is at least as responsive as the previous program, if not more so.
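
To make the first item concrete, here is a minimal sketch of that kind of loop.  It is not code from my program: it assumes C++11 std::atomic, and Summary, record_machine and run_writers are hypothetical names made up for illustration.  It also swaps a small value rather than a pointer; with the collector, the same loop can presumably swap a pointer to a freshly built node, and the superseded node is simply left for the collector to reclaim.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical shared state: small and trivially copyable, with no padding,
// so std::atomic can compare-and-swap it as a single 8-byte value.
struct Summary {
    std::uint32_t machines;  // number of machines recorded so far
    std::uint32_t items;     // total items reported by those machines
};

// Lock-free update: sample, mutate a private copy, compare-and-swap,
// and repeat if another thread changed the value in the meantime.
inline void record_machine(std::atomic<Summary>& shared, std::uint32_t item_count) {
    Summary seen = shared.load();            // sample
    Summary next;
    do {
        next = seen;                         // mutate a private copy
        next.machines += 1;
        next.items += item_count;
        // On failure, compare_exchange_weak stores the current value in
        // `seen`, so the next pass starts from fresh data.
    } while (!shared.compare_exchange_weak(seen, next));
}

// Starts `threads` writers; each one records `per_thread` machines that
// report `item_count` items apiece. Returns the final shared value.
inline Summary run_writers(unsigned threads, unsigned per_thread, std::uint32_t item_count) {
    std::atomic<Summary> shared(Summary{0, 0});
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&shared, per_thread, item_count] {
            for (unsigned i = 0; i < per_thread; ++i) {
                record_machine(shared, item_count);
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return shared.load();
}
```

No update is lost under contention, because a failed compare-and-swap just redoes the mutation on the newer value.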

The collector works quite well.  I'm sure I'll be "getting used to it" for a while.

George T. Talbot
<gtalbot at locuspharma.com>
