[Gc] RE: Abuse of collector...
hans.boehm at hp.com
Wed May 6 16:34:32 PDT 2009
> -----Original Message-----
> From: gc-bounces at napali.hpl.hp.com
> [mailto:gc-bounces at napali.hpl.hp.com] On Behalf Of Talbot, George
> Sent: Wednesday, May 06, 2009 12:59 PM
> To: gc at linux.hpl.hp.com
> Subject: [Gc] Abuse of collector...
> Hi all,
> I've integrated the collector into my program and it works
> pretty well. However, I've got a probably non-ideal pattern
> for the collector at startup. My program accepts incoming
> connections from 96 machines, and builds a tree-like data
> structure that represents data present on all of the
> machines. This startup process takes several minutes, and
> most of the time is spent in the collector, as this data
> structure is about 400MB of pointer-containing data, and
> about 60MB of pointer-free data. I'm careful to allocate any
> larger pieces of data using the
> GC_malloc(*)_ignore_off_page() variants. My program is
> running on a multi-core 64-bit box and ends up with a 900MB
> heap when it's done.
> The program is several years old, and I've moved to GC as I
> can't afford to rewrite it in Java and I've been having
> memory usage issues (probably some leaks).
> I'm using the 7.1 version of the collector (6.8 doesn't
> appear to work nearly as well for me), and it's running in parallel.
> I would assume that once this rather murderous startup is
> over, during which the data structure is continuously
> modified by many threads and many allocations occur, the
> "generational" features of the collector will kick in, and
> the collection cost will go down.
> Right now on my box, the collection cost is on the order of
> 3500ms/collection using four threads. Once the system is up
> and the data structure is built, it's quite responsive, but
> I'm a bit worried that I'll get occasional 3-4s pauses when
> it gets around to another collection.
> 1) Does the time spent sound sane with experience that
> others have had?
If this is a modern X86 box or the like, it sounds too high to me. I would have expected under a second. What's the OS? Can you profile the executable? If not, random interruptions in a debugger might give you an idea. If the time is not being spent in GC_mark_from(), something fishy is going on. Looking at the log output might also be informative. You don't have GC assertions enabled, right?
> 2) Is there a way to spend less time in the allocator during
> the initial startup?
Explicitly calling GC_expand_hp() at startup with the approximate final heap size should help. That should cut down on the collections triggered while the heap is still growing.
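As a rough sketch of that, assuming the ~900MB final heap you mention: grow the heap in one step right after GC_INIT(), before the connections start coming in. The helper names (mb_to_bytes, reserve_heap) are made up for illustration, and the collector call is passed in as a function pointer only so that the fragment compiles without gc.h. GC_expand_hp() takes a byte count and returns nonzero on success.

```c
#include <stddef.h>

/* Shape of GC_expand_hp() in gc.h: byte count in, nonzero on success. */
typedef int (*heap_expand_fn)(size_t bytes);

/* Hypothetical helper: megabytes to bytes, or 0 if that overflows size_t. */
static size_t mb_to_bytes(size_t mb)
{
    const size_t one_mb = (size_t)1 << 20;
    if (mb > (size_t)-1 / one_mb)
        return 0;
    return mb * one_mb;
}

/* Hypothetical helper: ask the collector for expected_mb more heap in one
 * step. Returns 1 if the request succeeded, 0 otherwise. */
static int reserve_heap(heap_expand_fn expand, size_t expected_mb)
{
    size_t bytes = mb_to_bytes(expected_mb);
    if (expand == NULL || bytes == 0)
        return 0;
    return expand(bytes) != 0;
}
```

In the real program the call would be reserve_heap(GC_expand_hp, 900), made once after GC_INIT(). A failed expansion should not be fatal; the collector would just keep growing the heap on demand as it does now.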
> 3) Am I reasonable to believe that in the parallel
> collector, generational features will save me from super-long
> collections if my data structure is relatively constant after
> the startup? (i.e. no more than say 5% changes every couple
> of hours or so.)
Incremental collection currently doesn't combine well with parallel collection, and it is somewhat tricky to use anyway, depending on the platform. It's not on by default.
Turning on only generational collection with parallel GC may be OK. To do that, set GC_time_limit to GC_TIME_UNLIMITED, and then call GC_enable_incremental().
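In code that is just two steps, in that order. A sketch, with a made-up helper name (enable_generational_only); the collector's variable and function are passed in as pointers only so that the fragment compiles without gc.h. It assumes a 7.1-style gc.h, where GC_time_limit is a plain exported variable.

```c
#include <stddef.h>

/* Hypothetical helper. time_limit points at the collector's GC_time_limit,
 * unlimited is GC_TIME_UNLIMITED, enable is GC_enable_incremental.
 * Returns 1 if both steps ran, 0 if an argument was missing. */
static int enable_generational_only(unsigned long *time_limit,
                                    unsigned long unlimited,
                                    void (*enable)(void))
{
    if (time_limit == NULL || enable == NULL)
        return 0;
    *time_limit = unlimited; /* step 1: remove the pause-time limit */
    enable();                /* step 2: switch on incremental mode */
    return 1;
}
```

With gc.h included, the call would be enable_generational_only(&GC_time_limit, GC_TIME_UNLIMITED, GC_enable_incremental), made once during startup.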
I also really need to get out a new version; there is unfortunately some chance you are running into an old problem.
> Sorry if these are "newbie"-style questions.
> As an experiential note: This program is a C++ program that
> I've converted from using boost::shared_ptr<> and the
> standard STL allocators to use the features of gc_cpp.h and
> gc_allocator.h for the STL collections. Improvements I've
> been able to make are:
> o I've been able to get rid of many of the locks in my
> program by replacing them with a
> "sample...mutate...compare_and_swap...repeat_on_contention" loop.
> o Uses about half the memory of its predecessor, as certain
> data structures that I had to cache I no longer have to cache.
> o Is much simpler.
> o Once the startup delay passes, is at least as responsive
> as the previous program, if not more so.
> The collector works quite well. I'm sure I'll be "getting
> used to it" for a while.
> George T. Talbot
> <gtalbot at locuspharma.com>