[Gc] Re: Understanding why GC'ing increases to the double in time?

Martin Egholm Nielsen martin at egholm-nielsen.dk
Tue Jan 31 02:47:29 PST 2006


> This looks somewhat mysterious to me.
That's good :-)
I take that as a small hope... But let's see.

> The first (very) expensive collection overflows and then grows the mark
> stack.  That is expected to be expensive, but shouldn't affect later
> collections.  This should go away if you increase
> INITIAL_MARK_STACK_SIZE (in mark.c).  It would be interesting to see if
> that changes later GC times as well.  If so, I would suspect a GC bug.
> If you can easily rebuild the collector, I think that would be a
> worthwhile experiment.  The fact that the mark stack overflow occurs
> exactly at the transition is suspicious.
That will be easy enough - I'll try later today (or tomorrow)...
(I don't really know why I don't just wait to respond until then, but 
personally I prefer letting people know their suggestions are actually 
being acted on.)

> Based on a quick look at the heap block dumps, I think it should be
> unusually cheap to trace your heaps.  They seem to consist mostly of
> large pointer-free objects.  The GC doesn't even touch those pages
> during tracing.
> 
> The two things that are likely to make garbage collection expensive here
> are:
> 
> 1) Scanning the 3 MB root set.  The collector does have to read those at
> every GC.  That's really a gcj issue that should get fixed there.  It
> looks to me like there is more data here than there is
> pointer-containing data in the heap, and thus most of the scan time
> would probably go here.
But the time spent here is covered by the /small/ "world stopped" 
times (the 200 ms), right?!

> (I usually assume that trace time depends to a
> significant extent on the amount of memory that needs to be moved into
> the cache.  That's probably less true on your platform.  I assume the
> miss penalty is only 10 or 20 cycles?  Are cache lines large enough that
> sequential reads perform well?)
Now, those questions I'll need to get back to later - after asking some 
of the HW guys, who may know this. (It's a PPC405EP - just to clarify.)

> 2) Processing of finalizable objects may be an issue.  There are more of
> them than I would have expected.  It would be nice to understand where
> they're coming from. 
Of course. Does this number include referenced objects as well?
Can I disable something (finalization) in order to see whether this is 
an issue?

> If you can set a breakpoint in GC_register_finalizer_inner, and
> sample every few hundred calls to see where they are coming from,
> that might be interesting.  I'm not sure whether finalizable objects
> are a major factor here, but I can't really preclude it either.
Now, this becomes a bit trickier. However, if this turns out to be the 
only way to proceed, I'll do what it takes... (You're dealing with a 
GDB pedestrian :-( )
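For what it's worth, the sampling suggested above needs no scripting skills -
a stock GDB session can do it with an ignore count. A sketch (the sample
interval of 300 and the backtrace depth are arbitrary choices):

```
(gdb) break GC_register_finalizer_inner
(gdb) ignore 1 300          # let ~300 registrations pass between samples
(gdb) commands 1
> backtrace 10              # who is registering this finalizer?
> ignore 1 300              # re-arm for the next sample
> continue
> end
(gdb) continue
```

Each time the breakpoint actually fires, GDB prints a backtrace showing
where the finalizable object was registered, then runs on.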

> If you have some way of getting an execution profile for the time spent
> in the "long" GCs that might help to track things down.
Sure - though I've never tried it (besides profiling a kernel module). 
Does this require special treatment at compile time? And no binary 
stripping?
Doing this will require something from the profiler: I need to be able 
to reset the profiling data until I'm ready to measure a sweep. I guess 
there must be profilers that can be controlled over a socket?!
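For the common GNU tools the answers are: yes, gprof needs hooks compiled in
(-pg), and no, the binary must not be stripped, since gprof resolves
addresses through the symbol table. gprof also cannot reset its counters at
run time, but OProfile can. A sketch, assuming OProfile supports the target
(toolchain prefix and binary name are made up):

```
# gprof route: rebuild with profiling hooks; gmon.out is written on clean exit
powerpc-linux-gcc -pg -g -o app app.c
gprof app gmon.out

# OProfile route: system-wide sampling that CAN be reset between sweeps
opcontrol --reset        # discard samples collected so far
opcontrol --start
# ... let one "long" GC happen ...
opcontrol --stop
opreport --symbols ./app
```

The OProfile commands run from a shell on the target, so driving them over a
socket/ssh session between sweeps is straightforward.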

// Martin


