[Gc] Re: Understanding why GC'ing increases to the double in time?
Martin Egholm Nielsen
martin at egholm-nielsen.dk
Tue Jan 31 02:47:29 PST 2006
> This looks somewhat mysterious to me.
That's good :-)
I take that as a small hope... But let's see.
> The first (very) expensive collection overflows and then grows the mark
> stack. That is expected to be expensive, but shouldn't affect later
> collections. This should go away if you increase
> INITIAL_MARK_STACK_SIZE (in mark.c). It would be interesting to see if
> that changes later GC times as well. If so, I would suspect a GC bug.
> If you can easily rebuild the collector, I think that would be a
> worthwhile experiment. The fact that the mark stack overflow occurs
> exactly at the transition is suspicious.
That's easily done - I'll try later today (or tomorrow)...
(I don't actually know why I don't just wait with responding until then,
but personally I prefer that people know their suggestions are actually
being acted upon.)
> Based on a quick look at the heap block dumps, I think it should be
> unusually cheap to trace your heaps. They seem to consist mostly of
> large pointer-free objects. The GC doesn't even touch those pages
> during tracing.
> The two things that are likely to make garbage collection expensive here are:
> 1) Scanning the 3 MB root set. The collector does have to read those at
> every GC. That's really a gcj issue that should get fixed there. It
> looks to me like there is more data here than there is
> pointer-containing data in the heap, and thus most of the scan time
> would probably go here.
But the time spent here is included in the /small/ "world stopped"
times (the 200 ms), right?!
> (I usually assume that trace time depends to a
> significant extent on the amount of memory that needs to be moved into
> the cache. That's probably less true on your platform. I assume the
> miss penalty is only 10 or 20 cycles? Are cache lines large enough that
> sequential reads perform well?)
Now, those questions I need to get back to later - after referring to
some of the HW-guys maybe knowing this. (It's a PPC405EP - just to clarify.)
> 2) Processing of finalizable objects may be an issue. There are more of
> them than I would have expected. It would be nice to understand where
> they're coming from.
Of course. Does this number include the number of referenced objects as well?
Can I disable something (finalization) in order to see if this is an issue?
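One crude way to run that experiment, assuming it is acceptable for the
duration of the test that finalizers never run: stub out the registration
routine mentioned below (the function name is from this thread; treat the
rest purely as a sketch against whatever finalize.c you have):

```c
/* finalize.c -- sketch: make finalizer registration a no-op so no
 * object ever becomes finalizable. For measurement only; anything a
 * finalizer would have released will leak for the test's duration. */
GC_API void GC_register_finalizer_inner(/* ...existing parameters... */)
{
    return;  /* experiment: skip all registration work */
    /* ...original body left unreachable... */
}
```

If the long collections shrink with this in place, finalizable objects
are a plausible culprit.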
> If you can set a breakpoint in GC_register_finalizer_inner, and
> sample every few hundred calls to see where they are coming from,
> that might be interesting. I'm not sure whether finalizable objects
> are a major factor here, but I can't really preclude it either.
Now, this gets a bit trickier. However, if it turns out to be the only
way to proceed, I'll do what it takes... (You're dealing with a GDB
pedestrian :-( )
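For the record, the sampling suggested above can be scripted so it needs
little GDB fluency. A sketch of a command file (the breakpoint symbol is
from the mail; the sampling interval of 300 calls is an arbitrary choice):

```
# sample.gdb -- load with: source sample.gdb
break GC_register_finalizer_inner
commands
  silent
  backtrace 8          # who is registering the finalizer?
  ignore 1 300         # skip ~300 hits before the next sample
  continue
end
```

Each time the breakpoint fires it prints a short backtrace, then lets the
next 300 registrations pass uninterrupted.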
> If you have some way of getting an execution profile for the time spent
> in the "long" GCs that might help to track things down.
Sure - never tried it, though (apart from profiling a kernel module).
Does this require special treatment at compile time? And no stripping of
the binary?
Doing this will require something from the profiler: I need to be able
to reset the profiling data until I'm ready to measure a sweep. I guess
there must be profilers that can be controlled over a socket?!