[Gc] Generational collection performance disaster
rafe at cs.mu.OZ.AU
Wed Apr 6 21:52:57 PDT 2005
Hans, Ben, many thanks for the swift replies.
We ran the benchmark under Debian Linux with kernel version 2.4.16. The
boehm_gc version we're using is 6.3.2.
Boehm, Hans, Wednesday, 6 April 2005:
> It might be worth checking where the protection faults are
> being generated, i.e. which updates are causing problems.
Ah, can you offer a quick pointer as to how I can do this?
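In case it helps to say what I had half in mind: I found a handler called
GC_write_fault_handler in the 6.x os_dep.c, and my guess was to break on it
under gdb and log a short backtrace each time it fires, to see which updates
are faulting. The handler name is a guess on my part and may differ in other
trees:

```shell
# Guessed approach: break in the collector's write-fault handler and log a
# short backtrace at each protection fault. GC_write_fault_handler is the
# SIGSEGV handler name in the 6.x os_dep.c (may differ elsewhere).
cat > gc-faults.gdb <<'EOF'
break GC_write_fault_handler
commands
  silent
  backtrace 5
  continue
end
run
EOF
# Then:  gdb -batch -x gc-faults.gdb ./my_benchmark
# (my_benchmark is a placeholder for our actual binary)
echo "wrote gc-faults.gdb"
```

Is that roughly what you meant, or is there a cheaper way?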
> - It's also probably worthwhile to look at some GC logs both ways.
How do I enable GC logging? Is it just a case of removing SILENT from
the Makefile and recompiling?
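For what it's worth, these are the two routes I was planning to try (please
correct me if I have this wrong — removing -DSILENT is the build-time route
in the 6.x Makefile, and GC_PRINT_STATS is the run-time environment variable
I found documented for the collector):

```shell
# Sketch of the two routes I am aware of:
# 1. Build time (6.x tree): delete -DSILENT from CFLAGS in the Makefile
#    and rebuild, so per-collection messages are compiled in.
# 2. Run time: builds of the collector also honour this environment
#    variable and print per-collection statistics to stderr:
export GC_PRINT_STATS=1
# ./my_benchmark      # placeholder for our actual benchmark binary
echo "$GC_PRINT_STATS"
```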
> - The current incremental GC depends on signal delivery performance,
> among other things. I assume this was on Linux? Its signal delivery
> performance is normally pretty good.
Yes, we're using Linux 2.4.16, although a fair number of our users run
> - Stubborn allocation support was changed completely in 7.0. This is
> implemented a bit more completely in my 7.0 tree than it is in
> 7.0alpha1, but I'm sure it needs work. Instead of stubborn allocation,
> there is support for explicit write barrier calls. Objects directly
> reachable from the stack are viewed as dirty, so you shouldn't need
> those calls for a functional language. If someone wants to make this
> work for real, I can make available a newer "experimental" version.
We could be interested in this if we can identify the source of the
current performance anomaly and have a reasonable expectation of some
gain (a few per cent would be enough).
> - Multiple generations were tried a long time ago. At the time, the
> added overhead of keeping track of additional per-object bits seemed
> to outweigh the benefit of avoiding premature promotions. Given that
> it increasingly makes sense to reserve a whole byte for each mark bit
> for concurrency reasons, there may be more room for clever
> implementations in this area now.
I've wondered whether it might be worth managing generations on a
per-page basis rather than a per-object basis. Assuming roughly that
- most new objects die young (but do take *some* time to die)
- recently allocated objects will be in the cache
- it is better to reuse addresses in the cache where possible
- minor collections only clear the mark bits from younger pages prior
  to tracing
- tracing will only touch the relatively small amount of live data in
the younger generations
- if garbage cells are reused first from the younger generations then
they are more likely to still be in the cache
- as pages age they will tend to fill up with longer-lived objects and
can therefore be promoted.
One way to optimise things is to preserve a copy of the mark bits on a
page before clearing them prior to a minor collection. On the next
minor collection, restore the old mark bits to avoid retracing older
objects. This way collection can be interleaved with allocation and the
page promotion strategy experimented with fairly easily.
I readily admit that I'm not a GC expert and that the above may be very
naive or be exactly what was tried before. Either way, I'd be very
interested in hearing your opinion.