[Gc] Request for Comments - Manual Generations
bbeuning at corecard.com
Wed Apr 17 08:30:10 PDT 2013
I had an idea for making GC for C / C++ more generally usable
and would like your input on it. But before getting to the
idea, I need to cover some background.
Recent versions of Doug Lea's malloc include the concept of mspaces.
An mspace is a heap: it supports malloc() and free() for objects in
the mspace. A process can have multiple mspaces at once, and an mspace
can be deleted entirely, which implicitly frees all objects allocated
in the mspace. Some papers call an mspace a "region".
A test process allocating 2 GB of memory using Boehm GC took 6 seconds
to do a GC. For a real-time oriented process, 6 seconds is
not acceptable. (This was a best-case scenario; in a real 2 GB
process, large parts might be paged out, which would make the GC
even slower.)
So what if GC supported a GCregion object that worked much like
an mspace, and we could declare relationships between GCregion
objects that tell the GC which regions contain pointers into
which other regions? Then we could run GC on just one GCregion,
because we would know which regions (much less than the entire process)
can contain pointers into the region we want to collect.
Let's look at an example. Since global variables, the thread stacks,
and the registers can hold references to just about anything, let's
call all of this memory region Rglobal, and automatically
declare that Rglobal has pointers into all other regions.
(This example is from my work, but the sizes are fictional to support the example.
I think this memory usage pattern is not uncommon in large processes.)
Region Rdef (think language source code) is created early in the process, size about 200 MB
Region Rinst (think interpreter run-time code) is created next and has pointers into Rdef, size 200 MB
Region Rcache (in-memory copies of DB tables) is created next and has pointers to Rinst, size 500 MB
Region Rsession1 (a web session) is created on demand and has pointers to Rinst and Rcache, size 50 MB
If we have 20 sessions, then the process size is about 2 GB.
Region Rroots has pointers into all of the above, size 10 MB
Since Rdef, Rinst, and Rcache are made during startup and do not change,
there is little point in scanning them for free memory.
The session regions have lots of activity, so they need GC fairly often.
But collecting a session only requires a mark / sweep of Rglobal, Rroots, and that session's region.
This is about 100 MB (instead of 2 GB), which should be quick enough not to
cause any noticeable real-time pauses.
That is the idea. Any comments are welcome.
For me, the main issue is getting the memory references between regions right.
In the example, if someone introduces an object in Rinst that points into a session
but does not tell the GC that Rinst has pointers into Rsession, then an object in the session
might be collected while it is still in use.
I think this can be addressed by something like the leak detector version of the GC,
but instead of detecting leaks, it would detect illegal memory references between regions.