[Gc] Extra info in mark-bytes?
Jan Wielemaker
J.Wielemaker at vu.nl
Mon Jan 2 05:53:16 PST 2012
Hi,
PROBLEM
=======
I'm trying to figure out whether I can use bdwgc for the following
use-case:
* A huge main-memory DB of mostly small objects. Should scale to
memory available in server HW (say 100Gb at this moment).
* Typically, most objects are nicely linked from a shared view on
the data. I.e., there is nothing to do for GC. I'll call this
the `permanent' heap.
* In addition, there are short-lived objects involved in (multi-
threaded) computation. Using GC greatly simplifies life here.
So far, so good: just use GC for the latter category. But ...
* These temporary objects (and C variables) tend to reference the
`permanent ones'. `Permanent' between quotes, because they may be
deleted. Deleting here means disconnecting from other `permanent'
objects. They must be kept alive as long as they are referenced.
This requires GC to be used on the `permanent' objects, but marking the
zillions of objects in the `permanent' heap is too slow :-(
EXPERIMENT
==========
I did an experiment by creating my own kind with marking procedure and
have a bit on the object that was set before the object was unlinked
from the permanent heap. The custom marking procedure only marks
referenced pointers if this bit is set. That gets the marking time down
dramatically, but of course the first GC now wipes most permanent
objects :-(
NEXT STEP
=========
We need a way to protect objects. That is `uncollectable', but there is
no way to move objects from uncollectable to collectable. Digging into
the source, I discovered that the mark bit is actually a mark byte. That
offered possibilities. I am planning to use two of the bits here. One to
state that the object is uncollectable and the other that it is atomic
(which avoids the need for my custom marker for most of the use-cases).
I added an experimental GC_set_flags(ptr, flags) and GC_clear_flags(ptr,
flags), such that I can do
obj = GC_malloc(size);
<initialise and link to permanent heap>
GC_set_flags(obj, GC_FLAG_UNCOLLECTABLE|GC_FLAG_ATOMIC);
and, to get rid of the object:
GC_clear_flags(obj, GC_FLAG_UNCOLLECTABLE|GC_FLAG_ATOMIC);
<unlink from permanent heap>
(There is some room for optimizing here; e.g. clearing the marks is more
expensive, etc., but this seems feasible, e.g., by keeping a count of
the number of objects with GC_FLAG_UNCOLLECTABLE in a block; often
this will be all objects in the block, removing the need to clear and
scan for unmarked objects during the sweep).
The idea is that objects are not collected if either the MARK bit or the
UNCOLLECTABLE bit is set. Marking will be skipped if GC_FLAG_ATOMIC is
set (unimplemented).
The attached patch is my current state.
QUESTIONS
=========
* The comments say that using a byte for marker avoids the need for
interlocked bitwise or. Is this true? Isn't setting a byte
effectively
fetching a word, changing it and writing it back? That would
explain some
crashes I've seen in my tests ...
* As soon as I set the above bits, I see that GC_malloc() no longer uses
the free list, but calls GC_generic_malloc_many() for most calls. It
seems that playing with these bits affect the free list!? Does someone
have an idea why?
Thanks --- Jan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dynmark.patch
Type: text/x-patch
Size: 4609 bytes
Desc: dynmark.patch
Url : http://napali.hpl.hp.com/pipermail/gc/attachments/20120102/e31eb8a7/dynmark.bin
More information about the Gc
mailing list