[Gc] Extra info in mark-bytes?
Bruce Hoult
bruce at hoult.org
Mon Jan 2 14:18:48 PST 2012
This seems to me to be entirely the wrong way to go about this.
If the proportion of objects involved in the multi-threaded
computation is small then I think you should make those computations
refer to them via a GC-able proxy object with a finalizer.
What you'll need:
- your DB class X
- a bit D in X saying whether it has been unlinked from the main
structure and should be deleted
- a garbage collected proxy object P for each X that is currently
involved in the computation. This simply contains a pointer to its X.
- a way to find the unique P (if any) for each X. This can be a
pointer within X if X is big enough that the space overhead is low, or
a (non GC) hash table if time overhead is more acceptable than space
overhead.
When an X is used in the computation its P should be located, or one
created if it doesn't have one, and the P passed to the computation
instead of the X itself.
When an X is deleted from the DB check if it has a P. If not then just
delete it. Otherwise unlink it and set its D.
The finalizer for P should check whether the D in its X is set and if
so then delete the X. It should in any case remove itself from the
hash table (if that's how an X finds its P), or null the pointer to
itself (if that's how an X finds its P).
On Tue, Jan 3, 2012 at 2:53 AM, Jan Wielemaker <J.Wielemaker at vu.nl> wrote:
> Hi,
>
> PROBLEM
> =======
>
> I'm trying to figure out whether I can use bdwgc for the following
> use-case:
>
> * A huge main-memory DB of mostly small objects. Should scale to
> memory available in server HW (say 100Gb at this moment).
> * Typically, most objects are nicely linked from a shared view on
> the data. I.e., there is nothing to do for GC. I'll call this
> the `permanent' heap.
> * In addition, there are short-lived objects involved in (multi-
> threaded) computation. Using GC greatly simplifies life here.
>
> So far, so good: just use GC for the latter category. But ...
>
> * These temporary objects (and C variables) tend to reference the
> `permanent ones'. `Permanent' between quotes, because they may be
> deleted. Deleting here means disconnecting from other `permanent'
> objects. They must be kept alive as long as they are referenced.
>
> This requires GC to be used on the `permanent' objects, but marking the
> zillions of objects in the `permanent' heap is too slow :-(
>
> EXPERIMENT
> ==========
>
> I did an experiment by creating my own kind with marking procedure and
> have a bit on the object that was set before the object was unlinked
> from the permanent heap. The custom marking procedure only marks
> referenced pointers if this bit is set. That gets the marking time down
> dramatically, but of course the first GC now wipes most permanent
> objects :-(
>
> NEXT STEP
> =========
>
> We need a way to protect objects. That is `uncollectable', but there is no
> way to move objects from uncollectable to collectable. Digging into
> the source, I discovered that the mark bit is actually a mark byte. That
> offered possibilities. I am planning to use two of the bits here. One to
> state that the object is uncollectable and the other that it is atomic
> (which avoids the need for my custom marker for most of the use-cases).
>
> I added an experimental GC_set_flags(ptr, flags) and GC_clear_flags(ptr,
> flags), such that I can do
>
> obj = GC_malloc(size);
> <initialise and link to permanent heap>
> GC_set_flags(obj, GC_FLAG_UNCOLLECTABLE|GC_FLAG_ATOMIC);
>
> and, to get rid of the object:
>
> GC_clear_flags(obj, GC_FLAG_UNCOLLECTABLE|GC_FLAG_ATOMIC);
> <unlink from permanent heap>
>
> (There is some room for optimizing here; e.g. clearing the marks is more
> expensive, etc., but this seems feasible, e.g., by keeping a count of the
> number of objects with GC_FLAG_UNCOLLECTABLE in a block; often this will be
> all objects in the block, removing the need to clear and scan for unmarked
> objects during the sweep).
>
> The idea is that objects are not collected if either the MARK bit or the
> UNCOLLECTABLE bit is set. Marking will be skipped if GC_FLAG_ATOMIC is
> set (unimplemented).
>
> The attached patch is my current state.
>
> QUESTIONS
> =========
>
> * The comments say that using a byte for marker avoids the need for
> interlocked bitwise or. Is this true? Isn't setting a byte effectively
> fetching a word, changing it and writing it back? That would explain
> some
> crashes I've seen in my tests ...
>
> * As soon as I set the above bits, I see that GC_malloc() no longer uses
> the free list, but calls GC_generic_malloc_many() for most calls. It
> seems that playing with these bits affect the free list!? Does someone
> have an idea why?
>
> Thanks --- Jan
>
> --
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
>
>
> _______________________________________________
> Gc mailing list
> Gc at linux.hpl.hp.com
> http://www.hpl.hp.com/hosted/linux/mail-archives/gc/
More information about the Gc
mailing list