[Gc] Extra info in mark-bytes?

Bruce Hoult bruce at hoult.org
Mon Jan 2 14:18:48 PST 2012


This seems to me to be entirely the wrong way to go about this.

If the proportion of objects involved in the multi-threaded
computation is small then I think you should make those computations
refer to them via a GC-able proxy object with a finalizer.

What you'll need:

- your DB class X

- a bit D in X saying whether it has been unlinked from the main
structure and should be deleted

- a garbage collected proxy object P for each X that is currently
involved in the computation. This simply contains a pointer to its X.

- a way to find the unique P (if any) for each X. This can be a
pointer within X if X is big enough that the space overhead is low, or
a (non GC) hash table if time overhead is more acceptable than space
overhead.

When an X is used in the computation its P should be located, or one
created if it doesn't have one, and the P passed to the computation
instead of the X itself.

When an X is deleted from the DB check if it has a P. If not then just
delete it. Otherwise unlink it and set its D.

The finalizer for P should check whether the D in its X is set and if
so then delete the X. It should in any case remove itself from the
hash table (if that's how an X finds its P), or null the pointer to
itself (if that's how an X finds its P).


On Tue, Jan 3, 2012 at 2:53 AM, Jan Wielemaker <J.Wielemaker at vu.nl> wrote:
> Hi,
>
> PROBLEM
> =======
>
> I'm trying to figure out whether I can use bdwgc for the following
> use-case:
>
>  * A huge main-memory DB of mostly small objects.  Should scale to
>    memory available in server HW (say 100Gb at this moment).
>  * Typically, most objects are nicely linked from a shared view on
>    the data.  I.e., there is nothing to do for GC.  I'll call this
>    the `permanent' heap.
>  * In addition, there are short-lived objects involved in (multi-
>    threaded) computation.  Using GC greatly simplifies life here.
>
> So far, so good: just use GC for the latter category.  But ...
>
>  * These temporary objects (and C variables) tend to reference the
>    `permanent ones'.  `Permanent' between quotes, because they may be
>    deleted.  Deleting here means disconnecting from other `permanent'
>    objects. They must be kept alive as long as they are referenced.
>
> This requires GC to be used on the `permanent' objects, but marking the
> zillions of objects in the `permanent' heap is too slow :-(
>
> EXPERIMENT
> ==========
>
> I did an experiment by creating my own kind with marking procedure and
> have a bit on the object that was set before the object was unlinked
> from the permanent heap. The custom marking procedure only marks
> referenced pointers if this bit is set. That gets the marking time down
> dramatically, but of course the first GC now wipes most permanent
> objects :-(
>
> NEXT STEP
> =========
>
> We need a way to protect objects. That is `uncollectable', but there is no
> way to move objects from uncollectable to collectable. Digging into
> the source, I discovered that the mark bit is actually a mark byte. That
> offered possibilities. I am planning to use two of the bits here. One to
> state that the object is uncollectable and the other that it is atomic
> (which avoids the need for my custom marker for most of the use-cases).
>
> I added an experimental GC_set_flags(ptr, flags) and GC_clear_flags(ptr,
> flags), such that I can do
>
>        obj = GC_malloc(size);
>        <initialise and link to permanent heap>
>        GC_set_flags(obj, GC_FLAG_UNCOLLECTABLE|GC_FLAG_ATOMIC);
>
> and, to get rid of the object:
>
>        GC_clear_flags(obj, GC_FLAG_UNCOLLECTABLE|GC_FLAG_ATOMIC);
>        <unlink from permanent heap>
>
> (There is some room for optimizing here; e.g. clearing the marks is more
>  expensive, etc., but this seems feasible, e.g., by keeping a count of  the
> number of objects with GC_FLAG_UNCOLLECTABLE in a block; often this  will be
> all objects in the block, removing the need to clear and scan for unmarked
> objects during the sweep).
>
> The idea is that objects are not collected if either the MARK bit or the
> UNCOLLECTABLE bit is set.  Marking will be skipped if GC_FLAG_ATOMIC is
> set (unimplemented).
>
> The attached patch is my current state.
>
> QUESTIONS
> =========
>
>  * The comments say that using a byte for marker avoids the need for
>    interlocked bitwise or.  Is this true?  Isn't setting a byte effectively
>    fetching a word, changing it and writing it back?  That would explain
> some
>    crashes I've seen in my tests ...
>
>  * As soon as I set the above bits, I see that GC_malloc() no longer uses
>    the free list, but calls GC_generic_malloc_many() for most calls.  It
>    seems that playing with these bits affect the free list!?  Does someone
>    have an idea why?
>
>        Thanks --- Jan
>
> --
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
>
>
> _______________________________________________
> Gc mailing list
> Gc at linux.hpl.hp.com
> http://www.hpl.hp.com/hosted/linux/mail-archives/gc/



More information about the Gc mailing list