[Gc] Extra info in mark-bytes?

Jan Wielemaker J.Wielemaker at vu.nl
Mon Jan 2 05:53:16 PST 2012


Hi,

PROBLEM
=======

I'm trying to figure out whether I can use bdwgc for the following
use-case:

   * A huge main-memory DB of mostly small objects.  Should scale to
     memory available in server HW (say 100Gb at this moment).
   * Typically, most objects are nicely linked from a shared view on
     the data.  I.e., there is nothing to do for GC.  I'll call this
     the `permanent' heap.
   * In addition, there are short-lived objects involved in (multi-
     threaded) computation.  Using GC greatly simplifies life here.

So far, so good: just use GC for the latter category.  But ...

   * These temporary objects (and C variables) tend to reference the
     `permanent ones'.  `Permanent' between quotes, because they may be
     deleted.  Deleting here means disconnecting from other `permanent'
     objects. They must be kept alive as long as they are referenced.

This requires GC to be used on the `permanent' objects, but marking the
zillions of objects in the `permanent' heap is too slow :-(

EXPERIMENT
==========

I did an experiment by creating my own kind with marking procedure and
have a bit on the object that was set before the object was unlinked
from the permanent heap. The custom marking procedure only marks
referenced pointers if this bit is set. That gets the marking time down
dramatically, but of course the first GC now wipes most permanent
objects :-(

NEXT STEP
=========

We need a way to protect objects. That is `uncollectable', but there is 
no way to move objects from uncollectable to collectable. Digging into
the source, I discovered that the mark bit is actually a mark byte. That
offered possibilities. I am planning to use two of the bits here. One to
state that the object is uncollectable and the other that it is atomic
(which avoids the need for my custom marker for most of the use-cases).

I added an experimental GC_set_flags(ptr, flags) and GC_clear_flags(ptr, 
flags), such that I can do

	obj = GC_malloc(size);
	<initialise and link to permanent heap>
         GC_set_flags(obj, GC_FLAG_UNCOLLECTABLE|GC_FLAG_ATOMIC);

and, to get rid of the object:

	GC_clear_flags(obj, GC_FLAG_UNCOLLECTABLE|GC_FLAG_ATOMIC);
	<unlink from permanent heap>

(There is some room for optimizing here; e.g. clearing the marks is more
  expensive, etc., but this seems feasible, e.g., by keeping a count of 
  the number of objects with GC_FLAG_UNCOLLECTABLE in a block; often 
this  will be all objects in the block, removing the need to clear and 
scan for unmarked objects during the sweep).

The idea is that objects are not collected if either the MARK bit or the
UNCOLLECTABLE bit is set.  Marking will be skipped if GC_FLAG_ATOMIC is
set (unimplemented).

The attached patch is my current state.

QUESTIONS
=========

   * The comments say that using a byte for marker avoids the need for
     interlocked bitwise or.  Is this true?  Isn't setting a byte 
effectively
     fetching a word, changing it and writing it back?  That would 
explain some
     crashes I've seen in my tests ...

   * As soon as I set the above bits, I see that GC_malloc() no longer uses
     the free list, but calls GC_generic_malloc_many() for most calls.  It
     seems that playing with these bits affect the free list!?  Does someone
     have an idea why?

	Thanks --- Jan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dynmark.patch
Type: text/x-patch
Size: 4609 bytes
Desc: dynmark.patch
Url : http://napali.hpl.hp.com/pipermail/gc/attachments/20120102/e31eb8a7/dynmark.bin


More information about the Gc mailing list