[Gc] Re: Weak hash tables and cycles

Petter Urkedal urkedal at nbi.dk
Sat Nov 8 08:46:16 PST 2008


On 2008-11-06, Hans Boehm wrote:
> There was an old patch posted to this list that allowed for call-backs
> after the initial marking, that would probably have made this easier
> and more efficient.  I think it's still on my list somewhere, but it
> neveer quite made it in, at least not yet.

I think you refer to my patch.  Shortly after sending it, I made another
modification to the collector which admitted a more efficient
hash-consing implementation.  It still needs changes to the collector,
but I though I'd hold the patch until 7.0 was released.  I've maintained
it outside the GC tree since.

I'd like to take part in improving the collector to support weak data
structures, since hash-consing is a special case which some of my code
heavily depends on.  I think my patch is a sound basis, but if someone
can come up with a more efficient approach, that's even better.

I'll give a brief description of the patch below.  Note that my goal was
performance rather than convenience.  It is a low-level API on which we
can build weak data structures and finalisation.  When deciding on a
solution, it would be useful to list the data structures we want to
support, and to make some reference implementations, maybe to be shipped
with the collector.


ABOUT THE PATCH

  * A patch against today's CVS repo is found under
	http://www.eideticdew.org/download/

  * Some old but mostly relevant comments can be found at
	http://www.eideticdew.org/culibs/bdwgc-rn.html.

One of my goals was to find a solution where as little as possible as
done while the collector is locked in a consistent mark state.
Therefore, the hash-table updates and other work is done outside this
period.  Since this may allow the application to access keys which has
been scheduled for reclaim, the callback provided by a weak set/map must
be able to block the reclaim of an object.  The way I implemented this
is though a "disclaim" callback which is registered on a per-kind basis.
The collector calls this function to ask the application to "disclaim"
an object.  A zero return means successful disclaim (we should maybe
reverse that).

One point I'm unhappy with is the fact that the disclaim function is
called on objects on the free-list, and must somehow figure this out and
return 1.  I've solved this by setting the low bit on the first word on
valid objects.  We could maybe add free lists for disclaim-enabled kinds
to the root set instead.

When registering a memory kind, there is a new option to mark from
unmarked objects from that kind.  Typically one would disable it for
Java-style finalisation, weak set-like structures, and hash-consing, and
enable it for normal finalisation and weak map-like structures.


WEAK DATA STRUCTURES

For weak hash-tables the disclaim function will typically check if the
key has been accessed since the last collection, if so it returns
non-zero to retain the objects, otherwise it removes the key from the
table and returns 0.

The "disclaim" function is triggered by the allocation function though
reclaim.c, so the possibility of deadlocking must be considered when
locking the hash-table before removing keys.  An easy solution is to use
pthread_mutex_trylock.  If it fails, just retain the object.  The
probability that pthread_mutex_trylock fails is generally low, and we
get another attempt on the next collection unless the key has been
resurrected.

(I would be elegant in a multi-threaded application to run the reclaim
work (and thus the disclaim callbacks) in a separate thread, but I have
not tried this.)


FINALISATION

I added an finalisation implementation as an example and for
benchmarking.  Since it uses a separate memory kind, it's not possible
to use the GC_malloc.  Instead the user should know in advance that
finalisation is required and call GC_finalized_malloc.

I call this an example, because languages such as C++ and some libraries
store dynamic type information in objects.  In that case the language or
library can register a single memory kind for all their objects, with a
disclaim function which knowns how to locate and dispatch to the actual
finaliser.

GC_REGISTER_FINALIZER and friends should be compatible with
disclaim-enabled kinds, though I have not tested this.  It may therefore
be useful to use GC_REGISTER_FINALIZER in combination with weak hash
tables to avoid stacking finalisation work.


More information about the Gc mailing list