[Gc] Weak-key hash table implementation

Dan Egnor egnor@ofb.net
Tue, 11 Mar 2003 11:35:20 -0500

I'd like to implement (or better yet, reuse) a weak-key hash table for use 
with a C++ application that uses the Boehm GC.  Is there a preferred
implementation technique?

A weak-key hash table is an associative array between keys (which are
pointers to objects) and values (which can be whatever).  The
implementation only keeps a weak reference to the key (hence
"weak-key"), but a strong reference to the value.  If the key is 
collected, the key-value pair is considered to no longer be present in 
the array, the associated value should no longer be referenced 
by the table, and the slot in the table should be eligible for reuse.

My first plan was to use a cleanup function (finalizer) on the key that 
would remove itself from the hash table.  However, in the Boehm GC,
objects can only have one finalizer.  This means that any particular
object could only participate as a key in one weak hash table at a time,
and could not have any other finalizers.

Next I thought I would chain finalizers.  Before installing the 
remove-from-hash finalizer, I would retrieve the existing finalizer and
store it.  After the remove-from-hash finalizer was called, I would
replace (or call?) the previous finalizer.  However, this makes it
impossible in general to remove finalizers, and any other code that
attempts to do so will screw things up.  That means that every object
will maintain (until it is collected) a record of every hash table it
has ever been a key in.

So I thought I might layer my own finalization mechanism on top of the
one provided by the Boehm GC.  Mine would keep track of a list of
finalizers to invoke, and would support removing registered finalizers
from the list.  This seems like a lot of complexity to impose and
requires that the rest of the application use this layer rather than
the native finalization system.

Next I thought I might use weak pointers instead of finalizers.
However, that means that the reference to the value in the hash table
isn't removed until that hash table entry happens to be visited in
some way.  The table can be periodically "vacuumed" for dead keys
and the associated values detached, but this imposes a performance
penalty and makes me decide on tricky issues like a vacuuming schedule.
It also makes it very hard to reuse an existing (strong) hash table

Weak-key hash tables are a fairly common idea (and in fact a primitive
in some languages, I believe), so I presume there's some reasonable way
to implement them.  What am I missing?