[Gc] gc with trie

skaller skaller at users.sourceforge.net
Tue Nov 21 18:07:13 PST 2006


On Tue, 2006-11-21 at 13:06 -0600, Boehm, Hans wrote:
> You seem to be asking several disjoint questions here? 
> 
> > -----Original Message-----
> > From: gc-bounces at napali.hpl.hp.com 
> > [mailto:gc-bounces at napali.hpl.hp.com] On Behalf Of skaller
> > Sent: Monday, November 13, 2006 7:26 PM
> > To: gc at napali.hpl.hp.com
> > Subject: [Gc] gc with trie
> > 
> > FYI I have some concern using gc since it seems to exclude 
> > use of a digital trie.
> You're talking about a trie that's indexed by addresses rather than
> character strings.  

Yes.

> That does cause issues if objects can move, as they
> can with some other collectors.  The standard solution is to assign each
> object a unique identifier as it's allocated, which is then used instead
> of its address.  Or you use something like Java hashcodes, and deal with
> the fact that multiple objects may have the same hashcode.

Neither is likely to work due to the cost: the primary purpose
using address based page tables is speed, and the application
I have in mind is itself a garbage collector: more likely to
reconsider recoding the lookup directly in asm than slow
it down with hashing.

However Judy arrays claim excellent performance: they're
a hybrid which carefully uses structures exactly the size
of cache lines.

At the moment I'm not using such a data structure: compatibility
with BRD-gc is one of the reasons .. my naive collector uses
a doubly linked list .. and all up has 6 word overhead per
allocation. This is ok for large frames, but for functional
data structures like lists it is too much.

> > Any hints on how to organise this?
> This is likely to be somewhat painful unless you have a priori
> constraints on pointers between the two heaps. 

Yes, but that isn't the problem here. Felix already has the
problem that, for example, you cannot put a Felix data type
like a functional list into a C++ STL container, because
the Felix collector will think it is unreachable.

This is the same problem any hybrid system has, including
for example Microsoft C++ running on .NET. Here Felix gc
plays the role the .NET collector would. In fact .. if
Felix were to run on .NET we'd probably let .NET be the
collector.

That's a different problem than say wishing to embed
Neko VM into Felix. Neko uses BRD-gc as its collector.
If a customer did that, on their head be any problem
interfacing created by them.

But if Felix uses a digital trie in its collector ..
the BRD-gc is going to delete reachable *Felix* objects.

This can't happen at the moment, because Felix uses
a doubly linked list root in an allocator object which
is on the stack -- so BRD-gc will find all Felix objects
are always reachable and leave them alone.

> Allocating one heap as a block in the other heap is generally not
> sufficient to deal correctly with cross-heap pointers. 

That's ok. See above. The client is responsible for
not crossing their pointers.

> If the whole system were under your control, I suspect it would be a lot
> easier to use a single collector that can be type accurate but can deal
> with ambiguities when necessary. 

Unfortunately the whole system isn't under my control.
Felix is a C++ extension. The Felix code and libraries
are under my control -- but the client can use arbitrary
C++ code directly in the language.

It's more an issue of not wanting to deal with
"it works in C++ why won't it work in Felix".

And wishing to allow some of the newer subsystems, like Neko,
which rely on BRD-gc to be embedded.

So basically, some way of telling the BRD-gc how to scan the 
digital trie might be a solution. Or simply forcing
Felix to suballocate BRD-gc blocks which are themselves
registered as BRD-gc roots might be enough.

To put this another way: "conservative gc" has hole in it
that needs plugging: it can't support certain very fast
data structures which hide pointers. In a digital trie
these pointers are certainly recoverable, so there
should be some way to teach the conservative gc how
to find them .. or at least tell it to keep out:
otherwise it loses the 'conservative' property, 
which is a disaster. Leaks aren't a nearly as much
of a problem as clobbering live data -- leaks only
affect long running programs.



-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


More information about the Gc mailing list