[Gc] Cheaper finalisation and weak pointers

Petter Urkedal petter.urkedal at nordita.dk
Sun Mar 13 04:17:53 PST 2005

> I have to think more about the concurrency issues.  Currently
> I believe the finalization code does not run with the world stopped,
> though the allocation lock is held.  This makes finalization
> callbacks a bit easier, I think.  But dereferencing a weak
> pointer without holding the allocation lock seems quite
> tricky.  If you had mutable data structures, I'd be sure
> that it would be broken.

You are right; I assumed stopped-world in GC_finalize(), but from
GC_finish_collection() it is clearly not the case.  By the way, the
comment above GC_finalize() is then wrong.

Sorry to bring in more complexity, but I made another more intrusive
modification of the collector where finalisation does not require any
work in GC_finalize() at all.  I outline the idea at the end of the mail
in case you want to consider it before accepting the "mark completion
extension"; I am not in a hurry, post 7.0 is fine too.  But if you know
you will be busy with other things, a mark completion patch suits my
code well, too.

My last version of the mark completion patch is
https://www.nordita.dk/~petter/patches/gc6.4-ext.patch, with the

    typedef void (*GC_ext_hook_proc) GC_PROTO((void *));

    GC_API void GC_ext_register_inspector
		GC_PROTO((GC_ext_hook_proc proc, void *cd));

    GC_API void GC_ext_unregister_inspector
		GC_PROTO((GC_ext_hook_proc proc, void *cd));

    GC_API void GC_ext_register_finalizer
		GC_PROTO((GC_ext_hook_proc proc, void *cd));

    GC_API void GC_ext_unregister_finalizer
		GC_PROTO((GC_ext_hook_proc proc, void *cd));

    /* Utilitions for the Mark Inspectors */

    GC_API void GC_ext_mark_rec GC_PROTO((void *base_ptr));
    GC_API void GC_ext_mark_rec_ignore_self GC_PROTO((void *base_ptr));
    #define GC_ext_is_marked(ptr) GC_is_marked((void *)(ptr))
    #define GC_ext_set_mark_bit(ptr) GC_set_mark_bit((void *)(ptr))

    GC_API int GC_is_marked GC_PROTO((char *p));
    GC_API void GC_set_mark_bit GC_PROTO((char *p));

That is, I have split the callbacks and changed the interface to use
hash maps so that it looks more like the existing GC_register_*

> I'm not enthusiastic about the names for the callbacks.
> How about *complete_marking* and *inspect_marks*, or something
> like that?

Anything, I am struggling with the naming.  But, note that what I have
called "inspector" (called from GC_finalize) both inspects the marks and
completes the marking.  The other callback is called after the collector
has released the lock, i.e. when marks are no longer consistent.  It is
simply a callback to do some auxiliary work after a full collection is
finished.  So, would that be, GC_register_mark_completion_proc and
GC_register_postcollection_proc, *ay?*; I don't mind to have the names

Then, I move mark related functions to gc_mark.h, though notice that the
mark_rec functions are not suited for custom marking defined in that
header, since it doesn't schedule the work.

I will also drop the K&R overhead, as I see that gc7.0alpha doesn't
have it.



The idea of the alternative approach is to extend the object kind
facility with optional closures to be called before reclaim.  Then, make
two new object kinds, one for normal memory with finalisation and one
for atomic memory with finalisation.  When each block of memory is to be
reclaimed, the collector checks if the kind has finalisation, and if so,
finalises before linking onto the freelist.  The low level interface
allows registration of a generic finaliser on each kind, and this
finaliser locates the real finaliser stored on the object, and calls it.

Currently the code only provides Java finalisation.  I assume it can be
extended by letting the normal mark procedures read a per-block
instruction to mark from all objects in the block whether they
themselves are marked or not.  I will have to re-read the article about
the marking strategy to be sure this doesn't violate any of the

I think the advantages are

    1. The low-level interface should be directly usable to programming
    languages which stores type info (or a vtbl) on objects, whereas
    the high-level interface can be used for more general purposes
    (like "weak containers") to avoid proliferation of kinds.

    2. Marking finalisable objects are integrated into the normal mark
    procedures, and are thus fully incremental and parallel.

I see two issues:

    1. I minor issue is that the (low-level) generic finaliser has to
    distinguish objects on freelist from newly reclaimable objects
    before calling the object-specific finaliser.  My implementation of
    the high-level interface stores the finalisation closure on the
    first word of the object, and sets the LSB to distinguish it from
    freelist links that are stored int the same word.

    2. Currently these finalisers run with GC locked, thus no memory
    allocation is allowed.  The hardest problem if this constraint is
    lifted is to ascertain termination if the finaliser allocation of
    code from finalisable kinds.

Instead of finalising before reclaiming, the relevant blocks could be
scheduled to run finalisation in separate thread(s), and feed back the
blocks to the collector as they finish.  This would solve pt 2 above.
Some measure has to be taken, though, that the application does not
out-run the finalisation thread(s), thus causing excessive allocation of
new blocks.

Ad 1, to make the low-level interface more usable, we could maybe mark
and retain the contents of freelists for kinds which have finalisation.
I haven't looked at it.

I put a rudimentary implementation (java finalisation, no alloc in
finalisers) at https://www.nordita.dk/~petter/patches/gc6.4-for.patch.
The included benchmark gives with parallel mark enabled on a
hyper-threaded machine:

[parameters as in patch]
                       fin. ratio         time    time/fin.   VIRT RES
   regular finalize:       0.7500          2.7  6.86646e-06   5440 4212
finalize on reclaim:       0.7500         0.96  2.44141e-06   3096 1892
        no finalize:            0         0.43          n/a   3092 1888

[more heavy allocation]
                       fin. ratio         time    time/fin.   VIRT RES
   regular finalize:       0.8750        12.43  1.35476e-05   8828 7580
finalize on reclaim:       0.8750          4.6   5.0136e-06   4520 3300
        no finalize:            0         2.03          n/a   4524 3300

The VIRT and RES columns are memory usage according to 'top' when
running the test separately with only one kind of finalisation.  RES
should be the most interesting.  It appears that I got the extra word
for free here due to the geometric progression of sizes, but the memory
analysis is easy anyway (7 vs 1 vs 0 words overhead, rsp.).

More information about the Gc mailing list