[Gc] Cheaper finalisation and weak pointers

Boehm, Hans hans.boehm at hp.com
Fri Mar 11 17:46:48 PST 2005


Thanks for the detailed explanation, and sticking with this.

I'm sorry I haven't had very much time to look at this.

I have to think more about the concurrency issues.  Currently
I believe the finalization code does not run with the world stopped,
though the allocation lock is held.  This makes finalization
callbacks a bit easier, I think.  But dereferencing a weak
pointer without holding the allocation lock seems quite
tricky.  If you had mutable data structures, I'd be sure
that it would be broken.

My inclination at this stage would be to put the hooks into
the collector, and keep the rest of the code separate.  I think
ideally I'd like two separate callbacks for the two places you need
them.  I'm not sure they need to be connected.  If a general facility
for lists of callbacks got implemented at the same time, that
would be even better.  Some of the root marking code could benefit.
I think I agree that it's better not to call it a plug-in.

I'm not enthusiastic about the names for the callbacks.
How about *complete_marking* and *inspect_marks*, or something
like that?

As far as reference to GC internals is concerned, I think the right
precedent here is gc_mark.h, which provides some access to marker
internals for new kinds of objects that know how to traverse
themselves.  If your plugins/callbacks don't need much
more than GC_is_marked(), I would just move the declaration there.
If you need things that don't fit there, it might make sense to add
a parallel file.

I would aim for getting this into gc7.0alphaX rather than gc6.X,
but I can deal with a gc6.X patch.

Hans

> -----Original Message-----
> From: gc-bounces at napali.hpl.hp.com 
> [mailto:gc-bounces at napali.hpl.hp.com] On Behalf Of Petter Urkedal
> Sent: Thursday, February 10, 2005 2:56 PM
> To: gc at napali.hpl.hp.com
> Subject: Re: [Gc] Cheaper finalisation and weak pointers
> 
> 
> First, I am sorry that two essential files, weakptr.c and 
> fiq.c, were missing from the previous patch, and 
> unfortunately also the changes to Makefile.am that would have 
> exposed the error when I testing it.  I attach a new patch.  
> (Run autoconf/automake.)
> 
> On 2005-02-09, Boehm, Hans wrote:
> > There seem to be a few applications like yours where 
> performance of a 
> > finalization-like mechanism matters.
> > 
> > A finalization plug-in interface generally sounds like a good idea, 
> > since I suspect there are many slightly different useful variants.
> 
> Yes, the finalisation queue is only a semi-general solution, 
> since it makes the assumptions that it needs for the urgent 
> inspection work, but then queues the rest to be done by 
> client code in an unrestricted environment.  (For instance, I 
> can imagine someone could get rid of the pointer in front of 
> the object pointer that the finalisation queue needs.  Cf. [1].)
> 
> The more general approach, I think, is to let "client code" 
> do mark-inspection itself; well, common client code should 
> maybe not do that, but it would be a convenient way to test 
> extensions outside the GC source tree and distribute/install 
> them separately until they pass some stability and 
> "popularity" testing.
> 
> I think what is needed is a hook for mark-inspection, and a 
> hook for running finalisation code, as my previous patch did. 
>  The latter is no problem.  The former, however, also implies 
> that some low-level interfaces must be exported, like 
> GC_is_marked and GC_MARK_FO.  It's maybe not an issue if 
> provided with proper warnings.  So,
> 
> 1. Shall I prepare a new patch with finalisation queues and 
> weak pointers stripped out?  If so,
> 
> 2. shall I selectively move the required low-level stuff from 
> private/*, or is it better to install the private/* headers 
> under $(includedir)/gc/private and let implementors of 
> extensions figure it out?
> 
> 3. Is it good to split the plug-in registration into two 
> functions, one for inspection and one for finalisation?
> 
> 4. I realise the word "plugin" may needlessly challenge 
> casual users (think Firefox, etc).  It may be better to say 
> (gc/gc_extension.h)
> 
>     GC_inspection_extension_handle
>     GC_extend_inspection(GC_inspection_extension_proc f, void *cd);
> 
>     GC_finalization_extension_handle
>     GC_extend_finalization(GC_finalization_extension_proc f, 
> void *cd);
> 
>     GC_drop_inspection(GC_inspection_extension_handle);
> 
>     GC_drop_finalization(GC_finalization_extension_handle);
> 
> or similar.
> 
> If this is the solution, the rest of this e-mail is less 
> important, but I will answer your questions anyway.  I also 
> speculate on possible parallelisation of the inspection 
> procedure at the end.
> 
> 
> > I'd like to understand a bit better where exactly the 
> trade-offs are 
> > here.
> > 
> > >     The solutions using the present GC functionality gives the
> > >     following overhead, not counting asymptotically negligible 
> > > parts:
> > > 
> > > 	disappearing links:		5 words extra, 3 fragments
> > > 	    hash-table links must be moved into separate memory
> > > 	    fragments (+ 1 word) and disappearing links cost 4 words.
> > Could you explain why hash-table links have to be moved to separate 
> > memory?
> 
> Given for instance some hashed pair constructor (ignoring 
> details about polymorphism), the most compact implementation 
> would be to put the hash table link directly into the objects:
> 
>     struct pair {
> 	GC_hidden_pointer hash_next;
> 	struct pair *left, *right;
>     };
> 
> To take notice of the disappearance of a '*pair' object, I 
> need to call 'GC_general_register_disappearing_link(link, 
> pair)', but on which 'link'? If 'link' is the '&hash_next' of 
> the previous object, then the following objects will also be 
> lost from the hash table.  I could let the collector clear 
> 'left' or 'right' just as a message to client code, but then 
> the object might be reclaimed before client code takes notice 
> and unlinks it. Thus, my conclusion is that a disappearing 
> link solution does not allow hash links to be inlined into 
> the hash-constructed objects, and I have
> 
>     struct hash_node {
> 	struct hash_node *next;
> 	GC_hidden_pointer obj; /* disappearing link */
>     };
>     struct pair {
> 	struct pair *left, *right;
>     };
> 
> for each pair (in addition to amortised 2 words for a 
> half-full hash table). The 'obj' link is the extra link, and 
> it applies also to my proposed weak pointers vs finalisation 
> queues.  An extra 1 word is not a big deal, but the extra 
> memory fragment per object, I think I care about, especially 
> if it is of a different size so that it is likely to be 
> allocated in a different page of memory.
> 
> > (I may misunderstand what that means.  I count 4 words in the GC's 
> > data structures, with one of those in the hash table array.)
> > 
> > It seems to me that you potentially win for a couple of different, 
> > separable, reasons:
> > 
> > 1) You avoid the GC's internal hash table, since you don't allow 
> > pointers to be unregistered.
> > 
> > 2) Since the linked list is in the object itself, you don't need a 
> > pointer to the object being tracked.
> > 
> > Does that sound right?
> 
> Precisely.  Also, I think 1) is conceptually sound if we 
> consider the weak pointer property of the link to be type 
> information, and thus constant throughout the object's life 
> time.  Re-registration is not needed when the pointer is assigned.
> 
> > Have you thought about the concurrency issues here?  
> Normally you need 
> > to lock on a "disappearing" pointer dereference, to deal 
> with the case 
> > of the GC
> > clearing the reference while you only have the hidden pointer in a
> > register.
> 
> My GC_weakptr_get was incorrect, thanks!  Now, I could do 
> locking, but consider adding a volatile to the struct
> 
>     typedef struct {
> 	GC_word next;
> 	volatile GC_word hptr;
>     } GC_weakptr;
> 
> and implementing GC_weakptr_get as
> 
>     GC_API void *GC_weakptr_get(GC_weakptr *wptr)
>     {
> 	void * volatile ptr = REVEAL_POINTER(wptr->hptr);
> 	if (wptr->hptr == HIDE_POINTER(0))
> 	    return 0;
> 	else
> 	    return ptr;
>     }
> 
> I think this should force the compiler to store the revealed 
> pointer to the 'ptr' stack variable before re-fetching 
> 'wptr->hptr'.  The C standard may not be very specific on the 
> volatile; my assumption is that the C compiler can not 
> commute operations on volatile locations.  At least gcc 
> produces correct code with '-O9' (".loc" and labels removed):
> 
> GC_weakptr_get:
>         pushl   %ebp
>         movl    %esp, %ebp
>         subl    $4, %esp
>         movl    8(%ebp), %edx       %edx = wptr
>         movl    4(%edx), %ecx
>         notl    %ecx
>         movl    %ecx, -4(%ebp)      Store revealed pointer to ptr...
>         movl    4(%edx), %ecx       ...before re-fetch from wptr->hptr
>         movl    -4(%ebp), %eax
>         incl    %ecx
>         sete    %dl
>         movl    %ebp, %esp
>         popl    %ebp
>         movzbl  %dl, %ecx
>         decl    %ecx
>         andl    %ecx, %eax          (Heh, nice trick to avoid jump!)
>         ret
> 
> If we are uncertain, I can add an option in 'configure.in' to 
> --disable-volatile-assumptions and LOCK() instead.
> 
> (BTW, I think neither _WeakPointer_Equal nor 
> _WeakPointer_Hash in gcc_support.c need to call with 
> allocation lock, as they don't dereference or return the 
> object pointers.)
> 
> > What locking discipline do you need?
>  
> My current code assumes that word-size stores are atomic.  
> The weak pointers utilises the separation of data between 
> mark-inspection code and client code as much as possible, and 
> uses atomic operations.  The finalisation queues uses LOCK() 
> and UNLOCK(), which can be rewritten as 
> GC_call_with_alloc_lock if moved out of the GC tree.  I think 
> I could change it to use atomic operations only; I had that 
> in mind when I wrote it.  However, see [1] for an approach 
> that would need locking.
> 
> If mark-inspectors have to run with application threads 
> stopped, then using GC's lock seems to be the easiest way to 
> avoid dead-locking.  Else the stopped threads may hold locks 
> on containers which are needed by the inspector.  More 
> granular locking could be implemented, if (1) the client 
> could register additional mutices that GC must claim before a 
> full GC, or (2) if GC can start threads but still hold 
> allocation lock before starting mark-inspectors.  In both 
> cases, extensions must make sure not to allocate memory while 
> holding a lock required by the inspection code.  Is that 
> analysis sound?
> 
> The mark-inspector codes can not run parallel as they are 
> now, but I think that would be nice.  If it is possible from 
> GC's point of view to run mark-inspectors in parallel, it 
> would also be nice to have inter-inspector mutices working.  
> This is not strictly necessary since the client could set up 
> several inspectors to work on disjoint sets of objects.  
> Parallelisation would not prevent delays for code that uses 
> heavy inspection, but at least it would utilise all CPUs (in 
> the upcoming cheap multi-core machines :-))
> 
> Continuing on the ambitious side, can applications run 
> concurrently with mark-inspection as long as there are 
> thread-local free-lists available? Then, given a proper 
> scaling of local alloc pools, even delays could be 
> eliminated.  I don't mind GC delays in my own code, and maybe 
> apps which need some real-time behaviour better be advised 
> not to use mark-inspection-intensive techniques.
> 
> Sincerely,
> Petter
> 
> 
> A FOOTNOTE
> 
> [1]  My first attempt at finalisation had no extra pointer to 
> queue the finalisable objects.  Instead, the client provided 
> a function to iterate over a container and unlink unmarked 
> nodes directly.  Thus the GC memory overhead per node 
> approached zero for large containers.  However, I could not 
> find a good solution for locking the container between 
> application-phase and the GC post-mark phase without risking 
> deadlocks. My solution was to separate data needed by the two 
> worlds, thus finalisation queueing.  I realise that I could 
> probably have used GC_call_with_alloc_lock in client code, 
> but finalisation queueing also has the advantage of 
> minimising work during GC.
> 


More information about the Gc mailing list