[Gc] how to prevent mark stack overflows in GC_push_other_roots?

Dan Bonachea bonachea at cs.berkeley.edu
Sun Sep 11 03:01:49 PDT 2005

Hi Hans - we're encountering crashes with "unexpected mark stack overflow" 
errors in some Titanium programs (we're currently using the 6.5 collector), 
and I'm trying to figure out the right way to solve them.

The basic design issue is that Titanium includes an alternate memory allocation 
system which provides region-based explicit memory management. We acquire the 
space for our regions as raw pages from GC_unix_get_mem(), but otherwise the 
collector doesn't know anything about the contents or locations of our 
regions. Objects in our regions are permitted to reference objects in the GC 
heap, so for correctness the collector needs to scan all non-atomic objects in 
our regions during a collection.

We currently accomplish this by adding the contents of our regions to the root 
set using the GC_push_other_roots callback, in which we push the entire 
contents of all our memory regions containing non-atomic objects using 
GC_push_all_stack. The problem is that in some applications our regions can 
grow to be quite large (think GBs and thousands of objects), and 
GC_push_all_stack is not robust against mark stack overflow, so we sometimes 
get 'unexpected mark stack overflow' crashes in the GC_push_all_stack call.
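
To make the failure mode concrete, here is a toy model of it. This is only a 
sketch: every name in it (toy_mark_stack, toy_push_range, 
toy_push_all_regions) is made up for illustration and is not part of the 
collector, and it assumes each pushed range costs one entry on a 
fixed-capacity stack.

```c
#include <stddef.h>

/* Toy model only: none of these names belong to the collector's API. */
#define TOY_MARK_STACK_CAPACITY 8

typedef struct {
    const char *start;  /* first byte of the range to scan */
    size_t len;         /* length of the range in bytes */
} toy_entry;

typedef struct {
    toy_entry entries[TOY_MARK_STACK_CAPACITY];
    size_t top;         /* number of entries currently in use */
    int overflowed;     /* set when a push did not fit */
} toy_mark_stack;

/* Push one range. Returns 0 on success, -1 if the stack is already full. */
static int toy_push_range(toy_mark_stack *ms, const char *start, size_t len)
{
    if (ms->top == TOY_MARK_STACK_CAPACITY) {
        /* Stands in for the 'unexpected mark stack overflow' crash. */
        ms->overflowed = 1;
        return -1;
    }
    ms->entries[ms->top].start = start;
    ms->entries[ms->top].len = len;
    ms->top++;
    return 0;
}

/* Naive callback: push every region without checking how much room is left.
   Returns the number of regions that actually fit. */
static size_t toy_push_all_regions(toy_mark_stack *ms, const char *base,
                                   size_t region_size, size_t nregions)
{
    size_t i, pushed = 0;
    for (i = 0; i < nregions; i++) {
        if (toy_push_range(ms, base + i * region_size, region_size) == 0)
            pushed++;
    }
    return pushed;
}
```

With a capacity of 8 and 12 regions, only the first 8 fit and the overflow 
flag is set. Raising the capacity just moves the threshold.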

Is there a safe way to push large amounts of data from within 
GC_push_other_roots? I've combed the docs and sources and haven't found much 
other than this comment in mark_rts.c:

     if (GC_push_other_roots != 0) (*GC_push_other_roots)();
         /* In the threads case, this also pushes thread stacks. */
         /* Note that without interior pointer recognition lots  */
         /* of stuff may have been pushed already, and this      */
         /* should be careful about mark stack overflows.        */

but it doesn't provide any hint about how the callback should actually go 
about being "careful about mark stack overflows".

Is there some analog to GC_push_all_stack I'm missing which can correctly 
handle mark stack overflow? Is there a better way to accomplish what I need?
I considered trying to predict the possibility of overflow before calling 
GC_push_all_stack and calling GC_signal_mark_stack_overflow, but any heuristic 
that guesses wrong even once will crash the program. Similarly, tweaking 
INITIAL_MARK_STACK_SIZE to a larger value only postpones the crash until the 
first time we hit that size.

The situation seems analogous to having a large number of 
GC_malloc_uncollectable objects, but there are no mark bits or other block 
headers available on that uncollectable space, so the same code is not 
directly applicable. Perhaps I should somehow plug in a callback in 
GC_push_next_marked_uncollectable and advance my own pointer through the 
regions? (Although I'd presumably also need callbacks to reset the pointer on 
an overflow or at the next GC.)
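
A rough sketch of the "advance my own pointer" idea, again as a toy model 
rather than real collector code. All names here (sk_mark_stack, sk_cursor, 
sk_push_some, sk_drain, sk_scan_all) are hypothetical, and the stack is 
reduced to a counter. The point is only the control flow: push until the stack 
is full, let the marker drain it, then resume from the saved position.

```c
#include <stddef.h>

/* Hypothetical names throughout; a model, not the collector's interface. */
#define SK_STACK_CAPACITY 4

typedef struct {
    size_t used;   /* entries currently on the (modelled) mark stack */
} sk_mark_stack;

typedef struct {
    size_t next;   /* index of the next region not yet pushed */
    size_t total;  /* number of regions that must be scanned */
} sk_cursor;

/* Call at the start of each collection (and after an overflow restart). */
static void sk_cursor_reset(sk_cursor *c, size_t total)
{
    c->next = 0;
    c->total = total;
}

/* Push regions while there is room.
   Returns 1 once every region has been pushed, 0 if we must resume later. */
static int sk_push_some(sk_mark_stack *ms, sk_cursor *c)
{
    while (c->next < c->total) {
        if (ms->used == SK_STACK_CAPACITY)
            return 0;          /* stack full: stop here, keep the cursor */
        ms->used++;            /* stands in for pushing region c->next */
        c->next++;
    }
    return 1;
}

/* Stand-in for the marker processing everything currently on the stack. */
static void sk_drain(sk_mark_stack *ms)
{
    ms->used = 0;
}

/* Alternate pushing and draining until all regions have been pushed.
   Returns the number of push rounds that were needed. */
static size_t sk_scan_all(sk_mark_stack *ms, sk_cursor *c)
{
    size_t rounds = 1;
    while (!sk_push_some(ms, c)) {
        sk_drain(ms);
        rounds++;
    }
    return rounds;
}
```

With 10 regions and room for 4 entries this takes three rounds (4, 4, then 2 
regions). Whether the collector offers a hook that would let a client drive 
such a loop is exactly what I don't know.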

Any hints or ideas are appreciated...

PS- Is there a good reason why the GC_push_* functions cannot simply 
automatically grow and copy the mark stack on an overflow to handle overflows 
transparently? It seems like the collector currently goes to significant 
trouble to spread that overflow checking and growing logic all over the mark 
phase code, which seems a less robust design (so hopefully there's a good 
motivation for it?). 
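
For reference, the kind of transparent grow-and-copy behavior I have in mind 
looks roughly like the sketch below. It is a generic growable stack built on 
realloc, with hypothetical names (grow_stack, grow_push, grow_free), and does 
not reflect how the collector's mark stack is laid out or allocated.

```c
#include <stdlib.h>

/* Hypothetical growable stack; not the collector's mark stack. */
typedef struct {
    void **items;  /* heap-allocated array of entries */
    size_t used;   /* number of entries in use */
    size_t cap;    /* number of entries allocated */
} grow_stack;

/* Push one entry, doubling the array when it is full.
   Returns 0 on success, -1 if the array could not be grown. */
static int grow_push(grow_stack *s, void *p)
{
    if (s->used == s->cap) {
        size_t new_cap = s->cap ? s->cap * 2 : 4;
        void **grown = realloc(s->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;         /* old array is still valid and unchanged */
        s->items = grown;      /* realloc may have moved (copied) the array */
        s->cap = new_cap;
    }
    s->items[s->used++] = p;
    return 0;
}

/* Release the array and return the stack to its empty state. */
static void grow_free(grow_stack *s)
{
    free(s->items);
    s->items = NULL;
    s->used = 0;
    s->cap = 0;
}
```

Two properties of this sketch are worth noting: growth can fail in the middle 
of a push, and a successful growth may move the array, so any pointer into the 
old array held elsewhere becomes stale.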
