[Gc] RE: how to prevent mark stack overflows in GC_push_other_roots ?

Dan Bonachea bonachea at cs.berkeley.edu
Thu Sep 15 12:07:45 PDT 2005

At 05:52 PM 9/12/2005, Boehm, Hans wrote:
>The logic behind this is:
>0) We have to be able to recover from running out of memory/swap space
>during marking.  If a process does run out of memory allocating small
>objects, this is often where the problem shows up.  We end up allocating
>all available memory, and then forcing a collection.  The collection has
>to deal with a larger heap than it has seen before, and thus often needs
>a larger mark stack.  But it can't be grown, because we just allocated
>all of the heap.  (There is some argument that a Unix process can't
>reliably recover from running out of memory anyway, and I think in
>practice that's true.  But based on experience, there seems to be a
>strong preference for 98% reliability over, say, 60% reliability in
>dealing with the situation.)

Understood - the design decision makes more sense now... You might consider 
adding this explanation to the "GC Algorithmic Overview" document...

>1) There are two methods for dealing with roots:
>         Eager: Scan the root section immediately, mark any objects
>referenced, and push the contents of the objects onto the mark stack if
>they haven't been previously marked.  It is hard to bound the number of
>entries we push onto the mark stack this way.  But an overflow is
>fundamentally OK, since it can only occur after we have marked objects,
>thus ensuring forward progress.
>         Lazy: Push the entire root section with GC_push_all, without
>marking anything.  Overflowing the mark stack while doing this many
>times is bad.  If we have to drop these and start over due to an
>overflow, we've made no progress.
>2) We first perform a bounded number of lazy pushes, and then do an
>unbounded number of eager pushes.  The lazy pushes can't possibly
>overflow the mark stack.  It is OK if the eager ones do, since we can
>handle that.  If GC_push_all ran into a mark stack overflow, it aborts,
>since something went seriously wrong, and we might otherwise end up in
>an infinite mark loop.
>In fact, we now do a bounded number of eager pushes early, as a result
>of eagerly marking part of the GC's own stack.  But I think that's still
>OK.  (Looking at the code, I think there is at least a theoretical issue
>with the way the IA64 register backing store is handled.  I'll have to
>reexamine that.)

I've replaced our use of GC_push_all_stack to push the region contents with 
GC_push_all_eager, and that appears to fix the mark stack overflows - so I 
guess that means I have a workable solution (thanks!).

>I gather that you can have an unbounded number of regions, or at least
>contiguous region sections?  Or you call GC_push_all_stack on each
>object?  The crashes occur during the GC_push_other_roots call itself?

Correct - we allow the total region-allocated space to grow to the entire 
available machine memory, and regions are allocated in groups of several pages 
at a time (or larger if a given huge object requires it). During a collection, 
our GC_push_other_roots callback walks through all the pages and calls the GC 
to push the contents of each page (ie we used to essentially call 
GC_push_all_stack(page_start, page_start+page_sz) on each page - now I make 
the same call to GC_push_all_eager instead).

>You might try calling GC_push_all_eager() instead, and making sure that
>this occurs after the stack contents are pushed.  Especially if you have
>a large number of pointers into the garbage-collected heap, or if these
>regions are large relative to the garbage collected heap and you are
>using the parallel collector, this may turn into a performance problem.
>But at least it should be correct.

I'd like to hear more about the performance problems you anticipate - we're 
not currently using the parallel collector, but it can be the case in some 
apps that the region heap grows much larger than the GC heap, and may contain 
many pointers to the GC heap. Are you worried about just the cost of the 
GC_push_all_eager calls to scan the regions, or is there another potential 
source of performance degradation?

One issue that occurs to me is that GC_push_all_eager provides no indication 
that the mark stack has overflowed, so in the current code, if our region 
pusher overflows the mark stack while pushing region page 1 of 1000, the 
pushes of the other 999 pages simply waste time. Is there some clean way I 
could detect that the mark stack has overflowed and cut off the scan early, 
in the expectation that GC_push_other_roots will be called again before the 
end of this mark phase? (I believe one could inspect the private state 
variables in the marker module to accomplish this, but ideally I'd like the 
overflow detection mechanism to be something portable across future GC 
release versions - eg, in the same spirit as GC_mark_stack_empty(), provide a 
public predicate for querying overflow status.)

For reference, here's a typical crash stack we were encountering when using 
GC_push_all_stack() to push our regions:

unexpected mark stack overflow

Program received signal SIGABRT, Aborted.
0x001d3c32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) where
#0  0x001d3c32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00212989 in raise () from /lib/tls/libc.so.6
#2  0x00214342 in abort () from /lib/tls/libc.so.6
#3  0x086af945 in GC_abort (msg=0x0)
     at ../../../../src/runtime/gc-build/uniproc/../../gc/misc.c:1074
#4  0x086ad9be in GC_push_all (bottom=0x9ea1000 "", top=0x9ea2000 "")
     at ../../../../src/runtime/gc-build/uniproc/../../gc/mark.c:1205
#5  0x083b6c2c in ti_push_region_roots () at T6Region4lang2ti.c:394
#6  0x086ae93f in GC_mark_some (cold_gc_frame=0xbfecc8f4 "\002")
     at ../../../../src/runtime/gc-build/uniproc/../../gc/mark.c:391
#7  0x086a8dff in GC_stopped_mark (stop_func=0x86a8b00 <GC_never_stop_func>)
     at ../../../../src/runtime/gc-build/uniproc/../../gc/alloc.c:531
#8  0x086a9647 in GC_try_to_collect_inner (stop_func=0x86a8b00 
     at ../../../../src/runtime/gc-build/uniproc/../../gc/alloc.c:378
#9  0x086a9e5c in GC_collect_or_expand (needed_blocks=1, ignore_off_page=0)
     at ../../../../src/runtime/gc-build/uniproc/../../gc/alloc.c:1036
#10 0x086a9fa9 in GC_allocobj (sz=2, kind=1)
     at ../../../../src/runtime/gc-build/uniproc/../../gc/alloc.c:1087
#11 0x086abf2f in GC_generic_malloc_inner (lb=4, k=1)
     at ../../../../src/runtime/gc-build/uniproc/../../gc/malloc.c:136
#12 0x086abff4 in GC_generic_malloc (lb=4, k=1)
     at ../../../../src/runtime/gc-build/uniproc/../../gc/malloc.c:192
#13 0x086ac262 in GC_malloc (lb=4)
     at ../../../../src/runtime/gc-build/uniproc/../../gc/malloc.c:311
#14 0x08409651 in m4mainPTAPT6String4lang4javamT7gccrash (a4args=0x99bce40)
     at tc-gen/T7gccrash.c:132
#15 0x0868e5eb in ti_main (argc=1, argv=0xbfeccc64) at tc-gen/ti-main.c:513
#16 0x0869b71f in thread_go (procinfo=0x99b7ea0)
     at ../../../../src/runtime/backend/sequential/main.c:713
#17 0x0869b7ce in spawn_threads () at 
#18 0x0869bad3 in real_main (argc=1, argv=0xbfeccc64, envp=0xbfeccc6c)
     at ../../../../src/runtime/backend/sequential/main.c:980
---Type <return> to continue, or q <return> to quit---
#19 0x0869b8aa in main (argc=1, argv=0xbfeccc64, envp=0xbfeccc6c)
     at ../../../../src/runtime/backend/sequential/main.c:875

Thanks for your help on this...

> > -----Original Message-----
> > From: Dan Bonachea [mailto:bonachea at cs.berkeley.edu]
> > Sent: Sunday, September 11, 2005 3:02 AM
> > To: Boehm, Hans; gc at napali.hpl.hp.com
> > Cc: Imran Haque
> > Subject: how to prevent mark stack overflows in GC_push_other_roots ?
> >
> >
> > Hi Hans - we're encountering crashes with "unexpected mark
> > stack overflow"
> > errors in some Titanium programs (we're currently using the
> > 6.5 collector),
> > and I'm trying to figure out the right way to solve them.
> >
> > The basic design issue is Titanium includes an alternate
> > memory allocation
> > system which provides region-based explicit memory
> > management. We acquire the
> > space for our regions as raw pages from GC_unix_get_mem(),
> > but otherwise the
> > collector doesn't know anything about the contents or
> > locations of our
> > regions. Objects in our regions are permitted to reference
> > objects in the GC
> > heap, so for correctness the collector needs to scan all
> > non-atomic objects in
> > our regions during a collection.
> >
> > We currently accomplish this by adding the contents of our
> > regions to the root
> > set using the GC_push_other_roots callback, in which we push
> > the entire
> > contents of all our memory regions containing non-atomic
> > objects using
> > GC_push_all_stack. The problem is in some applications our
> > regions can grow to
> > be quite large (think GBs and thousands of objects), and
> > GC_push_all_stack is
> > not robust against mark stack overflow - so we sometimes get
> > 'unexpected mark
> > stack overflow' crashes in the GC_push_all_stack call.
> >
> > Is there a safe way to push large amounts of data from within
> > GC_push_other_roots? I've combed the docs and sources and
> > haven't found much
> > other than this comment in mark_rts.c:
> >
> >      if (GC_push_other_roots != 0) (*GC_push_other_roots)();
> >          /* In the threads case, this also pushes thread stacks. */
> >          /* Note that without interior pointer recognition lots  */
> >          /* of stuff may have been pushed already, and this      */
> >          /* should be careful about mark stack overflows.        */
> >
> > but it doesn't provide any hint about how the callback should
> > actually go
> > about being "careful about mark stack overflows".
> >
> > Is there some analog to GC_push_all_stack I'm missing which
> > can correctly
> > handle mark stack overflow? Is there a better way to
> > accomplish what I need? I considered trying to predict the
> > possibility of overflow before calling
> > GC_push_all_stack and calling GC_signal_mark_stack_overflow,
> > but any heuristic
> > that guesses wrong even once will crash the program.
> > Similarly, tweaking
> > INITIAL_MARK_STACK_SIZE to a larger value only postpones the
> > crash until the
> > first time we hit that size.
> >
> > The situation seems analogous to having a large amount of
> > GC_malloc_uncollectable objects, but there are no mark bits
> > or other block
> > headers available on that uncollectable space so the same
> > code is not directly
> > applicable. Perhaps I should somehow plugin a callback in
> > GC_push_next_marked_uncollectable and advance my own pointer
> > through the
> > regions? (although I'd presumably also need callbacks to
> > reset the pointer on
> > an overflow or next GC).
> >
> > Any hints or ideas are appreciated...
> > Thanks,
> > Dan
> >
> > PS- Is there a good reason why the GC_push_* functions cannot simply
> > automatically grow and copy the mark stack on an overflow to
> > handle overflows
> > transparently? It seems like the collector currently goes to
> > significant
> > trouble to spread that overflow checking and growing logic
> > all over the mark
> > phase code, which seems a less robust design (so hopefully
> > there's a good
> > motivation for it?).
> >
> >
>Gc mailing list
>Gc at linux.hpl.hp.com
