[Gc] stack problem
hans.boehm at hp.com
Tue Sep 30 14:42:44 PDT 2008
> -----Original Message-----
> From: Ivan Maidanski [mailto:ivmai at mail.ru]
> Sent: Tuesday, September 30, 2008 1:28 PM
> To: Romano Tenca
> Cc: Boehm, Hans
> Subject: Re: [Gc] stack problem
> > I can confirm that the size of the stack, and not only the depth of
> > recursion, is important.
> > When only the size of the stack increases, the time slows down:
> > 3 MB stack
> > ...
> I don't agree! Only the current committed stack size matters!
> I compiled my sample (with mingw) with 1MB, 70MB
> and 700MB stack sizes, and the GC time was always the same (for a
> given size of the currently committed stack, i.e. number of
> iterations) to within 1ms.
> Profiling the code shows that GC_get_stack_min()
> takes 99.7% of the execution time.
> So, as a TEMPORARY (and totally WRONG) workaround, replace
> "MSWINCE" with "MSWIN32" on line 804 of win32_threads.c (from
> CVS) and then recompile your sample; if it works
> normally, the timing should be OK.
> At present, I don't know what's wrong with GC_get_stack_min()
> or how to fix it.
GC_get_stack_min() walks the stack towards lower addresses, possibly one page at a time, looking for the lowest (hottest) committed address in the stack. It calls VirtualQuery for each page, which I would guess is where the time goes. The result is used as a plausibility check on the stack pointer (sp). If the sp fails the plausibility check (hopefully very rare), the result of GC_get_stack_min() is used instead of the sp as the lower bound for stack scanning. I suspect this code was there to deal with assembly code that didn't correctly maintain the stack pointer, though this seems like the kind of code you really want to rely on as little as possible.
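The walk described above can be modeled in a few lines. This is a sketch under stated assumptions, not the win32_threads.c code: it never calls the real VirtualQuery, but a hypothetical sim_query() over a simulated stack whose committed pages are [sim_committed_low, SIM_STACK_BASE), so it compiles on any platform. Treating "committed" as the only stopping condition and starting the walk at the thread's stack base are also assumptions. The query counter shows the cost growing with the number of committed pages rather than with the reserved stack size, which is consistent with Ivan's measurements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_PAGE_SIZE  ((uintptr_t)4096)
#define SIM_STACK_BASE ((uintptr_t)0x10000000) /* hypothetical: one past the top of the stack */

/* Hypothetical stand-in for the parts of MEMORY_BASIC_INFORMATION used here. */
typedef struct {
    uintptr_t base;      /* queried address rounded down to a page boundary */
    int       committed; /* nonzero for committed read/write stack pages */
} sim_region;

static uintptr_t sim_committed_low; /* lowest committed stack address */
static unsigned long sim_queries;   /* number of simulated VirtualQuery calls */

/* Set up a simulated stack with npages committed pages below SIM_STACK_BASE. */
static void sim_set_committed_pages(size_t npages) {
    assert(npages > 0);
    sim_committed_low = SIM_STACK_BASE - (uintptr_t)npages * SIM_PAGE_SIZE;
    sim_queries = 0;
}

/* Hypothetical VirtualQuery substitute: reports on the page containing addr. */
static void sim_query(uintptr_t addr, sim_region *r) {
    sim_queries++;
    r->base = addr & ~(SIM_PAGE_SIZE - 1);
    r->committed = r->base >= sim_committed_low && r->base < SIM_STACK_BASE;
}

/* Page-by-page walk toward lower addresses: querying the byte just below
   the current bottom yields the next page down, so the loop makes one
   query per committed page. */
static uintptr_t sim_get_stack_min(uintptr_t s) {
    sim_region r;
    uintptr_t bottom;

    sim_query(s, &r);
    do {
        bottom = r.base;
        sim_query(bottom - 1, &r);
    } while (r.committed);
    return bottom;
}
```

With 256 committed pages, sim_get_stack_min(SIM_STACK_BASE) returns sim_committed_low after 258 simulated queries; with 1024 committed pages it takes 1026.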
I can see two possible solutions, other than throwing out the sanity check:
1) Initially call VirtualQuery on the sp. If the stack base is in the same region, we know we're OK, and don't need GC_get_stack_min. Hopefully this will be true about 100% of the time.
2) Cache the last result of GC_get_stack_min() for each thread, and start the next search with the previous value.
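Option (2) could look roughly like the following sketch. As before, sim_query() is a hypothetical stand-in for VirtualQuery over a simulated stack, and sim_thread, walk_down() and cached_stack_min() are illustrative names, not GC identifiers. The first call pays for the full walk; later calls only re-check the cached bound and then probe downward from it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_PAGE_SIZE  ((uintptr_t)4096)
#define SIM_STACK_BASE ((uintptr_t)0x10000000) /* hypothetical: one past the top of the stack */

/* Hypothetical stand-in for the parts of MEMORY_BASIC_INFORMATION used here. */
typedef struct {
    uintptr_t base;      /* queried address rounded down to a page boundary */
    int       committed; /* nonzero for committed read/write stack pages */
} sim_region;

static uintptr_t sim_committed_low; /* lowest committed stack address */
static unsigned long sim_queries;   /* number of simulated VirtualQuery calls */

static void sim_set_committed_pages(size_t npages) {
    assert(npages > 0);
    sim_committed_low = SIM_STACK_BASE - (uintptr_t)npages * SIM_PAGE_SIZE;
    sim_queries = 0;
}

/* Hypothetical VirtualQuery substitute. */
static void sim_query(uintptr_t addr, sim_region *r) {
    sim_queries++;
    r->base = addr & ~(SIM_PAGE_SIZE - 1);
    r->committed = r->base >= sim_committed_low && r->base < SIM_STACK_BASE;
}

/* Hypothetical per-thread record; last_stack_min == 0 means "nothing cached". */
typedef struct {
    uintptr_t stack_base;
    uintptr_t last_stack_min;
} sim_thread;

/* Move down one page at a time while the page below bottom is committed. */
static uintptr_t walk_down(uintptr_t bottom) {
    sim_region r;
    for (;;) {
        sim_query(bottom - 1, &r);
        if (!r.committed) return bottom;
        bottom = r.base;
    }
}

/* Option (2): resume the walk from the previous result if it is still
   committed; otherwise fall back to a full walk from the stack base. */
static uintptr_t cached_stack_min(sim_thread *t) {
    uintptr_t start = t->stack_base;
    if (t->last_stack_min != 0) {
        sim_region r;
        sim_query(t->last_stack_min, &r);
        if (r.committed) start = t->last_stack_min;
    }
    t->last_stack_min = walk_down(start);
    return t->last_stack_min;
}
```

With 256 committed pages, the first call makes 257 queries and an immediate repeat makes 2; after the stack grows to 300 pages, the next call makes 46. In this sketch, a cached bound that is no longer committed simply triggers a full walk from the stack base.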
I'd be inclined to try (1) first. Whether that works well will depend on how large a region is returned by VirtualQuery.
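Option (1) could be sketched as follows, again with a hypothetical sim_query() in place of VirtualQuery. The key assumption, flagged in the code, is that one query on a committed stack page reports a region extending all the way up to the stack base; that is exactly the "how large a region" question raised above. scan_low_bound() is an illustrative name for a helper that returns the address from which stack scanning would start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_PAGE_SIZE  ((uintptr_t)4096)
#define SIM_STACK_BASE ((uintptr_t)0x10000000) /* hypothetical: one past the top of the stack */

/* Hypothetical stand-in for the parts of MEMORY_BASIC_INFORMATION used here. */
typedef struct {
    uintptr_t base;      /* queried address rounded down to a page boundary */
    size_t    size;      /* bytes from base to the end of the same-state run */
    int       committed; /* nonzero for committed read/write stack pages */
} sim_region;

static uintptr_t sim_committed_low; /* lowest committed stack address */
static unsigned long sim_queries;   /* number of simulated VirtualQuery calls */

static void sim_set_committed_pages(size_t npages) {
    assert(npages > 0);
    sim_committed_low = SIM_STACK_BASE - (uintptr_t)npages * SIM_PAGE_SIZE;
    sim_queries = 0;
}

/* Hypothetical VirtualQuery substitute. ASSUMPTION: a query on a committed
   stack page reports the whole committed run up to SIM_STACK_BASE. */
static void sim_query(uintptr_t addr, sim_region *r) {
    sim_queries++;
    r->base = addr & ~(SIM_PAGE_SIZE - 1);
    r->committed = r->base >= sim_committed_low && r->base < SIM_STACK_BASE;
    r->size = r->committed ? (size_t)(SIM_STACK_BASE - r->base) : (size_t)SIM_PAGE_SIZE;
}

/* Slow path: one query per committed page, as in the page-by-page walk. */
static uintptr_t walk_down(uintptr_t bottom) {
    sim_region r;
    for (;;) {
        sim_query(bottom - 1, &r);
        if (!r.committed) return bottom;
        bottom = r.base;
    }
}

/* Option (1): if sp's region reaches the stack base, sp is plausible and
   no walk is needed; otherwise keep the original sanity check. */
static uintptr_t scan_low_bound(uintptr_t sp) {
    sim_region r;
    uintptr_t stack_min;

    sim_query(sp, &r);
    if (r.committed && r.base + r.size >= SIM_STACK_BASE)
        return sp;                          /* fast path: a single query */
    stack_min = walk_down(SIM_STACK_BASE);  /* slow path */
    return (sp >= stack_min && sp < SIM_STACK_BASE) ? sp : stack_min;
}
```

With 256 committed pages, a plausible sp costs a single query, while an out-of-range sp such as 0x1000 falls back to the walk (258 queries in total) and yields sim_committed_low. If the real VirtualQuery reported smaller regions, the fast-path test would fail more often and the cost would drift back toward the page-by-page walk.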