[Gc] Back to "GC Stack problem on Win32"

Hans Boehm Hans.Boehm at hp.com
Thu Dec 4 22:04:20 PST 2008


Ignore the fact that this doesn't update last_stack_min correctly.
I'll fix that part of the logic.

Hans

On Fri, 5 Dec 2008, Boehm, Hans wrote:

> How about something like this at the end of GC_push_stack_for?
>
> This will still repeatedly grow the stack for something like your example.  But, as in the last version, I think the total number of VirtualQuery calls is bounded by the final stack depth, plus one for each GC.
>
> Unlike my last version, this should typically avoid the GC_get_stack_min calls in your example.
>
> I'm not sure I want to introduce very ugly code to handle something like this example a bit better.  The collector already does work in proportion to the root size.  And this seems to be another example where a huge root size causes it to slow down.
>
> (Warning: completely untested:)
>
>     /* Set stack_min to the lowest address in the thread stack,        */
>      /* or to an address in the thread stack no larger than sp,        */
>      /* taking advantage of the old value to avoid slow traversals     */
>      /* of large stacks.                                               */
>      if (thread -> last_stack_min == ADDR_LIMIT) {
>        stack_min = GC_get_stack_min(thread -> stack_base);
>      } else {
>        if (sp < thread -> stack_base && sp >= thread -> last_stack_min) {
>            stack_min = sp;
>        } else {
> #         ifdef MSWINCE
>            stack_min = GC_get_stack_min(thread -> stack_base);
> #         else
>            if (GC_may_be_in_stack(thread -> last_stack_min)) {
>              stack_min = GC_get_stack_min(thread -> last_stack_min);
>            } else {
>              /* Stack shrunk?  Is this possible? */
>              stack_min = GC_get_stack_min(thread -> stack_base);
>            }
> #         endif
>        }
>      }
>      thread -> last_stack_min = stack_min;
>
>      if (sp >= stack_min && sp < thread->stack_base) {
> #       ifdef DEBUG_THREADS
>          GC_printf("Pushing thread from %p to %p for 0x%x from 0x%x\n",
>                    sp, thread -> stack_base, (int)thread -> id, (int)me);
> #       endif
>        GC_push_all_stack(sp, thread->stack_base);
>      } else {
>        /* If not current thread then it is possible for sp to point to */
>        /* the guarded (untouched yet) page just below the current      */
>        /* stack_min of the thread.                                     */
>        if (thread -> id == me || sp >= thread->stack_base
>                || sp + GC_page_size < stack_min)
>          WARN("Thread stack pointer 0x%lx out of range, pushing everything\n",
>                (unsigned long)(size_t)sp);
>        GC_push_all_stack(stack_min, thread->stack_base);
>      }
>    } /* thread looks live */
>
> Hans
>
>> -----Original Message-----
>> From: gc-bounces at napali.hpl.hp.com
>> [mailto:gc-bounces at napali.hpl.hp.com] On Behalf Of Ivan Maidanski
>> Sent: Thursday, November 13, 2008 3:08 PM
>> To: gc at napali.hpl.hp.com
>> Subject: Re[2]: [Gc] Back to "GC Stack problem on Win32"
>>
>> Hi!
>>
>> "Boehm, Hans" <hans.boehm at hp.com> wrote:
>>> Ivan -
>>>
>>> Thanks.  However, I don't think I understand the problem
>> here correctly. The old code should in nearly all cases only
>> invoke GC_may_be_in_stack(thread -> last_stack_min).  This
>> should be cheap, since it only walks a page or so of the stack, right?
>>
>> Right, but only if the stack hasn't grown too much between
>> collections.
>> By default, Win32 apps has StackCommit==4K. So, if the stack
>> grew by 4MB then VirtualQuery() would be called 1000 times
>> during nearest collection. But for GC_push_stack_for() only
>> one VirtualQuery() is really required - just to check sp value.
>>
>>>
>>> It seems like it would usually result in exactly one extra
>> VirtualQuery call, as it walks off the end of the stack.
>> Since GC_may_be_in_stack() makes one call anyway, it seems to
>> me that it should at most double the time spent there.
>>
>> Try this test app:
>>
>> #include "gc.h"
>>
>> void GC_printf();
>>
>> int f(int n) {
>>  return n > 0 ? f(n - 1) + 1 : 0;
>> }
>>
>> void test(char c, int n) {
>>  n = f(n);
>>  GC_printf("\n Test%c: N= %d\n\n", c, n);  GC_gcollect(); }
>>
>> void *obj;
>>
>> int main(void) {
>>  int n;
>>  int max = 9 * 1000 * 1000;
>>  obj = GC_MALLOC(16);
>>  for (n = 100 * 1000; n <= max; n += n >> 1) {
>>   test('A',n);
>>   test('B',n);
>>  }
>>  GC_printf("Done");
>>  return 0;
>> } // end
>>
>> Compile it with -fno-optimize-sibling-calls -Xlinker --stack
>> -Xlinker 0x10000000 Set GC_PRINT_STATS=1 to see the
>> world-stopped delays timing.
>>
>> If You use Your code or just comment out one line "if (sp <
>> stack_min || sp >= thread->stack_base)" in
>> GC_push_stack_for() then you see how much time is required to
>> collect just a few bytes.
>>
>>>
>>> I don't like the reference to last_info in the patch, since
>> that relies on side effects of GC_may_be_in_stack that I
>> would like to keep as a private implementation detail of
>> GC_may_be_in_stack and GC_get_stack_min.  Is there a reason
>> not to use thread -> last_stack_min instead of last_info.BaseAddress?
>>
>> I don't like it too. But to say the truth, "caching" here
>> works realy only to pass BaseAddress from
>> GC_may_be_in_stack() to GC_get_stack_min() even in case of
>> one thread. Another solution (without "caching") is to make
>> GC_may_be_in_stack() return BaseAddress or NULL (instead of
>> bool) - simple to change (but the func should be renamed, may be).
>>
>>>
>>> Hans
>>>
>>>> ...
>>>> I've already pointed out some bugs.
>>>> Now I've just run the same test (having Your patch
>> applied) as I had
>>>> run for my patch... And it turns out that Yours doesn't solve the
>>>> problem as it was originally stated (to say more precisely, it
>>>> doesn't reduce time for the first collection after stack growth).
>>>>
>>>> Look into the things You had advised me before:
>>>>> 1) Initially call VirtualQuery on the sp.  If the stack
>>>> base is in the same region, we know we're OK, and don't need
>>>> GC_get_stack_min.  Hopefully this will be true about 100%
>> of the time.
>>>>
>>>> So I did it for Your code now. It works.
>>>>
>>>> The patch is attached.
>>>>
>>>> Bye.
>>
>> Bye.
>>
>> _______________________________________________
>> Gc mailing list
>> Gc at linux.hpl.hp.com
>> http://www.hpl.hp.com/hosted/linux/mail-archives/gc/
>>
>
> _______________________________________________
> Gc mailing list
> Gc at linux.hpl.hp.com
> http://www.hpl.hp.com/hosted/linux/mail-archives/gc/
>


More information about the Gc mailing list