[Gc] Re: Add a new GC pointer tracing mode

Zach Saw zach.saw at gmail.com
Wed Nov 7 15:12:45 PST 2012


Hi Ivan,

It's really hard to compare the two as architecturally mine is quite
different to BoehmGC.

However, if we base the discussion on the impact of the changes alone, my
own GC has had the following impacts:
1) We lose some (very minimal) performance on pointer assignment and object
access. However, any decent C++ compiler will optimize out the object
access portion by only creating one instance of the object returned by ->
for multiple accesses to it. As for pointer assignments, the compiler would
have to generate somewhat similar code anyway -- what we are doing is
simply telling it what code to generate for pointer assignment. For
architectures like newer gens x86, the hardware does a pretty good job of
hiding the inefficiencies.
2) We gain performance during mark phase by not retaining (hence retracing)
false pointers.
3) Less false retentions => more memory freed up.
4) The additional step of synchronizing mutator code during stop the world
is really non measurable compared to the mark phase.

It should be a mode that is only enabled by compiler defines as it requires
that a corresponding C++ template class gc_ptr to work in tandem with it.
Enabling it also means that you do lose / gain some performance depending
on what you do but empirically with my own GC, memory management + pointer
manipulation combined are still faster than if one were to employ
boost::shared_ptr if physical memory is a non constraint (as with any GC vs
manual memory management comparisons).

IOW, performance is generally a non-issue.   In fact, since
boost::shared_ptr uses a locked instruction per pointer assignment, it can
be up to 16x slower than my gc_ptr in some real world use cases.

Zach

On Thursday, November 8, 2012, Ivan Maidanski wrote:

> Hi Zach,
>
> 1. I do CC to the mailing list (as you are going to contribute, it's good
> to let the community participate in the discussion).
> 2. Have you compared speed and world-stop pause of your app equipped with
> your GC and BoehmGC? I mean what is the practical benefit of nearly precise
> marking (taking into account slower pointers manipulation)?
>
> Thank you. We of course are open for discussion and improvement of GC.
>
> Regards,
> Ivan
>
> Wed, 7 Nov 2012 12:44:19 +1100 от Zach Saw <zach.saw at gmail.com<javascript:_e({}, 'cvml', 'zach.saw at gmail.com');>
> >:
>
>  Hi guys,
>
> Thanks for your reply Hans.
>
> Let me expand on my original email.
>
> This mode relies on C++'s ability to mark the memory with the magic
> number. Managed objects are only accessible via special template classes
> (very much like boost::shared_ptr -- let's call it gc_ptr<type>) and we do
> not allow dereferencing / getting the raw pointer without going through
> functions like GcPin. Each time the template class is created or its
> managed object's member vars/funcs accessed via the '->' operator, we mark
> the memory adjacent to the object pointer with this magic number. What this
> ensures is -- if the mutator creates a gc_ptr<type> on the stack,
> essentially a magic number will appear on the stack. If they create
> gc_ptr<type> on static memory area, this static memory area too gets marked
> with a magic number.
>
> As such, we can be safely say that all the root pointers will be traceable
> through the stack and static memory area (which in C++ will be in the stack
> of the main thread) via gc_ptr only. Even if valid pointers reside in
> registers or pointer temporaries in the stack, we don't and shouldn't care
> (i.e. tracing these would end up causing false object retentions). This is
> essentially the same as the .NET framework's GC maintaining a separate
> stack per thread to tell the GC which objects are being pointed to from the
> stack. In C++, we do not have the luxury of maintaining a separate stack
> (we could but it was more costly than this feature I'm requesting in my
> tests).
>
> There is another thing I do in my own GC I forgot to mention -- when we
> have just stopped the world, we need to make sure mutator isn't assigning
> 'gc_ptr<type> bar' with 'gc_ptr<type> foo'. This is the only case where
> registers / pointer temporaries are of interest to us in this mode.
>
> In this case where:
>
> a) the compiler puts 'gc_ptr<type> bar's pointer into a register before
> writing it to gc_ptr<type> foo
> b) context switch kicks in
> c) another thread overrides bar's pointer
> d) GC kicks in
>
> What happens is no one now sees the original pointer about to be assigned
> to the gc_ptr<type> foo because we no longer scans the registers.
>
> I see several ways of solving this problem:
>
> 1) As I did in my GC, I provide a macro to assign the pointers. This macro
> is simply inlined assembly to ensure assignment always occurs using a
> predefined asm instructions which I can then easily check when I stop the
> world. During which, if the mutator is doing pointer assignment, I sync the
> mutator by executing the asm opcodes implicitly and adjust its instruction
> pointer appropriately. We do need to catch all exceptions in case the
> assignment fails and ignore it, continuing to the next mutator thread.
> Alternately, if we do not want to implicitly execute the instructions, we
> could do what .NET's GC does -- override the instructions right after the
> assignment opcodes with a jmp to our code that reinstates the original
> instructions and then suspends its own thread. This gets rid of the case I
> explained above.
>
> 2) We ignore this case completely as it means the mutator is using gc_ptr
> out of its thread-safety guarantees. We could make gc_ptr only guarantee
> thread-safety in assignment, not itself being thread-safe when written/read
> by multiple threads.
>
> I chose (1) in my implementation as it's the closest to a raw pointer's
> behavior and it is how .NET framework works. The second option also makes
> it harder for users of the GC library to debug their code in case they use
> the gc_ptr in a non-threadsafe manner (most likely they'll find objects
> getting deleted outside of their expectations). In case you're wondering,
> the performance penalty for using a predefined assembly instructions for
> assignment is hardly felt under x86 where the hardware does register
> renaming (internally the CPU has a lot more registers). If it ends up
> causing another fetch from memory where the pointer value is already in a
> register, it is not too bad either -- the value would be already loaded
> into the closest CPU cache.
>
> Also to expand on my original point where I asked not to sweep an
> untraceable object unless the mutator tells the GC it is now ready.
>
> The reason I wanted this was simply because right after creation of a new
> object, we have yet to assign to a gc_ptr. If GC kicks in at this juncture,
> this newly created object would be swept. Asking the mutator to tell the GC
> that it has now been assigned to a gc_ptr (hence tracable from root) and is
> ready to be swept when untraceable is well within the capabilities of C++.
>
> I hope this clears things up a little. I see us going back and forth with
> questions and answers before I can fully explain what I've done in my own
> GC and how we bring that across.
>
> Thanks again.
>
> Regards,
> Zach
>
>
> On Wed, Nov 7, 2012 at 10:58 AM, Boehm, Hans <hans.boehm at hp.com<https://e.mail.ru/cgi-bin/sentmsg?mailto=mailto%3ahans.boehm@hp.com>
> > wrote:
>
>  Zack -****
>
> ** **
>
> Ivan Maidanski (ivmai at mail.ru<https://e.mail.ru/cgi-bin/sentmsg?mailto=mailto%3aivmai@mail.ru>)
> is now doing most of the collector maintenance and patch approval.  Please
> also copy him or the gc at linux.hpl.hp.com<https://e.mail.ru/cgi-bin/sentmsg?mailto=mailto%3agc@linux.hpl.hp.com>mailing list.
> ****
>
> ** **
>
> I do not understand the correctness argument here.  You're assuming an
> entirely custom code generator?  How do you ensure that essential pointers
> don't reside in, for example, machine registers?  If the ABI supports
> callee-save registers, some of the callers pointer temporaries may live in
> registers while the callee is running.   If the callee triggers a GC, how
> do I find those pointer temporaries?****
>
> ** **
>
> What if there are multiple threads running and a thread is stopped while
> important temporaries are stored in registers?****
>
> ** **
>
> Hans****
>
> ** **
>
> *From:* Zach Saw [mailto:zach.saw at gmail.com<https://e.mail.ru/cgi-bin/sentmsg?mailto=mailto%3azach.saw@gmail.com>]
>
> *Sent:* Friday, November 02, 2012 5:18 PM
> *To:* Boehm, Hans
> *Subject:* Add a new GC pointer tracing mode****
>
> ** **
>
> Hi,****
>
> ** **
>
> I'd like to request a new feature for the GC (please let me know if you are
> interested in implementing this).****
>
> ** **
>
> This is a feature that would be useful to implement a nearly precise stack trace.
> Using this, we do not have to trace CPU registers for possible pointers.****
>
> ** **
>
> Note that I've implemented this for my own nearly precise GC a few years ago but
> unfortunately I've only coded it for bcc32 and x86 -- it's therefore not portable.
> In my experience (my GC runs 24-7 in a production environment), the nearly-precise
>
>
> ****
>
> stack tracing method is a lot better at not retaining out of scope pointers than
> the conservative one employed by BoehmGC. Out of scope pointers are extremely rare
> compared to BoehmGC especially for C++. By out of scope pointers, I mean pointers
>
>
> ****
>
> that linger in the stack and have not been overridden by other info but is already
> out of scope. For example, the stack may shrink and grow again, but the pointers
> from the previous cycle may still be on the stack if it has not been overridden
>
>
> ****
>
> with new data.
>
> Here's what needs to be done for the nearly precise mode:****
>
> 1) Don't trace CPU registers****
>
> 2) Tracing stack: treat pointers as possibly valid only if the next GC_word is****
>
> equals to a magic number (choose an arbitrary number - preferably one that ends****
>
> with an odd number to ensure it won't traced as another pointer)****
>
> 3) When allocating GC memory, the memory will never get garbage collected ****
>
> during the sweep phase, **unless** we tell the GC that it's now ready to be ****
>
> collected when untraceable. I haven't looked into your GC code yet but in my ****
>
> own implementation, it's merely a single bit in the GC object descriptor.****
>
> ** **
>
> These tweaks should only be enabled via ****
>
> GC_set_enable_nearly_precise_stack_tracing.****
>
> ** **
>
> We also want to be able to tell the GC what that magic number is (e.g. ****
>
> GC_set_nearly_precise_stack_trace_magic_number) and another function for ****
>
> setting ready-to-be-swept flag -- ****
>
> GC_set_nearly_precise_stack_trace_memory_ready.****
>
> ** **
>
> These features, combined with the stack unwinding feature of C++ allows us to
> achieve nearly-precise stack tracing. At the worst case, it will retain as much
> false pointers as the conservative mode, but in any typical real world usage, it
>
>
> ****
>
> has proven to be equivalent to a fully accurate GC.
>
> Thank you in advance.
>
> Regards,
> Zach
>
> p.s. I have already started patching the BoehmGC source code and it will not be too
> hard to implement the changes I outlined. However, I'd like it be added to the
>
>
> ****
>
> official source and perhaps have someone verify the patches.****
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://napali.hpl.hp.com/pipermail/gc/attachments/20121108/87c55cb5/attachment.htm


More information about the Gc mailing list