[Gc] Re: Add a new GC pointer tracing mode
hans.boehm at hp.com
Thu Nov 8 10:30:08 PST 2012
I now almost see how this can work for a single thread. But I’m still not sure how to make it reliable with multiple threads and a conventional compiler. It seems to me that you would have to make both the pointer and the magic number in a gc_ptr atomic, or possibly volatile, to prevent the compiler from e.g. delaying the assignment to the magic number, or optimizing it out completely as dead code.
I also don’t think there is anything preventing a compiler from register allocating the two halves of a gc_ptr whose address is never taken. It sounds to me like that would still break even single-threaded code.
I’m much less concerned about handling code that has data races on a gc_ptr. It seems to me the trick is to prevent the compiler from breaking data-race-free code.
From: gc-bounces at linux.hpl.hp.com [mailto:gc-bounces at linux.hpl.hp.com] On Behalf Of Ivan Maidanski
Sent: Wednesday, November 07, 2012 6:52 AM
To: Zach Saw
Cc: Boehm GC
Subject: [Gc] Re: Add a new GC pointer tracing mode
1. I do CC to the mailing list (as you are going to contribute, it's good to let the community participate in the discussion).
2. Have you compared speed and world-stop pause of your app equipped with your GC and BoehmGC? I mean what is the practical benefit of nearly precise marking (taking into account slower pointers manipulation)?
Thank you. We of course are open for discussion and improvement of GC.
Wed, 7 Nov 2012 12:44:19 +1100 от Zach Saw <zach.saw at gmail.com<mailto:zach.saw at gmail.com>>:
Thanks for your reply Hans.
Let me expand on my original email.
This mode relies on C++'s ability to mark the memory with the magic number. Managed objects are only accessible via special template classes (very much like boost::shared_ptr -- let's call it gc_ptr<type>) and we do not allow dereferencing / getting the raw pointer without going through functions like GcPin. Each time the template class is created or its managed object's member vars/funcs accessed via the '->' operator, we mark the memory adjacent to the object pointer with this magic number. What this ensures is -- if the mutator creates a gc_ptr<type> on the stack, essentially a magic number will appear on the stack. If they create gc_ptr<type> on static memory area, this static memory area too gets marked with a magic number.
As such, we can be safely say that all the root pointers will be traceable through the stack and static memory area (which in C++ will be in the stack of the main thread) via gc_ptr only. Even if valid pointers reside in registers or pointer temporaries in the stack, we don't and shouldn't care (i.e. tracing these would end up causing false object retentions). This is essentially the same as the .NET framework's GC maintaining a separate stack per thread to tell the GC which objects are being pointed to from the stack. In C++, we do not have the luxury of maintaining a separate stack (we could but it was more costly than this feature I'm requesting in my tests).
There is another thing I do in my own GC I forgot to mention -- when we have just stopped the world, we need to make sure mutator isn't assigning 'gc_ptr<type> bar' with 'gc_ptr<type> foo'. This is the only case where registers / pointer temporaries are of interest to us in this mode.
In this case where:
a) the compiler puts 'gc_ptr<type> bar's pointer into a register before writing it to gc_ptr<type> foo
b) context switch kicks in
c) another thread overrides bar's pointer
d) GC kicks in
What happens is no one now sees the original pointer about to be assigned to the gc_ptr<type> foo because we no longer scans the registers.
I see several ways of solving this problem:
1) As I did in my GC, I provide a macro to assign the pointers. This macro is simply inlined assembly to ensure assignment always occurs using a predefined asm instructions which I can then easily check when I stop the world. During which, if the mutator is doing pointer assignment, I sync the mutator by executing the asm opcodes implicitly and adjust its instruction pointer appropriately. We do need to catch all exceptions in case the assignment fails and ignore it, continuing to the next mutator thread. Alternately, if we do not want to implicitly execute the instructions, we could do what .NET's GC does -- override the instructions right after the assignment opcodes with a jmp to our code that reinstates the original instructions and then suspends its own thread. This gets rid of the case I explained above.
2) We ignore this case completely as it means the mutator is using gc_ptr out of its thread-safety guarantees. We could make gc_ptr only guarantee thread-safety in assignment, not itself being thread-safe when written/read by multiple threads.
I chose (1) in my implementation as it's the closest to a raw pointer's behavior and it is how .NET framework works. The second option also makes it harder for users of the GC library to debug their code in case they use the gc_ptr in a non-threadsafe manner (most likely they'll find objects getting deleted outside of their expectations). In case you're wondering, the performance penalty for using a predefined assembly instructions for assignment is hardly felt under x86 where the hardware does register renaming (internally the CPU has a lot more registers). If it ends up causing another fetch from memory where the pointer value is already in a register, it is not too bad either -- the value would be already loaded into the closest CPU cache.
Also to expand on my original point where I asked not to sweep an untraceable object unless the mutator tells the GC it is now ready.
The reason I wanted this was simply because right after creation of a new object, we have yet to assign to a gc_ptr. If GC kicks in at this juncture, this newly created object would be swept. Asking the mutator to tell the GC that it has now been assigned to a gc_ptr (hence tracable from root) and is ready to be swept when untraceable is well within the capabilities of C++.
I hope this clears things up a little. I see us going back and forth with questions and answers before I can fully explain what I've done in my own GC and how we bring that across.
On Wed, Nov 7, 2012 at 10:58 AM, Boehm, Hans <hans.boehm at hp.com<sentmsg?mailto=mailto%3ahans.boehm at hp.com>> wrote:
Ivan Maidanski (ivmai at mail.ru<sentmsg?mailto=mailto%3aivmai at mail.ru>) is now doing most of the collector maintenance and patch approval. Please also copy him or the gc at linux.hpl.hp.com<sentmsg?mailto=mailto%3agc at linux.hpl.hp.com> mailing list.
I do not understand the correctness argument here. You’re assuming an entirely custom code generator? How do you ensure that essential pointers don’t reside in, for example, machine registers? If the ABI supports callee-save registers, some of the callers pointer temporaries may live in registers while the callee is running. If the callee triggers a GC, how do I find those pointer temporaries?
What if there are multiple threads running and a thread is stopped while important temporaries are stored in registers?
From: Zach Saw [mailto:zach.saw at gmail.com<sentmsg?mailto=mailto%3azach.saw at gmail.com>]
Sent: Friday, November 02, 2012 5:18 PM
To: Boehm, Hans
Subject: Add a new GC pointer tracing mode
I'd like to request a new feature for the GC (please let me know if you are
interested in implementing this).
This is a feature that would be useful to implement a nearly precise stack trace.
Using this, we do not have to trace CPU registers for possible pointers.
Note that I've implemented this for my own nearly precise GC a few years ago but
unfortunately I've only coded it for bcc32 and x86 -- it's therefore not portable.
In my experience (my GC runs 24-7 in a production environment), the nearly-precise
stack tracing method is a lot better at not retaining out of scope pointers than
the conservative one employed by BoehmGC. Out of scope pointers are extremely rare
compared to BoehmGC especially for C++. By out of scope pointers, I mean pointers
that linger in the stack and have not been overridden by other info but is already
out of scope. For example, the stack may shrink and grow again, but the pointers
from the previous cycle may still be on the stack if it has not been overridden
with new data.
Here's what needs to be done for the nearly precise mode:
1) Don't trace CPU registers
2) Tracing stack: treat pointers as possibly valid only if the next GC_word is
equals to a magic number (choose an arbitrary number - preferably one that ends
with an odd number to ensure it won't traced as another pointer)
3) When allocating GC memory, the memory will never get garbage collected
during the sweep phase, *unless* we tell the GC that it's now ready to be
collected when untraceable. I haven't looked into your GC code yet but in my
own implementation, it's merely a single bit in the GC object descriptor.
These tweaks should only be enabled via
We also want to be able to tell the GC what that magic number is (e.g.
GC_set_nearly_precise_stack_trace_magic_number) and another function for
setting ready-to-be-swept flag --
These features, combined with the stack unwinding feature of C++ allows us to
achieve nearly-precise stack tracing. At the worst case, it will retain as much
false pointers as the conservative mode, but in any typical real world usage, it
has proven to be equivalent to a fully accurate GC.
Thank you in advance.
p.s. I have already started patching the BoehmGC source code and it will not be too
hard to implement the changes I outlined. However, I'd like it be added to the
official source and perhaps have someone verify the patches.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Gc