[Gc] Re: Problems with GC_size_map
hans.boehm at hp.com
Tue Feb 9 16:30:47 PST 2010
I don't quite understand the difference in GC frequency. Towards the end of the 32-bit log, there seem to be about 3MB of pointer-free plus 0.5MB of pointer-containing live memory. It's about twice that in the 64-bit version. Assuming the data structures all contain pointer-sized elements, as probably makes sense for CL, that seems perfectly reasonable, and suggests that there is very little spurious retention of memory. The 32-bit version seems to be collecting about every 0.34MB; the 64-bit version collects about every 1.2MB.
The "consed" and "gc count" numbers at the bottom appear to be off, though the 64-bit cons count seems to be in the right ballpark? Roughly 4GB were allocated in total? Based on the log, the 32-bit version surprisingly seems to allocate about the same number of bytes?
If you look at min_bytes_allocd(), given that the root size should be essentially zero, it should return roughly scan_size/(2*fsd), where fsd is GC_free_space_divisor,
or scan_size/6 by default, where
scan_size = 2*composite + atomic/4, which is about 1.75MB for 32 bits and 3.5MB for 64 bits.
Thus it seems to me that the 32-bit GC frequency is actually what was intended, assuming GC_free_space_divisor has its default value. I'm not sure why collections are less frequent in the 64-bit case, though you may want to adjust GC_free_space_divisor in both cases, if you aren't already.
Looking at the code, the one thing that concerns me a bit here is the way stack_size is calculated in the single-threaded case in min_bytes_allocd, as
GC_stackbottom - (ptr_t)(&dummy)
This assumes that the collector is run from the main stack, which hopefully it is. It may be worth debugging GC_should_collect() to understand how it's arriving at the GC decision.
It seems to me that the 64-bit performance is actually quite good, at least assuming a single GC thread. It seems to be consing over a GB/sec, if I read things correctly. Given that the 32-bit version seems to be doing 4 times the work, it also doesn't seem to be running horribly.
I think the heap sizes may be a bit misleading in both cases, since the heap seems to have large blocks at the end that weren't really touched.
My remaining questions would be:
0) Is GC_free_space_divisor defaulted in all cases?
1) Does 4GB total allocation look correct here?
2) Does the live data size look correct in both cases?
3) Why is GC_should_collect behaving so differently in the two cases, i.e. collecting 4 times as often in the 32-bit case instead of merely twice? Stepping through GC_should_collect and its callees should help determine that. The code is fairly simple.
From: Juan Jose Garcia-Ripoll [mailto:juanjose.garciaripoll at googlemail.com]
Sent: Sunday, February 07, 2010 2:31 PM
To: Boehm, Hans
Cc: gc at napali.hpl.hp.com
Subject: Re: [Gc] Re: Problems with GC_size_map
On Sun, Feb 7, 2010 at 10:13 PM, Juan Jose Garcia-Ripoll <juanjose.garciaripoll at googlemail.com> wrote:
On Sun, Feb 7, 2010 at 8:18 AM, Hans Boehm <Hans.Boehm at hp.com> wrote:
On Sat, 6 Feb 2010, Juan Jose Garcia-Ripoll wrote:
The Common Lisp environment creates a number of constants at boot time. I
think those are the arrays you are seeing. However, those arrays are
never changed after creation. It was my understanding that thanks to
dirty bits and GC_enable_incremental() the cost of marking those arrays
would be close to zero.
They will still be traced during full collections, which probably
won't be that rare. But they shouldn't be a big deal. And their
presence should decrease GC frequency. Presumably non-nil entries
are actually pointers or small integers?
These arrays contain either pointers to live objects or NULL. All objects have been allocated by the garbage collector.
Since we don't see a blacklisting issue, it might be good to look
at GC_PRINT_STATS output, and compare to a platform on which it works
better. Or possibly compare heap contents on the two platforms.
But I am suspicious that we're chasing a problem that has already
been fixed in CVS.
I have built ECL with the garbage collector from CVS. The numbers are actually worse, without changing anything else.
I also compared the 32-bit and 64-bit platforms with GC_print_stats = 1, using the same version of the collector. The new, precise marking routine has improved efficiency on the 64-bit platform, which now takes only 2.9 seconds in total vs. 13.2 seconds on the 32-bit processor.
As for other differences between the logs, I can only spot one: collections, or at least marking, seem to happen more often on the 32-bit system. For instance, "Marking for collection 16 after 287016 allocated bytes" vs. "Marking for collection 7 after 1094272 allocd bytes".
Instituto de Física Fundamental, CSIC
c/ Serrano, 113b, Madrid 28006 (Spain)