[Gc] Re: Performance of bdwgc7.2 had degraded compared to 6.8 - the patch to test

Ludovic Courtès ludo at gnu.org
Thu Dec 2 12:53:09 PST 2010


Hi Ivan,

Ivan Maidanski <ivmai at mail.ru> writes:

> It seems the observed degradation can be discovered by 2 tests:
> 1) by benchmarking v71 vs v72a2+test2_patch;
> 2) by benchmarking v71 vs v72a2+test3_patch.
>
> test2 patch reverts the relevant changes of:
> 2008-08-21  Hans Boehm <Hans.Boehm at hp.com>
>
> test3 patch reverts the relevant changes of:
> 2009-05-22  Hans Boehm <Hans.Boehm at hp.com> (Largely from Ludovic Cortes)

Damn, I feel guilty now.  ;-)

Did you measure the effect of each patch individually?  It would be
interesting to know.

For the record, the discussion that led to the second patch started here:

  http://thread.gmane.org/gmane.comp.programming.garbage-collection.boehmgc/2570

The next-to-final patch was posted here:

  http://thread.gmane.org/gmane.comp.programming.garbage-collection.boehmgc/2634

The intent was to /exclude/ ELF sections containing relocated read-only
data from the GC roots on GNU systems, thereby reducing the amount of
memory that needs to be scanned.

The effect on libguile was discussed here:

  http://thread.gmane.org/gmane.lisp.guile.devel/9247

In this message, I wrote:

  However, when not linking with `-z relro', static allocation leads to
  slightly degraded performance and increased heap usage (perhaps due to
  misidentified pointers in the `.data.rel.ro' section?).  This is
  probably worth some investigation on the BDW-GC side.

So, ahem, I feel twice as guilty now...

Could the problem be caused by the search for LOAD segments when a
PT_GNU_RELRO is encountered, in GC_register_dynlib_callback?  The code
does:

--8<---------------cut here---------------start------------->8---
  for( i = 0; i < (int)(info->dlpi_phnum); ((i++),(p++)) ) {
    switch( p->p_type ) {
        case PT_GNU_RELRO:
        /* This entry is known to be constant and will eventually be remapped
           read-only.  However, the address range covered by this entry is
           typically a subset of a previously encountered `LOAD' segment, so
           we need to exclude it.  */
        {
            for (j = n_load_segs; --j >= 0; ) {
--8<---------------cut here---------------end--------------->8---

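It does look quadratic to me: the outer loop walks the program headers,
and each PT_GNU_RELRO entry triggers a backward scan over the LOAD
segments recorded so far.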

However, it would only impact initialization time, which is probably
negligible on long-running programs (such as the Bigloo benchmarks, I
suppose).

Hmm, OTOH, GC_is_visible calls GC_register_dynamic_libraries, which runs
the above code, so this could be a problem.
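To make the cost concrete, here is a minimal sketch of the kind of
backward search quoted above.  It is not the actual libgc code: the
`load_seg' structure, the `exclude_relro' function and the return-value
convention are all made up for illustration.  The only thing it is meant
to show is that every read-only range to be excluded costs one scan over
the LOAD segments seen so far.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-segment record: once a read-only
   hole is cut out, a LOAD segment is described by two root ranges. */
struct load_seg {
    uintptr_t start, end;    /* first root range: [start, end) */
    uintptr_t start2, end2;  /* second root range, 0/0 while unsplit */
};

/* Cut [ro_start, ro_end) out of the most recently recorded LOAD segment
   that contains ro_start, searching backwards as in the quoted loop.
   Returns the number of segments inspected, or -1 if no segment
   contains ro_start or the matching segment was already split.
   (This return convention is an assumption made for the sketch.) */
int exclude_relro(struct load_seg *segs, int n_segs,
                  uintptr_t ro_start, uintptr_t ro_end)
{
    int inspected = 0;

    for (int j = n_segs; --j >= 0; ) {
        inspected++;
        if (ro_start >= segs[j].start && ro_start < segs[j].end) {
            if (segs[j].start2 != 0)
                return -1;              /* only one hole per segment */
            assert(ro_end <= segs[j].end);
            /* Keep [start, ro_start) and [ro_end, old end) as roots. */
            segs[j].end2 = segs[j].end;
            segs[j].end = ro_start;
            segs[j].start2 = ro_end;
            return inspected;
        }
    }
    return -1;                          /* no enclosing LOAD segment */
}
```

With n recorded LOAD segments, one call inspects at most n entries, so k
read-only ranges cost at most k * n comparisons per pass.  If the
matching segment is usually near the end of the array, the scan stops
early and the real cost is much lower; whether that holds in the
benchmarks is something only a measurement can tell.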

What do you think?

Thanks,
Ludo’.


