Fwd: [Gc] Performance of bdwgc7.2 had degraded compared to 6.8 - the patch to test

Manuel.Serrano at inria.fr Manuel.Serrano at inria.fr
Fri Dec 10 23:37:11 PST 2010


Hi Ivan,

> Please confirm that you don't compile GC (for this benchmark) with multi-threading support and don't use GC_DEBUG (and GC_debug_ routines).
> 
> If yes, then the only difference between gc71+test1_patch and gc72a2+test2_patch+test3_patch is in GC_clear_a_few_frames() (in alloc.c). Please benchmark gc72a2+test4_patch (which is attached).
No difference (and I confirm once more, no multi-threading, no debugging).

          7.2a4 7.2a2 7.1   7.0   7.0a7 6.8     7.1+ivan-30nov  7.2a2-test2  7.2a2-test3  7.2a2-test4
bague     0.76  0.77  0.77  0.76  0.77  0.77    0.77            0.77         0.77         0.77
beval     1.33  1.41  1.29  1.41  1.29  1.31    1.44            1.42         1.42         1.47
boyer     2.23  2.23  2.13  2.14  2.13  2.15    2.13            2.24         2.23         2.24
cgc       0.47  0.48  0.48  0.47  0.48  0.46    0.47            0.48         0.49         0.47
conform   1.91  1.91  1.74  1.72  1.73  1.79    1.71            1.92         1.92         1.88
earley    2.49  2.50  2.08  2.13  2.09  2.23    2.09            2.52         2.52         2.5
fib       0.01  0.01  0.01  0.01  0.01  0.01    0.01            0.01         0.01         0.01
fft       2.51  2.52  2.52  2.50  2.52  2.49    2.5             2.52         2.52         2.51
leval     1.12  1.13  1.05  1.01  1.02  1.09    1.02            1.14         1.13         1.12
maze      1.67  1.40  1.36  1.35  1.26  1.39    1.35            1.38         1.39         1.44
mbrot     7.03  7.05  7.04  7.03  7.05  7.05    7.02            7.06         7.07         7.05
nucleic   1.18  1.20  1.20  1.16  1.16  1.34    1.17            1.2          1.21         1.18
peval     1.46  1.47  1.20  1.19  1.20  1.18    1.2             1.47         1.49         1.46
puzzle    1.96  1.92  1.97  1.96  1.92  1.93    1.92            1.93         1.92         1.94
queens    2.29  2.29  1.55  1.56  1.55  1.44    1.56            2.36         2.36         2.31
qsort     1.65  1.64  1.63  1.62  1.63  1.63    1.63            1.65         1.65         1.65
rgc       1.28  1.28  1.23  1.23  1.24  1.28    1.23            1.29         1.28         1.29
sieve     1.58  1.60  1.44  1.42  1.41  1.51    1.43            1.59         1.59         1.6
traverse  5.14  5.15  3.55  3.60  3.56  3.58    3.59            5.13         5.15         5.15
almabench 1.45  1.45  1.45  1.45  1.45  1.46    1.45            1.46         1.46         1.45
SUM      39.52 39.41 35.69 35.72 35.47 36.09    35.69           39.54        39.58        39.49
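
As a rough reading aid for the table above, the slowdown of 7.2a2 relative to 7.1 can be computed per benchmark. The sketch below copies four rows from the 7.1 and 7.2a2 columns; the struct and function names are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Timings copied from the 7.1 and 7.2a2 columns of the table above. */
struct timing {
    const char *name;
    double t_71;    /* time with gc 7.1 */
    double t_72a2;  /* time with gc 7.2a2 */
};

static const struct timing timings[] = {
    { "earley",   2.08, 2.50 },
    { "peval",    1.20, 1.47 },
    { "queens",   1.55, 2.29 },
    { "traverse", 3.55, 5.15 },
};

#define N_TIMINGS (sizeof timings / sizeof timings[0])

/* Ratio of the 7.2a2 time to the 7.1 time; values above 1 mean a slowdown. */
static double slowdown(const struct timing *t) {
    assert(t->t_71 > 0.0);
    return t->t_72a2 / t->t_71;
}

/* Return the row with the largest 7.2a2 / 7.1 ratio. */
static const struct timing *worst_slowdown(void) {
    const struct timing *worst = &timings[0];
    for (size_t i = 1; i < N_TIMINGS; i++) {
        if (slowdown(&timings[i]) > slowdown(worst)) {
            worst = &timings[i];
        }
    }
    return worst;
}

/* Print one line per benchmark with its slowdown ratio. */
static void print_slowdowns(void) {
    for (size_t i = 0; i < N_TIMINGS; i++) {
        printf("%-9s %.2f\n", timings[i].name, slowdown(&timings[i]));
    }
}
```

Calling print_slowdowns() from a main() lists the ratios; among these four rows, queens and traverse have the largest ones.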

It looks like queens is pretty critical. That benchmark allocates mostly
lists, so one might conjecture that there is a problem with allocating
these particular objects. Since lists are so widely used, Bigloo uses a
special allocation function that is implemented as follows (for the sake
of completeness I give here the implementation for 6.8 and also for
7.x).

-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----
#define GC_INLINE_ALLOC_6xx( res, size, default_alloc ) \
   ptr_t op; \
   ptr_t *opp; \
   DCL_LOCK_STATE; \
   \
   opp = (void **)&(GC_objfreelist[ (long)ALIGNED_WORDS( size ) ]); \
   FASTLOCK(); \
   if( !FASTLOCK_SUCCEEDED() || (op = *opp) == 0 ) { \
      FASTUNLOCK(); \
      return default_alloc; \
   } \
   *opp = obj_link( op ); \
   GC_words_allocd += (long)ALIGNED_WORDS( size ); \
   FASTUNLOCK(); \
   \
   res = (obj_t)op;

#define GC_INLINE_ALLOC_7xx( res, size, default_alloc ) \
   void *op; \
   void **opp; \
   size_t lg; \
   DCL_LOCK_STATE; \
   \
   lg = GC_size_map[ size ]; \
   opp = (void **)&(GC_objfreelist[ lg ]); \
   LOCK(); \
   \
   if( EXPECT((op = *opp) == 0, 0) ) { \
      UNLOCK(); \
      return default_alloc; \
   }  \
   *opp = obj_link( op ); \
   GC_bytes_allocd += GRANULES_TO_BYTES( lg ); \
   UNLOCK(); \
   \
   res = (obj_t)op;

#if( BGL_GC_VERSION < 700 )
#  define GC_INLINE_ALLOC GC_INLINE_ALLOC_6xx
#else
#  define GC_INLINE_ALLOC GC_INLINE_ALLOC_7xx
#endif

GC_API obj_t 
make_pair( obj_t car, obj_t cdr ) {
   obj_t pair;
   
   GC_INLINE_ALLOC( pair, PAIR_SIZE, alloc_make_pair( car, cdr ) );

#if( !defined( TAG_PAIR ) )
   pair->pair_t.header = MAKE_HEADER( PAIR_TYPE, PAIR_SIZE );
#endif
   pair->pair_t.car = car;
   pair->pair_t.cdr = cdr;
   
   return BPAIR( pair );
}
-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----
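
For readers without the GC internals at hand, the fast path that both macros above follow (pop the head of a per-size free list, fall back to a general allocator when the list is empty) can be sketched in isolation. This is a minimal standalone model, not bdwgc or Bigloo code: every name prefixed with sketch_ is hypothetical, the 16-byte granule and the batch size of 4 are assumptions, malloc stands in for the collector's slow path, and locking is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_GRANULE_BYTES 16  /* assumed granule size */
#define SKETCH_MAX_GRANULES   8  /* largest size class in this sketch */
#define SKETCH_BATCH          4  /* objects carved per slow-path call */

/* One free list per size class; the first word of a free object
   holds the link to the next free object (the role of obj_link). */
static void *sketch_freelist[SKETCH_MAX_GRANULES + 1];
static size_t sketch_bytes_allocd;  /* counterpart of GC_bytes_allocd */
static size_t sketch_slow_calls;    /* how often the slow path ran */

/* Round a byte size up to a number of granules (the role of GC_size_map). */
static size_t sketch_granules(size_t bytes) {
    return (bytes + SKETCH_GRANULE_BYTES - 1) / SKETCH_GRANULE_BYTES;
}

/* Slow path: carve a batch of objects, thread all but the first onto
   the free list, and return the first. The chunk is never freed here. */
static void *sketch_slow_alloc(size_t lg) {
    size_t obj_bytes = lg * SKETCH_GRANULE_BYTES;
    char *chunk = malloc(SKETCH_BATCH * obj_bytes);
    if (chunk == NULL) {
        return NULL;
    }
    sketch_slow_calls++;
    for (size_t i = 1; i < SKETCH_BATCH; i++) {
        void *obj = chunk + i * obj_bytes;
        *(void **)obj = sketch_freelist[lg];
        sketch_freelist[lg] = obj;
    }
    sketch_bytes_allocd += obj_bytes;
    return chunk;
}

/* Fast path: pop the head of the free list if there is one. */
static void *sketch_alloc(size_t bytes) {
    size_t lg = sketch_granules(bytes);
    assert(lg >= 1 && lg <= SKETCH_MAX_GRANULES);

    void **opp = &sketch_freelist[lg];
    void *op = *opp;
    if (op == NULL) {
        return sketch_slow_alloc(lg);  /* list empty: take the slow path */
    }
    *opp = *(void **)op;               /* unlink the head */
    sketch_bytes_allocd += lg * SKETCH_GRANULE_BYTES;
    return op;
}
```

In this model, the first request for a size class takes the slow path and the next three are served from the free list. If the fast path behaves like this in both 6.8 and 7.x, the cost difference would have to come from how often, and how expensively, the slow path runs, which is what profiling queens should show.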

If you think that might help, tomorrow I can try to do a little bit of
profiling in order to understand where the performance difference for
queens comes from.

-- 
Manuel