[Gc] Re: Understanding the performance of a `libgc'-based application

Ludovic Courtès ludovic.courtes at laas.fr
Tue Nov 28 04:26:34 PST 2006


Hi,

(Re-sending without the large attachment...)

"Boehm, Hans" <hans.boehm at hp.com> writes:

> Can you generate an execution profile of the libgc-based Guile,
> perhaps running GCBench?

Sure.  The full `gprof' output of the libgc-based Guile running GCBench
is available at [0].  The `libgc' version is 6.8.  For some reason, the
call counts for libgc functions are not displayed.

For comparison, here is the "top 10" flat profile of "normal Guile"
running `GCBench' as well:

  Each sample counts as 0.01 seconds.
    %   cumulative   self              self     total           
   time   seconds   seconds    calls  Ks/call  Ks/call  name    
   25.69    529.30   529.30 139465033     0.00     0.00  scm_gc_mark_dependencies
   23.79   1019.51   490.21     9970     0.00     0.00  deval
   23.07   1494.79   475.28  3074617     0.00     0.00  scm_i_sweep_card
    6.73   1633.40   138.61 264869792     0.00     0.00  scm_gc_mark
    3.12   1697.78    64.38 143001687     0.00     0.00  scm_ilookup
    2.95   1758.65    60.87 85265684     0.00     0.00  scm_list_2
    2.42   1808.54    49.89 117193958     0.00     0.00  scm_list_1
    2.05   1850.68    42.14 57511095     0.00     0.00  scm_acons
    0.95   1870.33    19.65 15685771     0.00     0.00  scm_less_p
    0.91   1889.10    18.77     1278     0.00     0.00  scm_i_mark_weak_vectors_non_weaks

> If you run GCBench with the standard Guile, presumably there are
> essentially no malloc calls involved, since essentially all the
> objects are small?

Right, only the GC allocation routines are used.

> Are you using a pre-built GC package, or did you build it?  If you are
> using a standard thread-safe build, without thread-local allocation,
> the GC allocator will end up spending a lot of its time acquiring and
> releasing the allocation lock, especially on a Pentium4-based machine.
> If you are comparing it to a purely single-threaded Guile collector,
> that could easily account for the difference.  (GC7 may largely fix
> that, if that's indeed what you're seeing.)

I usually use Debian's `libgc1c2' package.  However, this time, I
compiled my own (with `-pg'), without passing any flags to `configure'.

Note that Guile has supported preemptive multithreading since 1.8.  From a GC
viewpoint, each thread has its own "freelist" from which it can sweep
cells, but other state is shared among threads and thus requires
synchronization.  In any case, isn't `pthread_mutex_lock ()' essentially
a single "test and set" instruction on most platforms (since Linux has
futexes)?
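
One way to check this empirically would be a small micro-benchmark along
the lines of the sketch below.  It is only an illustration: the function
name `uncontended_lock_cost_ns' is hypothetical and belongs to neither
libgc nor Guile, and the sketch assumes a POSIX system providing
`clock_gettime ()' with `CLOCK_MONOTONIC'.  It times lock/unlock pairs on
a default mutex that no other thread touches, i.e., the uncontended case
only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper (not part of libgc or Guile): return the average
   cost, in nanoseconds, of one uncontended lock/unlock pair on a
   default-initialized mutex.  */
double
uncontended_lock_cost_ns (long iterations)
{
  pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  struct timespec start, end;
  double elapsed_ns;
  long i;

  assert (iterations > 0);

  clock_gettime (CLOCK_MONOTONIC, &start);
  for (i = 0; i < iterations; i++)
    {
      /* No other thread ever takes this lock, so only the fast path
         of the mutex implementation is exercised.  */
      pthread_mutex_lock (&lock);
      pthread_mutex_unlock (&lock);
    }
  clock_gettime (CLOCK_MONOTONIC, &end);

  pthread_mutex_destroy (&lock);

  elapsed_ns = (double) (end.tv_sec - start.tv_sec) * 1e9
    + (double) (end.tv_nsec - start.tv_nsec);
  return elapsed_ns / (double) iterations;
}
```

Calling, say, `uncontended_lock_cost_ns (10000000)' from a program linked
with `-pthread' and comparing the result with the per-object allocation
time would give a rough idea of how much the lock could contribute.  The
loop overhead is included in the figure, so it should be read as an upper
bound rather than an exact cost.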

Thanks,
Ludovic.

[0] http://www.laas.fr/~lcourtes/tmp/libgc-guile-prof.txt.gz


