[Gc] Practical implications of MPROTECT_VDB in gc-6.6 and 7.2?

Boehm, Hans hans.boehm at hp.com
Tue Sep 1 13:20:55 PDT 2009


> From: Loren James Rittle
> 
> Since we originally enabled threading on FreeBSD in gc, 
> threading support on the platform has radically improved.  We 
> currently have this configuration (in both the gcc copy of 
> 6.6 and the 7.2alpha2):
> 
> #      ifndef GC_FREEBSD_THREADS
> #          define MPROTECT_VDB
> #      endif
> 
> However, I have noticed that MPROTECT_VDB now works fine on 
> FreeBSD[7] with threads enabled for both versions of gc that 
> I checked.  However, the test code runs slower on the one 
> specific hardware platform I checked.  I will not give exact 
> details but it appeared to be about 50% overhead.
> 
> I noticed that one platform does something similar to FreeBSD:
> 
> #       if !defined(GC_LINUX_THREADS) || !defined(REDIRECT_MALLOC)
> #           define MPROTECT_VDB
> 
> I also noticed that at least one platform does the exact opposite:
> 
> #     ifdef GC_DARWIN_THREADS
> #       define MPROTECT_VDB
> 
> The questions are:
> 
> Is the performance of the test code really indicative of 
> application performance? (I would think not.  And it seems 
> wise to use incremental mode for other reasons if possible 
> even if it had some extra overhead.)
> 
> Should the MPROTECT_VDB feature be enabled for both threading 
> and non-threading configurations, if it actually works in 
> both cases for the platform?
> 
I would enable it if it works.  Incremental collection is never enabled by default anyway; you still have to call GC_enable_incremental() for MPROTECT_VDB to have a real effect.  The test program does so.  Other applications hopefully won't unless they really care about pause times.

The test code is probably not a good performance benchmark.  Especially if you use thread-local allocation, I think you get some anomalies.  The collector will try to use a pthread mutex for the main allocator lock on the assumption that lock acquisitions are rare.  But pieces of the test code cycle through fairly large object sizes that bypass the thread-local allocation machinery.  Thus you end up with a lot of sensitivity to the performance of pthread_mutex_(un)lock (in addition to unavoidable contention on the mutex).
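
A toy model of that effect, not the collector's actual code: small requests are counted against a per-thread structure with no locking, while requests above a cutoff take one global pthread mutex. The cutoff LARGE_THRESHOLD, the names toy_alloc and count_global_lock_acquisitions, and the size pattern are all assumptions chosen for illustration. The point is only that a workload cycling through large sizes does a pthread_mutex_lock/unlock pair per large request, no matter how cheap the thread-local path is.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define LARGE_THRESHOLD 512   /* hypothetical cutoff, not the collector's */
#define MAX_THREADS 16

static pthread_mutex_t global_alloc_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long global_lock_acquisitions = 0;

struct worker {
    size_t requests;            /* number of simulated allocations */
    size_t max_size;            /* sizes cycle through 1..max_size */
    unsigned long local_hits;   /* requests served without the lock */
};

/* Simulated allocation: small sizes stay thread-local, large sizes
   go through the global lock. */
static void toy_alloc(struct worker *w, size_t size)
{
    if (size <= LARGE_THRESHOLD) {
        w->local_hits++;
    } else {
        pthread_mutex_lock(&global_alloc_lock);
        global_lock_acquisitions++;
        pthread_mutex_unlock(&global_alloc_lock);
    }
}

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    for (size_t i = 0; i < w->requests; i++)
        toy_alloc(w, (i % w->max_size) + 1);
    return NULL;
}

/* Run `nthreads` workers and return the total number of global lock
   acquisitions they performed. */
unsigned long count_global_lock_acquisitions(int nthreads, size_t requests,
                                             size_t max_size)
{
    pthread_t tids[MAX_THREADS];
    struct worker workers[MAX_THREADS];

    assert(nthreads > 0 && nthreads <= MAX_THREADS && max_size > 0);
    global_lock_acquisitions = 0;

    for (int i = 0; i < nthreads; i++) {
        workers[i] = (struct worker){ requests, max_size, 0 };
        int rc = pthread_create(&tids[i], NULL, worker_main, &workers[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);

    /* All workers have been joined, so reading the counter is safe. */
    return global_lock_acquisitions;
}
```

With four workers cycling sizes 1..1024, half of all requests land on the mutex; with sizes capped at 512, none do. Timing the two cases on a given platform would give a rough idea of how much of a test run is really measuring the mutex implementation.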

Hans

