Re[4]: [Gc] RE: Abuse of collector...

Ivan Maidanski ivmai at mail.ru
Tue May 12 15:32:42 PDT 2009


Hi!

"Talbot, George" <Gtalbot at locuspharma.com> wrote:
> From: Ivan Maidanski [ivmai at mail.ru]
> Sent: Tuesday, May 12, 2009 2:48 AM
> 
> > I don't know how NUMA affects gc speed. Hans may know more...
> >
> > Your task is the only heavy-weight one running on the box at the same time, right?
> 
> Yes.  That's correct.
> 
> > > I've built the collector both with and without debugging from CVS (which I pulled on Friday).  Here's my current configure command line:
> > >
> > > CFLAGS="-O2" ./configure --enable-threads=posix --enable-thread-local-alloc --enable-parallel-mark --enable-cplusplus --enable-large-config --enable-munmap --enable-gc-debug
> >
> > You'd better show me the args actually passed to "gcc" (not to configure).
> 
>  gcc -DPACKAGE_NAME=\"gc\" -DPACKAGE_TARNAME=\"gc\" -DPACKAGE_VERSION=\"7.2alpha1\" "-DPACKAGE_STRING=\"gc 7.2alpha1\"" -DPACKAGE_BUGREPORT=\"Hans.Boehm at hp.com\" -DGC_VERSION_MAJOR=7 -DGC_VERSION_MINOR=2 -DGC_ALPHA_VERSION=1 -DPACKAGE=\"gc\" -DVERSION=\"7.2alpha1\" -DGC_LINUX_THREADS=1 -D_REENTRANT=1 -DPARALLEL_MARK=1 -DTHREAD_LOCAL_ALLOC=1 -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DNO_EXECUTE_PERMISSION=1 -DALL_INTERIOR_POINTERS=1 -DGC_GCJ_SUPPORT=1 -DKEEP_BACK_PTRS=1 -DDBG_HDRS_ALL=1 -DMAKE_BACK_GRAPH=1 -DSAVE_CALL_COUNT=8 -DJAVA_FINALIZATION=1 -DATOMIC_UNCOLLECTABLE=1 -DLARGE_CONFIG=1 -DUSE_MMAP=1 -DUSE_MUNMAP=1 -DMUNMAP_THRESHOLD=6 -I./include -fexceptions -I libatomic_ops/src -O2 -MT reclaim.lo -MD -MP -MF .deps/reclaim.Tpo -c reclaim.c  -fPIC -DPIC -o .libs/reclaim.o

In the non-debug version, remove -DKEEP_BACK_PTRS -DDBG_HDRS_ALL -DMAKE_BACK_GRAPH and add -DNO_DEBUGGING -DNDEBUG.

The MUNMAP_THRESHOLD value can be overridden at run time via the "GC_UNMAP_THRESHOLD" environment variable (a value of 0 disables unmapping).

> 
> > I'm also using -fno-strict-aliasing along with -O for safety (but I can't say whether the world is safer with it or not - gcc produces some warnings without it).
> 
> Seems like that's worthwhile doing.  I'll do that next.  O2 is the "correct" optimization level, right?  Is O3 OK?

Should be OK (but I haven't tried it). Also use -mtune=native.

> ...
> 
> > Try GC_ENABLE_INCREMENTAL with tests/test.c first - if it works for Your platform then the number
> > of collections (printed at the end) should be smaller (by approx. 1/4) in the incremental mode. Also try
> > to measure the average pause time in different collector modes (e.g., with/without PARALLEL_MARK).

I observe a reduction of approx. 1/4 on Win32 (on Linux64 I saw only about 1/8). It is better to compare the final heap sizes with and without GC_ENABLE_INCREMENTAL.

> 
> I'll give that a shot.
> 
> > > My program is heavily multithreaded (~90 threads are running during the initial startup).
> >
> > And all those threads operate on garbage-collected memory (or the app is too complex to analyse that), right?
> 
> Yes.  I've entirely converted over to garbage-collected memory at this point.
> 
> > What's the total size of all threads' stacks?
> 
> 256K/thread + 2MB for the initial thread = 96 * 256k + 2MB = 26MB

This is an upper bound. It is not big compared to the pointer-containing heap...
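
The arithmetic behind the quoted figure, as a small sketch (the function name is hypothetical). It is an upper bound because it counts the full reserved size of every stack, while typically only the part of a stack actually in use needs to be scanned.

```c
#include <assert.h>
#include <stddef.h>

/* Upper bound on the total stack size: n_threads worker stacks of
 * per_thread bytes each, plus the initial thread's stack. */
static size_t total_stack_bytes(size_t n_threads, size_t per_thread,
                                size_t main_stack)
{
    return n_threads * per_thread + main_stack;
}
```

With 96 threads of 256 KB each plus 2 MB for the initial thread, this gives 24 MB + 2 MB = 26 MB.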

> 
> 128K/thread wasn't enough once the collector started running given the stack depths reached in my program during recursive traversal and modification of my data structure.
> 
> > If You think that you could miss an external event (or not respond to it in a reasonable time)
> > while the world is stopped then consider using stop_func (set the default one with GC_set_stop_func()).
> 
> I'll look into that.  Thanks.
> 
> >
> > > On the whole, I'm pretty satisfied with this, and am very happy that changes to my data
> > > structure and program enabled by the collector brought my overall address space and
> > > resident set sizes down by two thirds!
> > > ...
> > > Are there any other suggestions for bringing down the collection times?
> 
> > Try to maximize the use of GC_malloc_atomic[_...]() instead of GC_malloc[_...]().
> 
> I'm doing that at this point.  Do the _ignore_off_page versions also help, where it is possible to use them?

It is better to try turning off the all-interior-pointers mode. See also: Q2 in http://thread.gmane.org/gmane.comp.programming.garbage-collection.boehmgc/2524/focus=2532

> 
> > Call GC_no_dls(1) before GC_INIT() if it is possible for your app.
Correction: the function is GC_set_no_dls(1).

> 
> Do I need to link the collector statically to do that?

No.

> ...
> George T. Talbot
> <gtalbot at locuspharma.com>
> 

Bye.

