Re[2]: [Gc] Race condition in garbage collector

Ivan Maidanski ivmai at mail.ru
Sun Aug 5 01:18:33 PDT 2012


Hi Juan,

It looks hard to avoid locking in GC_dyld_image_add/remove, so let me propose another workaround for your exact case. (As you say, wrapping dlopen/dlclose does not work either, because the locks are non-recursive.)
My idea is to release the lock for a while somewhere in GC_inner_start_routine to prevent the deadlock. But where? You say it happens on thread exit, i.e. in GC_thread_exit_proc (which acquires the lock), called from the pthread cleanup handler.
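
To make the "unlock for a while" idea concrete, here is a minimal sketch in plain pthreads. It is not collector code: gc_lock, gc_lock_is_available and exit_path are hypothetical names, and the probe merely stands in for the other thread (dlclose -> GC_dyld_image_remove -> GC_lock) needing the lock while the exiting thread is inside an external call. The probe runs on the same thread only to keep the example deterministic.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the collector's allocation lock. */
static pthread_mutex_t gc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical probe: could some thread take gc_lock right now?
   pthread_mutex_trylock never blocks, even if the caller holds the mutex. */
static int gc_lock_is_available(void)
{
    int rc = pthread_mutex_trylock(&gc_lock);
    if (rc == 0)
        pthread_mutex_unlock(&gc_lock);
    return rc == 0;
}

/* Simulated thread-exit path. Returns 1 if gc_lock was free while the
   "external call" (here just the probe) was running, 0 otherwise. */
int exit_path(int unlock_around_call)
{
    int rc, available;

    rc = pthread_mutex_lock(&gc_lock);
    assert(rc == 0);
    /* ... bookkeeping that really needs the lock ... */

    if (unlock_around_call)
        pthread_mutex_unlock(&gc_lock);   /* drop the lock for a while */

    available = gc_lock_is_available();   /* call that may end up in dyld */

    if (unlock_around_call) {
        rc = pthread_mutex_lock(&gc_lock); /* take it back afterwards */
        assert(rc == 0);
    }
    /* ... remaining bookkeeping ... */
    pthread_mutex_unlock(&gc_lock);
    return available;
}
```

With unlock_around_call set to 0 the lock is unavailable during the call, which is the situation in which the main thread would wait forever; with 1 it is available. Whether the collector's state is safe to expose at that point is a separate question that this sketch does not address.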

As I understand it, DISABLE/RESTORE_CANCEL and GC_remove_specific are both no-ops on your target, right?
Do you have GC_incremental=0? (In this case GC_wait_for_gc_completion does almost nothing.)
GC_unregister_my_thread_inner seems to be inlined, as it is not shown in the backtrace.
The thread (1) has the DETACHED bit set, right?
If everything is as I assume, then there are only 3 calls of interest:
- pthread_self,
- mach_port_deallocate (and mach_task_self),
- GC_INTERNAL_FREE.

Could you please temporarily comment out the mach_port_deallocate (and mach_task_self) and GC_INTERNAL_FREE calls and retry?

Regards,
Ivan

Sun, 22 Jul 2012 21:20:48 +0200 Juan Jose Garcia-Ripoll <juanjose.garciaripoll at gmail.com>:
On Sun, Jul 22, 2012 at 4:43 PM, Juan Jose Garcia-Ripoll <juanjose.garciaripoll at gmail.com> wrote:
 
1) This thread is a servicing one. It is trying to exit and in the process it acquires the GC lock, but for some reason the thread invokes the dyld library. I still haven't located where in GC this happens but from the symptoms it seems it is close to GC_unregister...[...]
2) This thread is the main one. It is trying to close a bunch of libraries, none of which are related to the thread above. However, when dlclose() is called, some code associated to the garbage collector is run and we enter a race condition.
It is very difficult to prevent 1) from happening, because the call to dyld happens inside the garbage collector exit code, or somewhere in pthread's library, I do not know.

I have tried wrapping dlopen() and dlclose() with GC_call_with_alloc_lock(). The problem here is that the garbage collector uses default mutexes and they are not recursive in OS X. The result is a deadlock.

I would appreciate some solution.

(gdb) thread 2
(gdb) bt
#0  0x00007fff88009bf2 in __psynch_mutexwait ()
#1  0x00007fff897d31a1 in pthread_mutex_lock ()
#2  0x00007fff84eae623 in dyldGlobalLockAcquire ()
#3  0x00007fff6172a745 in __dyld__ZN26ImageLoaderMachOCompressed20doBindFastLazySymbolEjRKN11ImageLoader11LinkContextEPFvvES5_ ()
#4  0x00007fff61717922 in __dyld__ZN4dyld18fastBindLazySymbolEPP11ImageLoaderm ()
#5  0x00007fff84eae716 in dyld_stub_binder_ ()
#6  0x0000000101d01458 in C.88.15036 ()
#7  0x0000000101c73100 in GC_inner_start_routine (sb=0x1041deeb0, arg=0x102117ea0) at pthread_start.c:67
#8  0x0000000101c6eb1c in GC_call_with_stack_base (fn=0x101c73030 <GC_inner_start_routine>, arg=0x102117ea0) at misc.c:1510
#9  0x0000000101c74565 in GC_start_routine (arg=0x102117ea0) at pthread_support.c:1504
#10 0x00007fff897d48bf in _pthread_start ()
#11 0x00007fff897d7b75 in thread_start ()
(gdb) thread 1
[Switching to thread 1 (process 37491), "com.apple.main-thread"]
0x00007fff88009bf2 in __psynch_mutexwait ()
(gdb) bt
#0  0x00007fff88009bf2 in __psynch_mutexwait ()
#1  0x00007fff897d31a1 in pthread_mutex_lock ()
#2  0x0000000101c74833 in GC_lock () at pthread_support.c:1784
#3  0x0000000101c6c53d in GC_remove_roots (b=0x104f03220, e=0x104f03238) at mark_rts.c:311
#4  0x0000000101c61f20 in GC_dyld_image_remove (hdr=0x104eff000, slide=4377800704) at dyn_load.c:1319
#5  0x00007fff61714bdd in __dyld__ZN4dyld11removeImageEP11ImageLoader ()
#6  0x00007fff6171858d in __dyld__ZN4dyld20garbageCollectImagesEv ()
#7  0x00007fff6171c432 in __dyld_dlclose ()
#8  0x00007fff84eaebd5 in dlclose ()
#9  0x0000000101c2ae8c in dlclose_wrapper [inlined] () at /Users/jjgarcia/devel/ecl/src/c/ffi/libraries.d:432
#10 0x0000000101c2ae8c in ecl_library_close (block=0x103be4e00) at libraries.d:432
#11 0x0000000101c2af79 in ecl_library_close_all () at libraries.d:448
#12 0x0000000101b1a84d in cl_shutdown () at main.d:301
#13 0x0000000101b1a964 in si_exit (narg=4377800704) at main.d:839
#14 0x0000000101b13e47 in main ()

Juanjo
