[Gc] Race condition between thread termination and garbage collection under Solaris 10/x86

Burkhard Linke blinke at cebitec.uni-bielefeld.de
Thu Mar 4 08:21:14 PST 2010


Hi,

I'm halfway there: the attached patch fixes the deadlock. It introduces a new
mutex that serializes stopping the world and thread termination. With it, the
test case no longer runs into a deadlock, and neither do any of the other test
programs.
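
To illustrate the idea only (this is not the code from the attached patch, and
every name in it, such as stop_exit_lock, unregister_self and stop_world_once,
is made up for this sketch), here is a minimal pthreads example. It assumes a
single mutex is taken both by a terminating thread and by the stop-the-world
phase, so the set of live threads cannot change while the world is being
stopped:

```c
/* Sketch only: all names are hypothetical, not bdwgc identifiers. */
#include <pthread.h>
#include <sched.h>

#define NWORKERS 4

/* Serializes stop-the-world phases against thread termination. */
static pthread_mutex_t stop_exit_lock = PTHREAD_MUTEX_INITIALIZER;
static int live_threads = 0;    /* protected by stop_exit_lock */

/* Called by a thread that is about to terminate. */
static void unregister_self(void)
{
    pthread_mutex_lock(&stop_exit_lock);
    live_threads--;             /* cannot happen in the middle of a stop */
    pthread_mutex_unlock(&stop_exit_lock);
}

static void *worker(void *arg)
{
    (void)arg;
    unregister_self();
    return NULL;
}

/* Stand-in for the collector: the number of live threads must not change
   while the lock is held. Returns 0 if it stayed stable, -1 otherwise. */
static int stop_world_once(void)
{
    int before, after;

    pthread_mutex_lock(&stop_exit_lock);
    before = live_threads;      /* a real collector would signal threads here */
    sched_yield();              /* give exiting threads a chance to race */
    after = live_threads;
    pthread_mutex_unlock(&stop_exit_lock);
    return before == after ? 0 : -1;
}

/* Starts the workers, "stops the world" repeatedly while they exit, and
   returns 0 if every stop saw a stable thread count and all threads left. */
int run_demo(void)
{
    pthread_t t[NWORKERS];
    int i, created = 0, rc = 0;

    pthread_mutex_lock(&stop_exit_lock);
    live_threads = NWORKERS;
    pthread_mutex_unlock(&stop_exit_lock);

    for (i = 0; i < NWORKERS; i++) {
        if (pthread_create(&t[i], NULL, worker, NULL) != 0)
            break;
        created++;
    }
    for (i = 0; i < 100; i++)
        if (stop_world_once() != 0)
            rc = -1;
    for (i = 0; i < created; i++)
        pthread_join(t[i], NULL);

    /* All workers are joined, so reading live_threads is safe here. */
    return (rc == 0 && created == NWORKERS && live_threads == 0) ? 0 : -1;
}
```

Calling run_demo() from a main() and linking with the platform's threads
library should return 0. The sketch leaves out signal delivery and the GC
allocation lock entirely; it only shows the ordering guarantee the extra mutex
is meant to give.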

Unfortunately, gctest now dies with a SIGSEGV under conditions that are still
unclear. Stack trace:

core 'core' of 9680:    /vol/src/gnu/mono/contrib/bdwgc/.libs/gctest
-----------------  lwp# 1 / thread# 1  --------------------
 fffffd7fff17c9fa __sigsuspend () + a
 fffffd7fff28ad47 GC_suspend_handler_inner () + b3
 fffffd7fff28ac76 GC_suspend_handler () + 26
 fffffd7fff177386 __sighndlr () + 6
 fffffd7fff16bc32 call_user_handler () + 252
 fffffd7fff16be4e sigacthandler (2f, fffffd7fffdfeeb8, fffffd7fffdfeb50) + de
 --- called from signal handler with signal 47 (SIGRTMAX-1) ---
 fffffd7fff1772f7 __lwp_park () + 17
 fffffd7fff16fd08 mutex_lock_impl () + e8
 fffffd7fff16fdfb mutex_lock () + b
 fffffd7fff28a602 GC_generic_lock () + aa
 fffffd7fff28a646 GC_lock () + 2a
 fffffd7fff288fa7 GC_is_thread_tsd_valid () + 23
 fffffd7fff287aca GC_malloc () + fa
 00000000004046cf mktree () + 1b
 00000000004046f9 mktree () + 45
...

-----------------  lwp# 2 / thread# 2  --------------------
 fffffd7fff17c9fa __sigsuspend () + a
 fffffd7fff28ad47 GC_suspend_handler_inner () + b3
 fffffd7fff28ac76 GC_suspend_handler () + 26
 fffffd7fff177386 __sighndlr () + 6
 fffffd7fff16bc32 call_user_handler () + 252
 fffffd7fff16be4e sigacthandler (2f, fffffd7ffeefe848, fffffd7ffeefe4e0) + de
 --- called from signal handler with signal 47 (SIGRTMAX-1) ---
 fffffd7fff1772f7 __lwp_park () + 17
 fffffd7fff16fd08 mutex_lock_impl () + e8
 fffffd7fff16fdfb mutex_lock () + b
 fffffd7fff28a602 GC_generic_lock () + aa
 fffffd7fff28a646 GC_lock () + 2a
 fffffd7fff288fa7 GC_is_thread_tsd_valid () + 23
 fffffd7fff287aca GC_malloc () + fa
 00000000004046cf mktree () + 1b
 00000000004046f9 mktree () + 45
...

-----------------  lwp# 3 / thread# 3  --------------------
 fffffd7fff285966 GC_typed_mark_proc () + de
 fffffd7fff2789a2 GC_mark_from () + 18e
 fffffd7fff2785bc GC_mark_some () + 168
 fffffd7fff26c250 GC_stopped_mark () + a4
 fffffd7fff26bf1c GC_try_to_collect_inner () + 138
 fffffd7fff26cb4f GC_try_to_collect_general () + df
 fffffd7fff26cc5a GC_gcollect () + e
 0000000000404f17 typed_test () + 2d3
 0000000000405889 run_one_test () + 615
 0000000000405ced thr_run_one_test () + 9
 fffffd7fff28a0ba GC_inner_start_routine () + e6
 fffffd7fff27e5c4 GC_call_with_stack_base () + 1c
 fffffd7fff28a0fb GC_start_routine () + 13
 fffffd7fff17704b _thr_setup () + 5b
 fffffd7fff177280 _lwp_start ()

-----------------  lwp# 4 / thread# 4  --------------------
 fffffd7fff17c9fa __sigsuspend () + a
 fffffd7fff28ad47 GC_suspend_handler_inner () + b3
 fffffd7fff28ac76 GC_suspend_handler () + 26
 fffffd7fff177386 __sighndlr () + 6
 fffffd7fff16bc32 call_user_handler () + 252
 fffffd7fff16be4e sigacthandler (2f, fffffd7ffeb00848, fffffd7ffeb004e0) + de
 --- called from signal handler with signal 47 (SIGRTMAX-1) ---
 fffffd7fff1772f7 __lwp_park () + 17
 fffffd7fff16fd08 mutex_lock_impl () + e8
 fffffd7fff16fdfb mutex_lock () + b
 fffffd7fff28a602 GC_generic_lock () + aa
 fffffd7fff28a646 GC_lock () + 2a
 fffffd7fff288fa7 GC_is_thread_tsd_valid () + 23
 fffffd7fff287aca GC_malloc () + fa
 00000000004046cf mktree () + 1b
 0000000000404706 mktree () + 52
...

-----------------  lwp# 5 / thread# 5  --------------------
 fffffd7fff17c9fa __sigsuspend () + a
 fffffd7fff28ad47 GC_suspend_handler_inner () + b3
 fffffd7fff28ac76 GC_suspend_handler () + 26
 fffffd7fff177386 __sighndlr () + 6
 fffffd7fff16bc32 call_user_handler () + 252
 fffffd7fff16be4e sigacthandler (2f, fffffd7ffe901818, fffffd7ffe9014b0) + de
 --- called from signal handler with signal 47 (SIGRTMAX-1) ---
 fffffd7fff1772f7 __lwp_park () + 17
 fffffd7fff16fd08 mutex_lock_impl () + e8
 fffffd7fff16fdfb mutex_lock () + b
 fffffd7fff28a602 GC_generic_lock () + aa
 fffffd7fff28a646 GC_lock () + 2a
 fffffd7fff288fa7 GC_is_thread_tsd_valid () + 23
 fffffd7fff287aca GC_malloc () + fa
 00000000004046cf mktree () + 1b
 00000000004046f9 mktree () + 45
...

-----------------  lwp# 6 / thread# 6  --------------------
 fffffd7fff17c9fa __sigsuspend () + a
 fffffd7fff28ad47 GC_suspend_handler_inner () + b3
 fffffd7fff28ac76 GC_suspend_handler () + 26
 fffffd7fff177386 __sighndlr () + 6
 fffffd7fff16bc32 call_user_handler () + 252
 fffffd7fff16be4e sigacthandler (2f, fffffd7ffe702b28, fffffd7ffe7027c0) + de
 --- called from signal handler with signal 47 (SIGRTMAX-1) ---
 fffffd7fff1772f7 __lwp_park () + 17
 fffffd7fff16fd08 mutex_lock_impl () + e8
 fffffd7fff16fdfb mutex_lock () + b
 fffffd7fff28a602 GC_generic_lock () + aa
 fffffd7fff28a646 GC_lock () + 2a
 fffffd7fff28637b GC_malloc_explicitly_typed () + 87
 0000000000404d58 typed_test () + 114
 0000000000405889 run_one_test () + 615
 0000000000405ced thr_run_one_test () + 9
 fffffd7fff28a0ba GC_inner_start_routine () + e6
 fffffd7fff27e5c4 GC_call_with_stack_base () + 1c
 fffffd7fff28a0fb GC_start_routine () + 13
 fffffd7fff17704b _thr_setup () + 5b
 fffffd7fff177280 _lwp_start ()

Signaling seems to be OK: all threads except LWP 3, which is in the mark phase
(GC_typed_mark_proc), are suspended. I'm afraid I'll have to give up debugging
at this point; the internals of the garbage collector itself are beyond my
scope.

Burkhard
-------------- next part --------------
A non-text attachment was scrubbed...
Name: solaris_thread_handler.diff
Type: text/x-diff
Size: 10873 bytes
Desc: not available
Url : http://napali.hpl.hp.com/pipermail/gc/attachments/20100304/02c15db1/solaris_thread_handler.bin

