[Gc] Fix for Win32 PARALLEL_MARK
hans.boehm at hp.com
Tue Nov 11 17:53:54 PST 2008
Thanks. This looks very promising, and it will be really nice to have this working reliably. I checked in the preceding PARALLEL_MARK patch, this patch, and diff25 for test.c. I Also changed the declaration of n_tests in test.c to have type AO_t (a necessary change, I think, though it may have been in one of your other patches).
I don't think this breaks anything, but it doesn't quite seem to be working for me either:
1) I think we need a definition of AO_ASSUME_WINDOWS98 if we're on Windows and PARALLEL_MARK is defined. Otherwise we get a correct complaint that CAS isn't available.
2) I'm seeing a failure on win64 with PARALLEL_MARK that I haven't debugged. Is that still expected?
Should configure.ac be updated to handle --enable-parallel-mark for cygwin?
> -----Original Message-----
> From: gc-bounces at napali.hpl.hp.com
> [mailto:gc-bounces at napali.hpl.hp.com] On Behalf Of Ivan Maidanski
> Sent: Sunday, October 26, 2008 8:17 AM
> To: gc at napali.hpl.hp.com
> Subject: [Gc] Fix for Win32 PARALLEL_MARK
> This patch is for my previous patch introducing parallel
> marking on Win32 and Cygwin. My previous implementation of
> Win32 GC_wait_marker() turns (frequently when GC_MARKERS >
> ncpu) into simple yielding (thus uselessly consuming CPU time
> outside garbage collections).
> Now I'm using PulseEvent and atomic NT SignalObjectAndWait()
> (on Windows 98/Me parallel marking is set disabled but it's
> ok since multicore CPUs are not supported by these OS, AFAIK).
> Mark lock spinning is disabled by default (unless
> MARKLOCK_USES_SPIN) but it works ok. More careful
> measurements should be done.
> Tested with MinGW and VC++ 32-bit on dual core CPU.
> Still some questions remain:
> - should we scan markers in GC_get_next_stack()?;
I think so. This is used to skip thread stacks. And we certainly don't want to mark the marker thread stacks.
> - should we use .._full() version (or weaker) of CAS in
I think you want to use _acquire versions whenver this is used to acquire a lock, and _release in GC_release_mark_lock. I did not study the logic there carefully enough to claim that it is correct. But I also didn't immediately see any problems. Mark_mutex_event is set up so that signalling it is remembered, even if there are currently no waiters, right?
One minor issue is that I would like to see corrected is that the statistics variables should also be accessed through AO_ operations. Otherwise it's very hard to tell whether they're trustworthy. You can easily get into situations in which such counters are off by a factor of two or more due to contention. I've been bitten by this. (The timing window is often much bigger than you think it is. You usually care about the counters in the event of contention, in which case counter accesses are likely to almost always miss the cache, and a thread is likely to wait for a while after the counter has been read, trying to convert the cache line to exclusive access. That also suggests that the additional overhead for the AO_ operations is probably not as big a deal as you thought it was. They might even help, since they presumably immediately acquire the cache line in exclusive mode, potentially avoiding a bus transaction. I haven't timed that.)
> PS. This patch also adjusts ..._printf() actual params
> (according to their format specifiers) for "win32_threads.c" file.
More information about the Gc