This is analogous to the corresponding theorem for Java, though some of the proof details differ.
Main Claim: If a program allows no data races on a given input, and the only synchronization operations are
Assumptions about Synchronization Operations:
We assume that there exists a single strict (irreflexive) total "synchronization order" S on operations on both locks and atomic objects, such that:
Note that if we had just locks, it would suffice to insist on a total order of the operations on each lock. If these orders were inconsistent, i.e. their union together with the sequenced-before relation resulted in a cycle, we would get a cycle in the happens-before relation, disallowing that particular execution.
However this is not true for the combination of atomics and locks. Consider the following example (a variation on the standard Dekker's algorithm example), where l is a lock, x is a normal variable and initially zero, y is atomic and initially zero:
| Thread1 | Thread2 |
|
a: lock l; x = 1; b: unlock l; c: r1 = y; |
d: y = 1; e: lock l; r2 = x; f: unlock l; |
Perhaps the right way to reflect this in the specification is to state that operations on each lock must occur in some total order (with the right alternation constraint), and all operations on sequentially consistent atomics must occur in some total order that is also consistent with the happens-before ordering imposed by the lock operations.
Note that the synchronization order is a superset of the "synchronizes-with" relation.
Preliminaries:
Let the hb+ relation be the transitive closure of the union of the is-sequenced-before relations and the synchronization order. Note that since happens-before is the transitive closure of the union of inter-thread-ordered-before and synchronizes-with, which are respectively subsets of the two relations used in the definition of hb+, hb+ is a superset of happens-before.
By our constraints on the synchronization order, hb+ is irreflexive.
Proof Of Main Claim:
Again consider a particular execution on the given input, which is "consistent" according to the rules of N2171.
It follows that the corresponding hb+ is an irreflexive partial order, which we can extend to a strict total order T.
Clearly the actions of each thread appear in T in thread order, i.e. in is-sequenced-before order.
It remains to be shown that each load sees the last preceding store in T that stores to the same location. (For present purposes we view bit-field stores as loading and then storing into the entire "location", i.e. contiguous sequence of non-zero-length bit-fields.)
Consider a store operation Svisible seen by a load L.
If L and Svisible are performed by the same thread, then Svisible must be sequenced before L to be consistent with the intra-thread sequencing rules. Hence Svisible hb+ L.
Next consider the case in which they are performed by different threads.
If L happens before Svisible, we have both L precedes Svisible, and Svisible precedes L, and therefore L precedes L, which is impossible.
If neither of Svisible and L happen before each other, there is a data race, again violating our assumption.
Thus Svisible must happen before L, and thus in all cases Svisible hb+ L, and they are correspondingly ordered in T.
Now assume that another store Smiddle appears between Svisible and L in T.
We know from the fact that T is an extension of hb+, that we cannot have either of
L hb+ Smiddle
Smiddle hb+ Svisible
since that would be inconsistent with the reverse ordering in T.
However all three operations conflict and we have no data races. Hence they must all be ordered by happens-before and thus hb+. But this violates the second clause of 1.10p10, the definition of a consistent execution, concluding the proof.
I believe this also holds if a data race is defined as a store occurring simultaneously with (or adjacent to in a sequential interleaving) a load or a store, so long as we restrict synchronization operations to simple locks without a trylock primitive. See Boehm, "Reordering Constraints for Pthread-Style Locks" for details.
None of the above applies if we allow load_acquire and store_release operations, since the synchronization operations themselves may not behave in a sequentially consistent manner. In particular, consider the following example from the strawman proposal:
| Thread1 | Thread2 |
| store_release(&y, 1); r1 = load_acquire(&x); |
store_release(&x, 1); r2 = load_acquire(&y); |
This allows r1 = r2 = 0, where sequential consistency (and Java) do not.