A Memory Model for C++: Sequential Consistency for Race-Free Programs

Here we argue that programs which are inter-thread-data-race-free by the definitions in N2171, and use only locks for synchronization, exhibit only sequentially consistent executions.

This is analogous to the corresponding theorem for Java, though some of the proof details differ.

Main Claim: If a program allows no data races on a given input, and the only synchronization operations are

  1. Sequentially consistent atomics, and
  2. Simple lock and unlock operations with acquire and release semantics respectively,
then the program obeys sequential consistency rules, i.e. it behaves as though it had been executed by sequentially interleaving its actions, and letting each load instruction see the last preceding value stored to the same location in this interleaving.

Assumptions about Synchronization Operations:

We assume that there exists a single strict (irreflexive) total "synchronization order" S on operations on both locks and atomic objects, such that:

  1. S is consistent with the is-sequenced-before orders of evaluations performed by each thread, i.e. the transitive closure of the union of is-sequenced-before relations and the synchronization order remains irreflexive.
  2. S is consistent with the modification order of each atomic variable, i.e. the modification orders are just S restricted to operations on that variable.
  3. Lock acquisitions and releases for a given lock alternate in S.
  4. For every lock acquisition in S the immediately following release of that lock is performed by the same thread.
  5. Each load operation performed on an atomic object yields the value of the immediately preceding (in S) store to that atomic object.

Note that if we had just locks, it would suffice to insist on a total order of the operations on each lock. If these orders were inconsistent, i.e. their union together with the sequenced-before relation resulted in a cycle, we would get a cycle in the happens-before relation, disallowing that particular execution.

However this is not true for the combination of atomics and locks. Consider the following example (a variation on the standard Dekker's algorithm example), where l is a lock, x is a normal variable and initially zero, y is atomic and initially zero:
Thread1 Thread2
a: lock l;
   x = 1;
b: unlock l;

c: r1 = y;
d: y = 1;

e: lock l;
   r2 = x;
f: unlock l;
It is possible to arrange the lock operations in a total order e < f < a < b and the atomic operations in a total order c < d that together allow an execution in which r1 = r2 = 0 without introducing a happens-before cycle. (Note that we do not have c hb d, as we would if y were protected by a lock like x.) However such an execution is clearly not sequentially consistent.

Perhaps the right way to reflect this in the specification is to state that operations on each lock must occur in some total order (with the right alternation constraint), and all operations on sequentially consistent atomics must occur in some total order that is also consistent with the happens-before ordering imposed by the lock operations.

Note that the synchronization order is a superset of the "synchronizes-with" relation.

Preliminaries:

Let the hb+ relation be the transitive closure of the union of the is-sequenced-before relations and the synchronization order. Note that since happens-before is the transitive closure of the union of inter-thread-ordered-before and synchronizes-with, which are respectively subsets of the two relations used in the definition of hb+, hb+ is a superset of happens-before.

By our constraints on the synchronization order, hb+ is irreflexive.

Proof Of Main Claim:

Again consider a particular execution on the given input, which is "consistent" according to the rules of N2171.

It follows that the corresponding hb+ is an irreflexive partial order, which we can extend to a strict total order T.

Clearly the actions of each thread appear in T in thread order, i.e. in is-sequenced-before order.

It remains to be shown that each load sees the last preceding store in T that stores to the same location. (For present purposes we view bit-field stores as loading and then storing into the entire "location", i.e. contiguous sequence of non-zero-length bit-fields.)

Consider a store operation Svisible seen by a load L.

If L and Svisible are performed by the same thread, then Svisible must be sequenced before L to be consistent with the intra-thread sequencing rules. Hence Svisible hb+ L.

Next consider the case in which they are performed by different threads.

If L happens before Svisible, we have both L precedes Svisible, and Svisible precedes L, and therefore L precedes L, which is impossible.

If neither of Svisible and L happen before each other, there is a data race, again violating our assumption.

Thus Svisible must happen before L, and thus in all cases Svisible hb+ L, and they are correspondingly ordered in T.

Now assume that another store Smiddle appears between Svisible and L in T.

We know from the fact that T is an extension of hb+, that we cannot have either of

L hb+ Smiddle

Smiddle hb+ Svisible

since that would be inconsistent with the reverse ordering in T.

However all three operations conflict and we have no data races. Hence they must all be ordered by happens-before and thus hb+. But this violates the second clause of 1.10p10, the definition of a consistent execution, concluding the proof.

I believe this also holds if a data race is defined as a store occurring simultaneously with (or adjacent to in a sequential interleaving) a load or a store, so long as we restrict synchronization operations to simple locks without a trylock primitive. See Boehm, "Reordering Constraints for Pthread-Style Locks" for details.

None of the above applies if we allow load_acquire and store_release operations, since the synchronization operations themselves may not behave in a sequentially consistent manner. In particular, consider the following example from the strawman proposal:

Thread1 Thread2
store_release(&y, 1);
r1 = load_acquire(&x);
store_release(&x, 1);
r2 = load_acquire(&y);

This allows r1 = r2 = 0, where sequential consistency (and Java) do not.