To help make this discussion concrete, we consider simple examples of phase transitions in search. Perhaps the simplest consists of the size of the search tree arising during a depth-first backtrack search as a function of how well the local search heuristic is able to prune unproductive search choices . Most of the cost in such searches is often due to attempts to recover from some early incorrect choices that preclude any possibility of a solution. In these situations, the time spent backtracking among unproductive additional choices before returning to the early choices made in the search is determined by how well the search heuristic is able to prune unproductive choices from consideration.
This problem can be formalized by supposing that the problem consists of a series of d decisions to be made and, at each decision point in the search, there are b choices to consider. This gives a search tree of depth d and branching ratio b. For simplicity, we consider the case where prior search choices eliminate all possible solutions to the problem, or the problem itself has no solutions. In addition, we suppose there is some way to identify unproductive sets of choices, perhaps because they violate some constraints or through some domain knowledge incorporated into a search heuristic. A simple model is to assume that this identification eliminates each unproductive choice independently and with probability . At the extremes, corresponds to perfect knowledge and to a completely ineffective heuristic. This gives an effective average branching ratio for the search of . This specification of a heuristic search problem amounts to defining a simple ensemble of random problems, whose properties can be easily evaluated exactly.
Completing the analysis of this problem requires identifying some global property of interest and then relating it to the local problem parameters (d, b and p in this case). An important global property is the search cost given by the number of nodes visited during the search before concluding there is no solution. For a particular search instance, the cost to search at and below a given node n in the search tree is given by
where is the k child of node n and if this child is not pruned and 0 otherwise. A node at depth d has no children and hence a cost of 1. The overall cost is just the value starting from the root of the search tree.
The value of the search cost will vary among the instances in the ensemble. One way to characterize the typical behavior is through the average value of the search cost. In this model, the behavior depends only on the depth of the node in the tree. Combined with the independent pruning choices in this model, we have where is the expected cost of a node at depth j. The expected cost of the entire search, starting from the root, is then
This simple analytic function relates the local heuristic effectiveness (through the value of z) to the global cost, on average. However, in the limit of infinitely large problems (i.e., ) we find very different asymptotic behaviors:
indicating an abrupt change in character at the transition point .
Because the individual search instances have different costs, another important aspect of this analysis is to determine how representative the average value is. This can be done by comparing the variance in costs to the average. Starting with
we again get dependence only on depth in the tree so that
and hence (because )
with . Thus for the variance we have
Now so we obtain
Asymptotically, the relative variance is
The relative deviation is just the square root of this quantity. This exhibits a peak at the transition point, as illustrated in Fig. 1. Large fluctuations at the transition point are a typical characteristic of phase transitions. As a consequence, attempts to evaluate behaviors near transitions with simulation experiments often require many more samples to accurately sample the typical range of behaviors than for parameter values far from transition points. This figure also illustrates how the singular nature of the transition in the limit of infinitely large problems arises out of the relatively smooth behavior for small sizes.
Figure 1: Relative deviation r in the search cost as a function of the effective pruning z for b=3. The curves show the behavior for problems of size 20 (gray), 50 (dashed) and 100 (solid), and are indistinguishable except near the transition. Note the increasingly large fluctuations near the transition point at z=1.
In terms of constraint satisfaction problems, z;SPMlt;1 corresponds to very highly overconstrained problems so that incorrect choices are pruned almost immediately. This simple example of a phase transition in a search problem can be extended to more complex situations where the number of solutions varies or the heuristic pruning effectiveness varies with depth . As long as the various search choices are assumed to occur independently, exact results are relatively straightforward to obtain.