foundations of computational agents
There are many variants of iterative best improvement with randomness.
If the variables have small finite domains, a local search algorithm can consider all other values of the variable when considering the possible successors. If the domains are large, the cost of considering all other values may be too high. An alternative is to consider only a few other values, often the close values, for one of the variables. Sometimes quite sophisticated methods are used to select an alternative value.
As presented, the local search has no memory: it does not remember anything about the search as it proceeds. A simple way to use memory to improve a local search is tabu search, which prevents recently changed variable assignments from being changed again. The idea is, when selecting a variable to change, not to select a variable that was changed in the last $t$ steps, for some integer $t$ called the tabu tenure. If $t$ is small, tabu search can be implemented by keeping a list of the recently changed variables. If $t$ is larger, it can be implemented by storing, for each variable, the step at which the variable got its current value. Tabu search prevents cycling among a few assignments. The tabu tenure is one of the parameters that can be optimized. A tabu list of size 1 is equivalent to not allowing the same assignment to be immediately revisited.
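A tabu step with the larger-tenure bookkeeping can be sketched as follows. This is a minimal illustration, not the book's pseudocode; the CSP encoding (a `conflicts` evaluation function and a `last_changed` map from variables to steps) is an assumption made for the example.

```python
import random

def tabu_step(assignment, variables, domains, conflicts, last_changed, step, tenure):
    """One local-search step that avoids variables changed in the last `tenure` steps.

    `conflicts(assignment)` returns the number of violated constraints (assumed);
    `last_changed[X]` records the step at which X last got its current value.
    """
    # Variables are tabu if they changed within the last `tenure` steps.
    candidates = [X for X in variables
                  if step - last_changed.get(X, -tenure - 1) > tenure]
    if not candidates:  # everything is tabu; fall back to any variable
        candidates = list(variables)
    X = random.choice(candidates)
    # Assign X the value that minimizes the number of conflicts.
    assignment[X] = min(domains[X], key=lambda v: conflicts({**assignment, X: v}))
    last_changed[X] = step
    return assignment
```

Recording the step at which each variable changed, rather than keeping an explicit list, makes the tabu test a constant-time lookup regardless of the tenure.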
Algorithms differ in how much work they require to guarantee the best improvement step. At one extreme, an algorithm can guarantee to select one of the new assignments with the best improvement out of all of the possible successors. At the other extreme, an algorithm can select a new assignment at random and reject the assignment if it makes the situation worse. It is often better to make a quick choice than to spend a lot of time making the best choice. Which of these methods works best is, typically, an empirical question; it is difficult to determine theoretically whether slow large steps are better than small quick steps for a particular problem domain. There are many possible variants of which successor to select, some of which are explored in the next sections.
The most improving step method always selects a variable–value pair that makes the best improvement. If there are many such pairs, one is chosen at random.
The naive way of implementing the most improving step is, given the current total assignment, to linearly scan the variables and, for each variable $X$ and each value $v$ in the domain of $X$ that differs from the value $X$ has in the current total assignment, compare the current total assignment with the assignment that differs only by having $X=v$. Then select one of the variable–value pairs that results in the best improvement, even if that improvement is negative. Variables that do not appear in any constraints can be ignored. One step requires $O(ndr)$ evaluations of constraints, where $n$ is the number of variables, $d$ is the domain size, and $r$ is the number of constraints for each variable.
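The naive scan can be sketched directly. The `evaluate` function (the quantity to minimize, e.g., the number of conflicts) and the dictionary encoding of assignments are assumptions made for this example.

```python
import random

def most_improving_step(assignment, domains, evaluate):
    """Scan every variable-value pair and move to the assignment with the
    best (lowest) evaluation, breaking ties at random.

    `evaluate(assignment)` returns the value to minimize (assumed),
    e.g. the number of violated constraints.
    """
    best_pairs, best_score = [], float('inf')
    for X, dom in domains.items():
        for v in dom:
            if v == assignment[X]:
                continue  # only consider values different from the current one
            score = evaluate({**assignment, X: v})
            if score < best_score:
                best_pairs, best_score = [(X, v)], score
            elif score == best_score:
                best_pairs.append((X, v))
    X, v = random.choice(best_pairs)
    assignment[X] = v  # taken even if best_score is worse than the current evaluation
    return assignment
```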
A more sophisticated alternative is to have a priority queue of variable–value pairs with associated weights. For each variable $X$, and each value $v$ in the domain of $X$ such that $X$ is not assigned to $v$ in the current assignment, the pair $\langle X,v\rangle$ would be in the priority queue. The weight of the pair $\langle X,v\rangle$ is the evaluation of the total assignment that is the same as the current total assignment but with $X=v$, minus the evaluation of the current total assignment. This weight depends on the values assigned to $X$ and the neighbors of $X$ in the constraint graph, but does not depend on the values assigned to other variables. At each stage, the algorithm selects a variable–value pair with minimum weight, which gives a successor with the biggest improvement.
Once a variable $X$ has been given a new value, all variable–value pairs that participate in a constraint made satisfied or unsatisfied by the new assignment to $X$ must have their weights reassessed and, if changed, be reinserted into the priority queue.
The size of the priority queue is $n(d-1)$, where $n$ is the number of variables and $d$ is the average domain size. To insert or remove an element takes time $O(\log nd)$. Each step, the algorithm removes one element from the priority queue, adds another, and updates the weights of at most $rk$ variables, where $r$ is the number of constraints per variable and $k$ is the number of variables per constraint. The complexity of one step of this algorithm is thus $O(rkd\log nd)$.
This algorithm spends much time maintaining the data structures to ensure that the most improving step is taken at each time.
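One way to approximate the priority-queue approach is a binary heap with lazy revalidation: instead of eagerly reinserting every affected pair, stale entries are re-weighted when they reach the top of the heap. This is a simplification of the scheme described above, and the `evaluate` function and CSP encoding are assumptions made for the sketch.

```python
import heapq

def build_queue(assignment, domains, evaluate):
    """Heap of (weight, X, v): weight = evaluation with X=v minus current evaluation."""
    base = evaluate(assignment)
    heap = []
    for X, dom in domains.items():
        for v in dom:
            if v != assignment[X]:
                heapq.heappush(heap, (evaluate({**assignment, X: v}) - base, X, v))
    return heap

def pq_step(assignment, domains, evaluate, heap):
    """Pop pairs until one whose stored weight is still current, then take that step.

    Stale entries (whose weight changed since insertion) are re-pushed with
    their corrected weight -- lazy revalidation instead of eager reinsertion.
    """
    base = evaluate(assignment)
    while heap:
        w, X, v = heapq.heappop(heap)
        if v == assignment[X]:
            continue  # no longer a candidate change
        true_w = evaluate({**assignment, X: v}) - base
        if true_w != w:
            heapq.heappush(heap, (true_w, X, v))  # stale: correct and retry
            continue
        old = assignment[X]
        assignment[X] = v
        # The pair for the value just left becomes a candidate again.
        heapq.heappush(heap, (evaluate({**assignment, X: old}) - evaluate(assignment),
                              X, old))
        return assignment
    return assignment
```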
An alternative is to first select a variable and then select a value for that variable. The two-stage choice algorithm maintains a priority queue of variables, where the weight of a variable is the number of conflicts in which it participates. At each time, the algorithm selects a variable that participates in the maximum number of conflicts. Once a variable has been chosen, it can be assigned either a value that minimizes the number of conflicts or a value at random. For each constraint that becomes true or false as a result of this new assignment, the other variables participating in the constraint must have their weight reevaluated.
The complexity of one step of this algorithm is $O(rk\log n)$, where $n$ is the number of variables, $r$ is the number of constraints per variable, and $k$ is the number of variables per constraint. Compared to selecting the best variable–value pair, this does less work per step, so more steps are achievable in any given time period. However, the steps tend to give less improvement, giving a trade-off between the number of steps and the complexity per step.
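The two-stage choice can be sketched as follows. For simplicity this version recomputes conflict counts by a full scan rather than maintaining the priority queue of variables; the encoding of constraints as `(scope, predicate)` pairs is an assumption made for the example.

```python
import random

def two_stage_step(assignment, domains, constraints):
    """Pick a variable in the most conflicts, then a value minimizing conflicts.

    `constraints` is a list of (scope, predicate) pairs (assumed encoding);
    the predicate takes the values of the scope variables in order and
    returns True if the constraint is satisfied.
    """
    def violated(a):
        return [(scope, p) for scope, p in constraints
                if not p(*(a[X] for X in scope))]

    # Stage 1: count, for each variable, the conflicts it participates in.
    counts = {}
    for scope, _ in violated(assignment):
        for X in scope:
            counts[X] = counts.get(X, 0) + 1
    if not counts:
        return assignment  # no conflicts: a solution
    worst = max(counts.values())
    X = random.choice([Y for Y, c in counts.items() if c == worst])
    # Stage 2: give X the value that minimizes the number of conflicts.
    assignment[X] = min(domains[X],
                        key=lambda v: len(violated({**assignment, X: v})))
    return assignment
```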
Instead of choosing the best variable, an even simpler alternative is to select any variable participating in a conflict. A variable that is involved in a conflict is a conflicting variable. In the any-conflict algorithm, at each step, one of the conflicting variables is selected at random. The algorithm assigns to the chosen variable one of the values that minimizes the number of violated constraints or a value at random.
There are two variants of this algorithm, which differ in how the variable to be modified is selected:
In the first variant, a conflict is chosen at random, and then a variable that is involved in the conflict is chosen at random.
In the second variant, a variable that is involved in a conflict is chosen at random.
These differ in the probability that a variable in a conflict is chosen. In the first variant, the probability that a variable is chosen depends on the number of conflicts in which it is involved. In the second variant, all of the variables that are in conflicts are equally likely to be chosen.
Each of these algorithms requires maintaining data structures that enable a variable to be quickly selected at random. These data structures need to be maintained as variables change their values. The first variant requires a set of conflicts from which a random element can be selected, such as a binary search tree. The complexity of one step of this algorithm is thus $O(r\log c)$, where $r$ is the number of constraints per variable and $c$ is the number of constraints, because in the worst case $r$ constraints may need to be added to or removed from the set of conflicts.
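Both variants can be sketched in one function. For clarity this version recomputes the violated constraints by a full scan rather than maintaining the incremental data structures described above; the `(scope, predicate)` encoding of constraints is an assumption made for the example.

```python
import random

def any_conflict_step(assignment, domains, constraints, variant="variable"):
    """One step of the any-conflict algorithm.

    variant="constraint": pick a violated constraint at random, then a variable
    in it at random (variables in many conflicts are chosen more often).
    variant="variable": pick uniformly among all variables in some conflict.
    """
    def violated(a):
        return [(scope, p) for scope, p in constraints
                if not p(*(a[X] for X in scope))]

    conflicts = violated(assignment)
    if not conflicts:
        return assignment  # a solution
    if variant == "constraint":
        scope, _ = random.choice(conflicts)  # first variant: conflict first
        X = random.choice(list(scope))
    else:
        in_conflict = {X for scope, _ in conflicts for X in scope}
        X = random.choice(sorted(in_conflict))  # second variant: uniform
    # Assign the chosen variable a value minimizing the violated constraints.
    assignment[X] = min(domains[X],
                        key=lambda v: len(violated({**assignment, X: v})))
    return assignment
```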
The last method maintains no data structure of conflicts; instead it picks a variable and a new value for that variable at random and either rejects or accepts the new assignment.
Annealing is a metallurgical process where molten metals are slowly cooled to allow them to reach a low energy state, making them stronger. Simulated annealing is an analogous method for optimization. It is typically described in terms of thermodynamics. The random movement corresponds to high temperature; at low temperature, there is little randomness. Simulated annealing is a stochastic local search algorithm where the temperature is reduced slowly, starting as a random walk at high temperature and eventually becoming pure greedy descent as the temperature approaches zero. The randomness should allow the search to jump out of local minima and find regions that have a low heuristic value, whereas the greedy descent will lead to local minima. At high temperatures, worsening steps are more likely than at lower temperatures.
Like the other methods, simulated annealing maintains a current total assignment. At each step, it picks a variable at random, then picks a value for that variable at random. If assigning that value to the variable does not increase the number of conflicts, the algorithm accepts the assignment of that value to the variable, resulting in a new current assignment. Otherwise, it accepts the assignment with some probability, depending on the temperature and how much worse the new assignment is than the current assignment. If the change is not accepted, the current assignment is unchanged.
To control whether worsening steps are accepted, there is a positive real-valued temperature $T$. Suppose $A$ is the current total assignment. Suppose that $h(A)$ is the evaluation of assignment $A$ to be minimized. For solving constraints, $h$ is typically the number of conflicts. Simulated annealing selects a possible successor at random, which gives a new assignment ${A}^{\prime}$. If $h({A}^{\prime})\le h(A)$, it accepts the assignment and ${A}^{\prime}$ becomes the new assignment. Otherwise, the new assignment is accepted randomly, using a Gibbs distribution or Boltzmann distribution, with probability
$${e}^{-(h({A}^{\prime})-h(A))/T}.$$
This is only used when $h({A}^{\prime})-h(A)>0$, and so the exponent is always negative. If $h({A}^{\prime})$ is close to $h(A)$, the assignment is more likely to be accepted. If the temperature is high, the exponent will be close to zero, and so the probability will be close to 1. As the temperature approaches zero, the exponent approaches $-\mathrm{\infty}$, and the probability approaches zero.
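A single simulated-annealing step follows directly from this rule. The heuristic `h` (e.g., the number of conflicts) and the dictionary encoding of assignments are assumptions made for the sketch.

```python
import math
import random

def anneal_step(assignment, domains, h, T):
    """One simulated-annealing step at temperature T.

    Pick a variable and a value at random; accept if not worse, otherwise
    accept with probability exp(-(h(A') - h(A)) / T).
    """
    X = random.choice(sorted(domains))
    v = random.choice(domains[X])
    delta = h({**assignment, X: v}) - h(assignment)
    if delta <= 0 or random.random() < math.exp(-delta / T):
        assignment[X] = v  # accepted; otherwise the assignment is unchanged
    return assignment
```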
Temperature | 1-worse | 2-worse | 3-worse
---|---|---|---
10 | 0.9 | 0.82 | 0.74
1 | 0.37 | 0.14 | 0.05
0.25 | 0.018 | 0.0003 | 0.000006
0.1 | 0.00005 | $2\times {10}^{-9}$ | $9\times {10}^{-14}$

Figure 4.9: Probability of acceptance of worsening steps at different temperatures
Figure 4.9 shows the probability of accepting worsening steps at different temperatures. In this figure, $k$-worse means that $h({A}^{\prime})-h(A)=k$. For example, if the temperature is 10 (i.e., $T=10$), a change that is one worse (i.e., if $h({A}^{\prime})-h(A)=1$) will be accepted with probability ${e}^{-0.1}\approx 0.9$; a change that is two worse will be accepted with probability ${e}^{-0.2}\approx 0.82$. If the temperature $T$ is 1, accepting a change that is one worse will happen with probability ${e}^{-1}\approx 0.37$. If the temperature is 0.1, a change that is one worse will be accepted with probability ${e}^{-10}\approx 0.00005$. At this temperature, it is essentially only performing steps that improve the value or leave it unchanged.
If the temperature is high, as in the $T=10$ case, the algorithm tends to accept steps that only worsen a small amount; it does not tend to accept very large worsening steps. There is a slight preference for improving steps. As the temperature is reduced (e.g., when $T=1$), worsening steps, although still possible, become much less likely. When the temperature is low (e.g., $T=0.1$), it is very rare that it selects a worsening step.
Simulated annealing requires an annealing schedule, which specifies how the temperature is reduced as the search progresses. Geometric cooling is one of the most widely used schedules. An example of a geometric cooling schedule is to start with a temperature of 10 and multiply by 0.99 after each step; this will have a temperature of 0.07 after 500 steps. Finding a good annealing schedule is an art, and depends on the problem.
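The geometric cooling schedule described above can be written as a generator. The starting temperature and multiplier are the example values from the text.

```python
def geometric_schedule(T0=10.0, alpha=0.99):
    """Yield the geometric cooling schedule T0, T0*alpha, T0*alpha**2, ..."""
    T = T0
    while True:
        yield T
        T *= alpha
```

With $T_0 = 10$ and multiplier $0.99$, the temperature after 500 steps is $10 \times 0.99^{500} \approx 0.066$, matching the 0.07 figure quoted above.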