Artificial Intelligence - foundations of computational agents -- 9.1 Preferences and Utility

Third edition of Artificial Intelligence: foundations of computational agents, Cambridge University Press, 2023 is now available (including the full text).

9.1 Preferences and Utility

What an agent decides to do should depend on its preferences. In this section, we specify some intuitive properties of preferences that we want and give a consequence of these properties. The properties that we give are axioms of rationality from which we prove a theorem about how to measure these preferences. You should consider whether each axiom is reasonable for a rational agent to follow; if you accept them all as reasonable, you should accept their consequence. If you do not accept the consequence, you should question which of the axioms you are willing to give up.

An agent chooses actions based on their outcomes. Outcomes are whatever the agent has preferences over. If the agent does not have preferences over anything, it does not matter what the agent does. Initially, we consider outcomes without considering the associated actions. Assume there are only a finite number of outcomes.

We define a preference relation over outcomes. Suppose o₁ and o₂ are outcomes. We say that o₁ is weakly preferred to outcome o₂, written o₁ ≥o₂, if outcome o₁ is at least as desirable as outcome o₂. The axioms that follow are arguably reasonable properties of such a preference relation.

Define o₁ ∼o₂ to mean o₁ ≥o₂ and o₂ ≥o₁. That is, o₁ ∼o₂ means outcomes o₁ and o₂ are equally preferred. In this case, we say that the agent is indifferent between o₁ and o₂.

Define o₁ >o₂ to mean o₁ ≥o₂ and ¬(o₂ ≥o₁). That is, the agent prefers outcome o₁ to outcome o₂ and is not indifferent between them. In this case, we say that o₁ is strictly preferred to outcome o₂.

Typically, an agent does not know the outcome of its actions. A lottery is defined to be a finite distribution over outcomes, written as

[p₁:o₁, p₂:o₂, ..., p_k:o_k],

where o_i are outcomes and p_i are non-negative real numbers such that

∑_i p_i = 1.

The lottery specifies that outcome o_i occurs with probability p_i. In all that follows, assume that outcomes include lotteries. This includes the case of having lotteries over lotteries.

Axiom. [Completeness] An agent has preferences between all pairs of outcomes:
∀o₁ ∀o₂ o₁ ≥o₂ or o₂ ≥o₁.

The rationale for this axiom is that an agent must act; if the actions available to it have outcomes o₁ and o₂ then, by acting, it is explicitly or implicitly preferring one outcome over the other.

Axiom. [Transitivity] Preferences must be transitive:
if o₁ ≥o₂ and o₂ ≥o₃ then o₁ ≥o₃.

To see why this is reasonable, suppose it is false, in which case o₁ ≥o₂ and o₂ ≥o₃ and o₃ >o₁. Because o₃ is strictly preferred to o₁, the agent should be prepared to pay some amount to get from o₁ to o₃. Suppose the agent has outcome o₃; then o₂ is at least as good so the agent would just as soon have o₂. o₁ is at least as good as o₂ so the agent would just as soon have o₁ as o₂. Once the agent has o₁ it is again prepared to pay to get to o₃. It has gone through a cycle of preferences and paid money to end up where it is. This cycle that involves paying money to go through it is known as a money pump because, by going through the loop enough times, the amount of money that agent must pay can exceed any finite amount. It seems reasonable to claim that being prepared to pay money to cycle through a set of outcomes is irrational; hence, a rational agent should have transitive preferences.

Also assume that monotonicity holds for mixes of > and ≥, so that if one or both of the preferences in the premise of the transitivity axiom is strict, then the conclusion is strict. That is, if o₁ >o₂ and o₂ ≥o₃ then o₁ >o₃. Also, if o₁ ≥o₂ and o₂ >o₃ then o₁ >o₃.

Axiom. [Monotonicity] An agent prefers a larger chance of getting a better outcome than a smaller chance of getting the better outcome. That is, if o₁ >o₂ and p > q then
[p:o₁, (1-p):o₂] >[q:o₁, (1-q):o₂].

Note that, in this axiom, > between outcomes represents the agent's preference, whereas > between p and q represents the familiar comparison between numbers.

Axiom. [Decomposability] ("no fun in gambling"). An agent is indifferent between lotteries that have the same probabilities over the same outcomes, even if one or both is a lottery over lotteries. For example:
[p:o₁, (1-p):[q:o₂, (1-q):o₃]]

∼[p:o₁, (1-p)q: o₂, (1-p)(1-q):o₃].

Also o₁ ∼[1:o₁,0:o₂] for any outcomes o₁ and o₂.

This axiom specifies that it is only the outcomes and their probabilities that define a lottery. If an agent had a preference for gambling, that would be part of the outcome space.

These axioms can be used to characterize much of an agent's preferences between outcomes and lotteries. Suppose that o₁ >o₂ and o₂ >o₃. Consider whether the agent would prefer

o₂ or
the lottery [p:o₁,(1-p):o₃]

for different values of p ∈[0,1]. When p=1, the agent prefers the lottery (because the lottery is equivalent to o₁ and o₁ >o₂). When p=0, the agent prefers o₂ (because the lottery is equivalent to o₃ and o₂ >o₃). At some stage, as p is varied, the agent's preferences flip between preferring o₂ and preferring the lottery.

Figure 9.1: The preference between o₂ and the lottery, as a function of p.

Figure 9.1 shows how the preferences must flip as p is varied. On the X-axis is p and the Y-axis shows which of o₂ or the lottery is preferred.

Proposition. If an agent's preferences are complete, transitive, and follow the monotonicity and decomposability axioms, and if o₁ >o₂ and o₂ >o₃, there exists a number p₂ such that 0 ≤ p₂ ≤ 1 and
for all p<p₂, the agent prefers o₂ to the lottery (i.e., o₂ >[p:o₁,(1-p):o₃]) and
for all p>p₂, the agent prefers the lottery (i.e., [p:o₁,(1-p):o₃] >o₂).

Proof. By monotonicity and transitivity, if o₂ ≥[p:o₁,(1-p):o₃] for any p then, for all p'<p, o₂ >[p':o₁,(1-p'):o₃]. Similarly, if [p:o₁,(1-p):o₃]≥o₂ for any p then, for all p'>p, [p':o₁,(1-p'):o₃] >o₂. By completeness, for each value of p, either o₂ >[p:o₁,(1-p):o₃], o₂ ∼[p:o₁,(1-p):o₃] or [p:o₁,(1-p):o₃] >o₂. If there is some p such that o₂ ∼[p:o₁,(1-p):o₃], then the theorem holds. Otherwise a preference for either o₂ or the lottery with parameter p implies preferences for either all values greater than p or for all values less than p. By repeatedly subdividing the region that we do not know the preferences for, we will approach, in the limit, a value that fills the criteria for p₂.

The preceding proposition does not specify what the preference of the agent is at the point p₂. The following axiom specifies that the agent is indifferent at this point.

Axiom. [Continuity] Suppose o₁ >o₂ and o₂ >o₃, then there exists a p₂∈[0,1] such that
o₂ ∼[p₂:o₁,(1-p₂):o₃].

The next axiom specifies that, if you replace an outcome in a lottery with another outcome that is not worse, the lottery does not become worse.

Axiom. [Substitutability] If o₁ ≥o₂ then the agent weakly prefers lotteries that contain o₁ instead of o₂, everything else being equal. That is, for any number p and outcome o₃:
[p:o₁, (1-p):o₃] ≥[p:o₂, (1-p):o₃].

A direct corollary of this is that you can substitutes outcomes for which the agent is indifferent and not change preferences:

Proposition. If an agent obeys the substitutability axiom and o₁ ∼o₂ then the agent is indifferent between lotteries that only differ by o₁ and o₂. That is, for any number p and outcome o₃ the following indifference relation holds:
[p:o₁, (1-p):o₃] ∼[p:o₂, (1-p):o₃].

This follows because o₁ ∼o₂ is equivalent to o₁ ≥o₂ and o₂ ≥o₁.

An agent is defined to be rational if it obeys the completeness, transitivity, monotonicity, decomposability, continuity, and substitutability axioms.

It is up to you to determine if this technical definition of rational matches your intuitive notion of rational. In the rest of this section, we show consequences of this definition.

Although preferences may seem to be very complicated, the following theorem shows that a rational agent's value for an outcome can be measured by a real number and that these numbers can be combined with probabilities so that preferences under uncertainty can be compared using expectation. This is surprising because

it may seem that preferences are too multifaceted to be modeled by a single number. For example, although one may try to measure preferences in terms of dollars, not everything is for sale or easily converted into dollars and cents.
you would not expect that values could be combined with probabilities. An agent that is indifferent between $(px+(1-p)y) and the lottery [p:$x, (1-p)$y] for all monetary values x and y and for all p∈[0,1] is known as an expected monetary value (EMV) agent. Most people are not EMV agents, because they have, for example, a strict preference between $1,000,000 and the lottery [0.5:$0,0.5:$2,000,000]. (Think about whether you would prefer a million dollars or a coin toss where you would get nothing if the coin lands heads or two million if the coin lands tails.) Money cannot be simply combined with probabilities, so it may be surprising that there is a value that can be.

Proposition. If an agent is rational, then for every outcome o_i there is a real number u(o_i), called the utility of o_i, such that
o_i>o_j if and only if u(o_i) > u(o_j) and
utilities are linear with probabilities:
u([p₁:o₁, p₂:o₂, ..., p_k:o_k])=p₁ u(o₁)+ p₂ u(o₂)+ ...+ p_k u(o_k).

Proof. If the agent has no strict preferences (i.e., the agent is indifferent between all outcomes) then define u(o)=0 for all outcomes o.
Otherwise, choose the best outcome, o_best, and the worst outcome, o_worst, and define, for any outcome o, the utility of o to be the value p such that

o ∼[p:o_best, (1-p):o_worst].

The first part of the proposition follows from substitutability and monotonicity.

The second part can be proved by replacing each o_i by its equivalent lottery between o_best and o_worst. This composite lottery can be reduced to a single lottery between o_best and o_worst, with the utility given in the theorem. The details are left as an exercise.

In this proof the utilities are all in the range [0,1], but any linear scaling gives the same result. Sometimes [0,100] is a good scale to distinguish it from probabilities, and sometimes negative numbers are useful to use when the outcomes have costs. In general, a program should accept any scale that is intuitive to the user.

A linear relationship does not usually exist between money and utility, even when the outcomes have a monetary value. People often are risk averse when it comes to money. They would rather have $n in their hand than some randomized setup where they expect to receive $n but could possibly receive more or less.

Figure 9.2: Possible money-utility trade-off for a risk-averse agent

Example 9.1: Figure 9.2 shows a possible money-utility trade-off for a risk-averse agent. Risk aversion corresponds to a concave utility function.

This agent would rather have $300,000 than a 50% chance of getting either nothing or $1,000,000, but would prefer the gamble on the million dollars to $275,000. They would also require more than a 73% chance of winning a million dollars to prefer this gamble to half a million dollars.

Note that, for this utility function, u($999000) approx 0.9997. Thus, given this utility function, the person would be willing to pay $1,000 to eliminate a 0.03% chance of losing all of their money. This is why insurance companies exist. By paying the insurance company, say $600, the agent can change the lottery that is worth $999,000 to them into one worth $1,000,000 and the insurance companies expect to pay out, on average, about $300, and so expect to make $300. The insurance company can get its expected value by insuring enough houses. It is good for both parties.

As presented here, rationality does not impose any conditions on what the utility function looks like.

Figure 9.3: Possible money-utility trade-off from Example 9.2

Example 9.2: Figure 9.3 shows a possible money-utility trade-off for someone who really wants a toy worth $30, but who would also like one worth $20. Apart from these, money does not matter much to this agent. This agent is prepared to take risks to get what it wants. For example, if it had $29, it would be very happy to bet $19 of its own against a single dollar of another agent on a fair bet, such as a coin toss. It does not want more than $60, because this will leave it open to extortion.

Challenges to Expected Utility

There have been a number of challenges to the theory of expected utility. The Allais Paradox, presented in 1953 [Allais and Hagen (1979)], is as follows. Which would you prefer out of the following two alternatives?

A:: $1m - one million dollars
B:: lottery [0.10:$2.5m, 0.89:$1m, 0.01:$0]

Similarly, what would you choose between the following two alternatives?

C:: lottery [0.11:$1m,0.89:$0]
D:: lottery [0.10:$2.5m,0.9:$0]

It turns out that many people prefer A to B, and prefer D to C. This choice is inconsistent with the axioms of rationality. To see why, both choices can be put in the same form:

A,C:: lottery [0.11:$1m,0.89:X]
B,D:: lottery [0.10:$2.5m,0.01:$0,0.89:X]

In A and B, X is a million dollars. In C and D, X is zero dollars. Concentrating just on the parts of the alternatives that are different seems like an appropriate strategy, but people seem to have a preference for certainty.

Tversky and Kahneman (1974), in a series of human experiments, showed how people systematically deviate from utility theory. One such deviation is the framing effect of a problem's presentation. Consider the following:

A disease is expected to kill 600 people. Two alternative programs have been proposed:
Program A:
200 people will be saved
Program B:
with probability 1/3, 600 people will be saved, and with probability 2/3, no one will be saved

Which program would you favor?
A disease is expected to kill 600 people. Two alternative programs have been proposed:
Program C:
400 people will die
Program D:
with probability 1/3 no one will die, and with probability 2/3 600 will die

Which program would you favor?

Tversky and Kahneman showed that 72% of people in their experiments chose A over B, and 22% chose C over D. However, these are exactly the same choice, just described in a different way.

An alternative to expected utility is prospect theory, developed by Kahneman and Tversky, that takes into account an agent's current wealth at each time. That is, a decision is based on the agent's gains and losses, rather than the outcome. However, just because this better matches a human's choices does not mean it is the best for an artificial agent, but an artificial agent that must interact with humans should take into account how humans reason.

9.1.1 Factored Utility

[p:o₁, (1-p):[q:o₂, (1-q):o₃]]
	∼[p:o₁, (1-p)q: o₂, (1-p)(1-q):o₃].