14.10 References and Further Reading

For overviews of multiagent systems, see Leyton-Brown and Shoham [2008], Shoham and Leyton-Brown [2008], Wooldridge [2009], Vlassis [2007], Stone and Veloso [2000], and Jackson [2011]. See also Kochenderfer et al. [2022].

Multiagent decision networks are based on the MAIDs of Koller and Milch [2003]. Genesereth and Thielscher [2014] describe general game playing, which uses logical representations for games.

Minimax with αβ pruning was first published by Hart and Edwards [1961]. Knuth and Moore [1975] and Pearl [1984] analyze αβ pruning and other methods for searching game trees. Ballard [1983] discusses how minimax can be combined with chance nodes.

The Deep Blue chess computer, which beat Garry Kasparov, the world chess champion, in May 1997, is described by Campbell et al. [2002]. Silver et al. [2016] describe AlphaGo, the program that beat a top-ranked Go player in 2016. AlphaZero is by Silver et al. [2017]. Pluribus [Brown and Sandholm, 2019] beat top human professionals in six-player no-limit Texas hold'em, a popular form of poker; it is the only one of these superhuman players that was not playing a two-player zero-sum game. Cicero [Bakhtin et al., 2022], playing in an anonymous online blitz league for the game Diplomacy, achieved more than double the average score of human players and ranked in the top 10% of participants. Only one human player suspected it was a computer program.

Robot soccer was proposed, and implemented, as an embodied AI challenge by Mackworth [1993]. Busoniu et al. [2008] survey multiagent reinforcement learning.

Mechanism design is described by Shoham and Leyton-Brown [2008] and Nisan [2007]. Ordeshook [1986] describes group decision making and game theory.

Hardin [1968] introduced the concept of the tragedy of the commons. Ostrom [1990], on the other hand, showed that the commons can be, and is, governable.

Gal and Grosz [2022] describe technical and ethical challenges of multiagent systems. Perrault et al. [2020] describe how multiagent systems are having a social impact in public safety and security, wildlife conservation, and public health.

This chapter has only covered non-cooperative games, where agents make decisions in isolation, without coordinating their actions. It has not covered cooperative games, where agents can communicate, negotiate, and perhaps participate in payments and enforceable contracts. Many of the above references cover cooperative games. Kramár et al. [2022] describe an AI system that learns to play the game Diplomacy, which requires negotiation to play well. The state space is enormous, as it includes the actions involved in negotiation. Siu et al. [2021] evaluated how well AI systems can cooperate with humans to play the cooperative card game Hanabi, and concluded “We find that humans have a clear preference toward a rule-based AI teammate (SmartBot) over a state-of-the-art learning-based AI teammate (Other-Play) across nearly all subjective metrics, and generally view the learning-based agent negatively, despite no statistical difference in the game score.”