12.11 Review

The following are the main points you should have learned from this chapter:

• A Markov decision process is an appropriate formalism for reinforcement learning. A common method is to learn an estimate of the value of doing each action in a state, as represented by the $Q(S,A)$ function.

• In reinforcement learning, an agent should trade off exploiting its knowledge and exploring to improve its knowledge.

• Off-policy learning, such as Q-learning, learns the value of the optimal policy. On-policy learning, such as SARSA, learns the value of the policy the agent is actually carrying out (which includes the exploration).

• Model-based reinforcement learning separates learning the dynamics and reward models from the decision-theoretic planning of what to do given the models.
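
As a reminder of how the first three points fit together, the following is a minimal Python sketch, not the book's implementation. It assumes a tabular $Q(S,A)$ stored as a dictionary keyed by (state, action) pairs, an epsilon-greedy rule as one possible way to trade off exploration and exploitation, and the standard one-step Q-learning and SARSA updates. The function names, the step size `alpha`, the discount `gamma`, and the toy states `"s0"` and `"s1"` are all illustrative assumptions.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Explore with probability epsilon; otherwise exploit the current estimates."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma):
    """Off-policy: bootstrap from the best action in s_next, whatever is done next."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy: bootstrap from the action a_next the agent actually takes."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


# Toy comparison on a single hypothetical experience (s0, right, reward 0, s1).
actions = ["left", "right"]
start = {("s1", "left"): 1.0, ("s1", "right"): 5.0}
Q_off = defaultdict(float, start)  # updated with Q-learning
Q_on = defaultdict(float, start)   # updated with SARSA

q_learning_update(Q_off, "s0", "right", 0.0, "s1", actions, alpha=0.5, gamma=0.9)
# Suppose exploration made the agent pick the worse action "left" in s1.
sarsa_update(Q_on, "s0", "right", 0.0, "s1", "left", alpha=0.5, gamma=0.9)

greedy_choice = epsilon_greedy(Q_off, "s1", actions, epsilon=0.0, rng=random.Random(0))
```

In this toy case the Q-learning estimate for ("s0", "right") moves toward the value of the best action in "s1", while the SARSA estimate moves toward the value of the exploratory action that was actually taken, so the on-policy estimate ends up lower. With `epsilon=0.0` the selection rule never explores and simply returns the action with the highest current estimate.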

Artificial Intelligence 2E