foundations of computational agents
The following are the main points you should have learned from this chapter:
A Markov decision process is an appropriate formalism for reinforcement learning. A common method is to learn an estimate of the value of doing each action in a state, as represented by the Q(S, A) function.
In reinforcement learning, an agent should trade off exploiting its knowledge and exploring to improve its knowledge.
Off-policy learning, such as Q-learning, learns the value of the optimal policy. On-policy learning, such as SARSA, learns the value of the policy the agent is actually carrying out (which includes the exploration).
Model-based reinforcement learning separates learning the dynamics and reward models from the decision-theoretic planning of what to do given the models.
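To make the exploration trade-off and the off-policy versus on-policy distinction above concrete, here is a minimal tabular sketch. The function names (`epsilon_greedy`, `q_learning_update`, `sarsa_update`), the dictionary representation of Q, and the default step size `alpha` and discount `gamma` are illustrative assumptions, not taken from the chapter, and terminal states are ignored for brevity.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, actions, epsilon, rng=random):
    """Explore with probability epsilon; otherwise exploit current Q estimates."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy: the target uses the best action in s_next,
    regardless of which action the agent will actually take."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: the target uses a_next, the action the agent actually
    chose in s_next (which may be an exploratory one)."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


# Hypothetical two-action example; unseen (state, action) pairs start at 0.
actions = ["left", "right"]
Q = defaultdict(float)
Q[("s1", "left")] = 1.0
Q[("s1", "right")] = 2.0

a_next = epsilon_greedy(Q, "s1", actions, epsilon=0.1)
q_learning_update(Q, "s0", "left", 1.0, "s1", actions)
sarsa_update(Q, "s0", "right", 1.0, "s1", a_next)
```

The two updates differ only in the target: Q-learning backs up the maximum over next actions, so it estimates the value of the greedy policy, while SARSA backs up the action actually selected, so its estimates reflect the exploration the agent performs.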