Chapter 4
Dynamique Programming
Dynamique Programming:
- it refers to a collection of algorithm that can be used to compute optimal policies
- it needs a perfect model of the environment (like a MDP)
- it has a great computation cost
- it has limited utility in reinforcement learning because of the cost and the model requirement
- it is important theoretically
- it is a base for many algorithm which aims at reproducing the effect of DP
They key idea of DP, and of reinforcement learning, is the use of value functions to organize and structure the search for good policies.
Let's rewrite the and which satisfy the Bellman optimality equations:
or