Generalized Policy Iteration

Policy evaluation
- making the value function consistent with the current policy
Policy improvement
- making the policy greedy with respect to the current value function
Generalized policy iteration (GPI)
- refers to the general idea of letting the policy evaluation and policy improvement processes interact, independent of the granularity and other details of the two processes
- The evaluation and improvement processes can be viewed as both competing and cooperating
  - they compete because they pull in opposing directions
  - making the policy greedy with respect to the value function typically makes the value function incorrect for the changed policy
  - making the value function consistent with the policy typically causes that policy to no longer be greedy
- In the long run, these two processes interact to find a single joint solution
  - the optimal value function and an optimal policy
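
The interaction described above can be sketched on a toy problem. This is a minimal illustration only, not the book's pseudocode: the 4-state chain MDP, the reward of -1 per step, the discount of 0.9, and all names (`step`, `evaluate`, `improve`, `gpi`) are assumptions made for the example. The `sweeps` parameter controls the granularity of evaluation: 1 sweep behaves like value-iteration-style updates, many sweeps approach full policy evaluation.

```python
# Hypothetical toy MDP (an assumption for illustration):
# a 4-state chain where state 3 is terminal and every move costs -1.
N_STATES, TERMINAL, GAMMA = 4, 3, 0.9
ACTIONS = (-1, +1)  # move left or move right


def step(s, a):
    """Deterministic model: return (next_state, reward)."""
    s2 = min(max(s + a, 0), TERMINAL)
    return s2, -1.0


def q_value(V, s, a):
    """One-step lookahead value of taking action a in state s."""
    s2, r = step(s, a)
    return r + GAMMA * V[s2]


def evaluate(V, policy, sweeps):
    """Partial policy evaluation: pull V toward consistency with the policy."""
    for _ in range(sweeps):
        for s in range(TERMINAL):  # the terminal state keeps value 0
            V[s] = q_value(V, s, policy[s])


def improve(V, policy):
    """Make the policy greedy with respect to V; return True if it changed."""
    changed = False
    for s in range(TERMINAL):
        best = max(ACTIONS, key=lambda a: q_value(V, s, a))
        if best != policy[s]:
            policy[s] = best
            changed = True
    return changed


def gpi(sweeps=1, max_iters=1000, tol=1e-10):
    """Alternate evaluation and improvement until both are stable."""
    V = [0.0] * N_STATES
    policy = [-1] * TERMINAL  # start from a poor policy: always move left
    for _ in range(max_iters):
        old = list(V)
        evaluate(V, policy, sweeps)
        changed = improve(V, policy)
        delta = max(abs(a - b) for a, b in zip(V, old))
        # Stop when V is consistent with the policy AND the policy is greedy.
        if not changed and delta < tol:
            break
    return V, policy


V, policy = gpi(sweeps=1)
print(policy)  # [1, 1, 1]: always move right, toward the terminal state
```

Each call to `improve` invalidates the values just computed by `evaluate`, and each call to `evaluate` can make the current policy non-greedy, yet the loop settles only when neither process changes anything. In this toy problem, coarse evaluation (`sweeps=1`) and near-complete evaluation (for example `sweeps=50`) reach the same policy.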

Figure 4.7

The two processes together achieve the overall goal of optimality even though neither is attempting to achieve it directly.