4.6 Generalized Policy Iteration

  • Policy evaluation
    • making the value function consistent with the current policy
  • Policy improvement

    • making the policy greedy with respect to the current value function
  • Generalized policy iteration (GPI)

    • refer to the general idea of letting policy evaluation and policy improvement processes interact, independent of the granularity and other details of the two processes.
    • The evaluation and improvement processes can be viewed as both competing and cooperating
      • they compete because they pull in opposing directions
      • Making the policy greedy with respect to the value function makes the value function incorrect for the changed policy
      • making the value function consistent with the policy cause that policy no longer to be greedy
    • In the long run, these two processes interact to find a single joint solution
      • the optimal value function and an optimal policy

Figure 4.7

The two processes together achieve the overall goal of optimality even though neither is attempting to achieve it directly.