4.1 Policy Evaluation

Policy evaluation computes the state-value function $v_\pi$ for an arbitrary policy $\pi$:

  • $\pi(a \mid s)$ is the probability of taking action $a$ in state $s$ under policy $\pi$. The existence and uniqueness of $v_\pi$ are guaranteed as long as either $\gamma < 1$ or eventual termination is guaranteed from all states under the policy.

$v_\pi$ satisfies the Bellman equation

$$v_\pi(s) = \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\bigl[r + \gamma\, v_\pi(s')\bigr].$$

To find the solution to this equation we use an iterative method. Consider a sequence of approximate value functions $v_0, v_1, v_2, \ldots$, where the initial approximation $v_0$ is chosen arbitrarily. Each successive approximation is obtained by using the Bellman equation for $v_\pi$ as an update rule:

$$v_{k+1}(s) = \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\bigl[r + \gamma\, v_k(s')\bigr].$$

$v_\pi$ is a fixed point of this update rule. This algorithm is called iterative policy evaluation.

  • full backup
    • each step replaces the old value of a state by a new value computed from the old values of all its possible successor states
  • backup done in a sweep
    • the new values are written directly into a single vector rather than into a temporary vector, so updates later in the same sweep already use the new values of states updated earlier
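The difference between keeping a temporary vector and sweeping in place can be sketched on a toy deterministic chain (the 3-state chain below is an assumption chosen just for illustration):

```python
import numpy as np

def backup(s, v):
    """One full backup for state s: its expected one-step return.

    Toy chain assumed for illustration: from states 0 and 1 the single
    action moves to s + 1 with reward -1; state 2 is terminal (value 0).
    """
    if s == 2:
        return 0.0
    return -1.0 + v[s + 1]

# Two-array version: every new value is computed from the OLD vector,
# then the whole vector is replaced at once.
v_old = np.zeros(3)
v_two_array = np.array([backup(s, v_old) for s in range(3)])

# In-place sweep: new values are written directly into the vector, so
# states updated later in the same sweep already see the new values of
# states updated earlier. Sweeping from the terminal state backwards
# propagates its value through the whole chain in a single sweep.
v = np.zeros(3)
for s in reversed(range(3)):
    v[s] = backup(s, v)
```

Starting from zeros, one two-array sweep gives `[-1, -1, 0]`, while one in-place sweep (in this favorable ordering) already reaches the fixed point `[-2, -1, 0]`; in general the in-place variant also converges and usually does so faster.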

Figure 4.1: Iterative policy evaluation

To stop the algorithm (since it converges only in the limit) we test the quantity $\max_s |v_{k+1}(s) - v_k(s)|$ after each sweep. When it is sufficiently small we stop the loop.
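Putting the update rule, the in-place sweep, and the stopping test together gives a minimal sketch of iterative policy evaluation. The nested-dict MDP encoding (`P[s][a]` as a list of `(prob, next_state, reward, done)` tuples) and the toy chain are assumptions made for this example:

```python
import numpy as np

def iterative_policy_evaluation(P, pi, gamma=1.0, theta=1e-8):
    """In-place iterative policy evaluation.

    P[s][a] is a list of (prob, next_state, reward, done) transition
    tuples and pi[s][a] is the probability of taking action a in
    state s (a hypothetical MDP encoding chosen for this sketch).
    """
    v = np.zeros(len(P))            # v_0 chosen arbitrarily (zeros here)
    while True:
        delta = 0.0                 # max change over this sweep
        for s in range(len(P)):     # one in-place sweep over all states
            v_old = v[s]
            # Bellman update: expected one-step return under pi,
            # bootstrapping from the current value estimates.
            v[s] = sum(
                pi[s][a] * sum(p * (r + gamma * v[s2] * (not done))
                               for p, s2, r, done in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(v[s] - v_old))
        if delta < theta:           # stopping test
            return v

# Toy 3-state chain (an assumption for illustration): one action per
# state, moving right with reward -1; state 2 is terminal.
P = {
    0: {0: [(1.0, 1, -1.0, False)]},
    1: {0: [(1.0, 2, -1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)]},
}
pi = {0: {0: 1.0}, 1: {0: 1.0}, 2: {0: 1.0}}

values = iterative_policy_evaluation(P, pi)
# values is approximately [-2, -1, 0]
```

Here each pass through the `for s` loop is one sweep, and `delta` is exactly the quantity $\max_s |v_{k+1}(s) - v_k(s)|$ used in the stopping test.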