** Next:** Example: Running Policy Iteration
** Up:** Policy Iteration
** Previous:** Policy Iteration Algorithm

##

Convergence of Policy Iteration Algorithm

We shall
see that when the number of states - |S|, and the number of
actions - |A| are finite, there are no two
non-consecutive iterations with the same policy, unless we have an optimal policy.
Therefore *d*_{n} converges to the optimal policy
in a finite number of
steps. The key to the convergence of *d*_{n} is the monotonicity of .

**Claim 6.4**
Let

*v*_{n},

*v*_{n+1} be the values of consecutive iterations of the above
algorithm, then

.

**Proof:**Let *d*_{n+1} be the policy in the *policy improvement* step, then

Therefore:
.
Multiplying by
we get:

Note: The above claim is valid even when *S* or *A* are infinite.

**Theorem 6.5**
Let

*S* and

*A* be finite,
then the

*Policy Iteration* algorithm converges to the optimal
policy after at most |

*A*|

^{|S|} iterations.

**Proof:**Clearly,
.
According to claim , in each step
except for the
last step in which
*v*_{n+1} = *v*_{n}. Therefore no policy
can appear in two different iterations. Hence the number
of iterations

** Next:** Example: Running Policy Iteration
** Up:** Policy Iteration
** Previous:** Policy Iteration Algorithm
*Yishay Mansour*

*1999-12-18*