Next: Linear Programming Up: No Title Previous: Example: Running Policy Iteration

# A Comparison between VI and PI Algorithms

In this section we will compare the convergence rate of the VI and PI algorithms. It will be shown that, assuming that the two algorithms begin with the same approximated value, the PI algorithm converges faster to the optimized value.

Theorem 6.6   Let be the series of values created by the VI algorithm (where un+1=Lun) and let be the series of values created by PI algorithm. If u0=v0, then .

Proof:We will use induction to prove the theorem.
Induction Basis: We assume that u0=v0. v0 is the return value of a specific policy, and therefore it is clearly the optimal return value. Therefore: .
Induction Step:

From the induction hypothesis , and since Lpn is monotonic it follows that:

Since L is taking the maximum over all policies:

We denote the policy determined by PI algorithm in iteration nas dn and therefore: Lvn=Ldnvn

From the Optimality Equations we get:

From the definition of vn+1 we have:

And we get . In Theorem it was proven that and therefore .

From Theorem it follows that, assuming the same starting point, PI algorithm requires less stages then VI algorithm to converge to the optimal policy. Yet, it should be noticed that each single stage of PI requires a solution of a set of linear equations (the policy evaluation stage) and therefore it is computationally more expensive than a single stage of VI algorithm.

Next: Linear Programming Up: No Title Previous: Example: Running Policy Iteration
Yishay Mansour
1999-12-18