Finding the Optimal Policy: Value Iteration

In Lecture 5 we showed that the Optimality Equations for discounted infinite horizon problems are:

$$v(s) = \max_{a \in A_s} \Big\{ r(s,a) + \lambda \sum_{s' \in S} p(s' \mid s,a)\, v(s') \Big\}, \qquad s \in S.$$

We also defined the non-linear operator

$$(Lv)(s) = \max_{a \in A_s} \Big\{ r(s,a) + \lambda \sum_{s' \in S} p(s' \mid s,a)\, v(s') \Big\}, \qquad s \in S,$$

for which it was shown that, for any starting point $v^0$, the sequence defined by $v^{n+1} = L v^n$ converges to the optimal return value $v^*_\lambda$.

The idea of Value Iteration (VI) is to use these results to compute a solution of the Optimality Equations. The VI algorithm finds a Markovian stationary policy that is $\epsilon$-optimal.

- The Value Iteration Algorithm
- Correctness of Value Iteration Algorithm
- Convergence of Value Iteration Algorithm
- Example: Running Value Iteration Algorithm
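As a preview of the sections above, here is a minimal sketch of value iteration in Python. The two-state MDP, the names `r`, `p`, and the NumPy layout are my own illustration, not from the lecture; the stopping rule is the standard one that makes the greedy policy $\epsilon$-optimal for a discount factor $\lambda$:

```python
import numpy as np

def value_iteration(r, p, lam, eps):
    """Return an eps-optimal value vector and a greedy stationary policy.

    r[s, a]     : expected one-step reward
    p[s, a, s'] : transition probabilities
    lam         : discount factor in (0, 1)
    """
    v = np.zeros(r.shape[0])            # arbitrary starting point v^0
    while True:
        # Apply the operator L: (Lv)(s) = max_a { r(s,a) + lam * sum_s' p(s'|s,a) v(s') }
        q = r + lam * (p @ v)           # q[s, a], one Bellman backup
        v_next = q.max(axis=1)
        # Standard stopping rule guaranteeing the greedy policy is eps-optimal
        if np.max(np.abs(v_next - v)) < eps * (1 - lam) / (2 * lam):
            return v_next, q.argmax(axis=1)
        v = v_next

# Made-up two-state, two-action example (illustration only)
r = np.array([[5.0, 10.0],
              [-1.0, 1.0]])
p = np.array([[[0.5, 0.5], [0.0, 1.0]],
              [[0.0, 1.0], [0.5, 0.5]]])
v, policy = value_iteration(r, p, lam=0.9, eps=1e-6)
```

Since `L` is a contraction with modulus `lam`, the loop terminates after finitely many backups, and `v` is within `eps` of the optimal return `v*`.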