# Finding the Optimal Policy: Value Iteration

In this section we present the Value Iteration algorithm (also referred to as VI) for computing an $\epsilon$-optimal policy for a discounted infinite-horizon problem.
In Lecture 5 we showed that the Optimality Equations for discounted infinite-horizon problems are:

$$V^*(s) = \max_{a \in A}\Big[\, r(s,a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^*(s') \,\Big] \qquad \forall s \in S.$$
We also defined the non-linear operator $L$:

$$(LV)(s) = \max_{a \in A}\Big[\, r(s,a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V(s') \,\Big].$$
It was shown that for any starting point $V_0$, the sequence defined by $V_{n+1} = L V_n$ converges to the optimal return value $V^*$.
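As a quick illustration of this convergence, here is a minimal sketch of the operator $L$ and its fixed-point iteration on a hypothetical two-state, two-action MDP (all transition probabilities, rewards, and the discount factor below are made up for illustration):

```python
GAMMA = 0.9  # hypothetical discount factor

# Hypothetical MDP: P[s][a][t] = probability of moving s -> t under action a,
# R[s][a] = immediate reward for taking action a in state s.
P = [[[0.8, 0.2], [0.3, 0.7]],
     [[0.5, 0.5], [0.1, 0.9]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]

def L(V):
    """Apply the non-linear operator L:
    (LV)(s) = max_a [ R(s,a) + gamma * sum_t P(t|s,a) V(t) ]."""
    n = len(V)
    return [max(R[s][a] + GAMMA * sum(P[s][a][t] * V[t] for t in range(n))
                for a in range(len(R[s])))
            for s in range(n)]

# Starting from two very different points, the iterates V_{n+1} = L V_n
# converge to the same fixed point V*; the gap between the two runs
# contracts by a factor of gamma at every step.
V1, V2 = [0.0, 0.0], [100.0, -100.0]
for _ in range(200):
    V1, V2 = L(V1), L(V2)
print([round(v, 4) for v in V1])
```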
The idea of VI is to use these results to compute a solution of the Optimality Equations. The VI algorithm finds a Markovian stationary policy that is $\epsilon$-optimal.
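The full VI loop can be sketched as follows. This is a minimal implementation assuming the standard stopping rule for discounted problems, $\|V_{n+1} - V_n\|_\infty < \epsilon(1-\gamma)/2\gamma$, after which the greedy policy with respect to the final values is $\epsilon$-optimal; the MDP data in the usage example is hypothetical:

```python
def value_iteration(P, R, gamma, eps):
    """Value Iteration for a discounted MDP (gamma < 1).

    Iterates V <- LV until the sup-norm change drops below
    eps*(1-gamma)/(2*gamma), then returns the final values together with
    the greedy (Markovian stationary) policy, which is eps-optimal.
    """
    nS, nA = len(R), len(R[0])
    V = [0.0] * nS
    threshold = eps * (1 - gamma) / (2 * gamma)
    while True:
        # Q(s,a) = R(s,a) + gamma * sum_t P(t|s,a) V(t)
        Q = [[R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(nS))
              for a in range(nA)] for s in range(nS)]
        newV = [max(Q[s]) for s in range(nS)]
        if max(abs(newV[s] - V[s]) for s in range(nS)) < threshold:
            # Greedy action in each state w.r.t. the final value estimate.
            policy = [max(range(nA), key=lambda a: Q[s][a]) for s in range(nS)]
            return newV, policy
        V = newV

# Usage on a hypothetical two-state, two-action MDP.
P = [[[0.8, 0.2], [0.3, 0.7]],
     [[0.5, 0.5], [0.1, 0.9]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]
V, pi = value_iteration(P, R, gamma=0.9, eps=1e-3)
print(V, pi)
```

Note that the loop terminates because $L$ is a $\gamma$-contraction in the sup-norm, so the change between successive iterates shrinks geometrically.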

Yishay Mansour
1999-12-18