

Figure: Example 2 Diagram

We will show an MDP for which approximate value iteration does not converge. The MDP has two states and all rewards equal zero, so the true value function is $V(1) = V(2) = 0$. Consider an approximation $\tilde{V}$, parameterized by a single scalar $r$, such that

\begin{displaymath}\tilde{V}(1,r) = r\end{displaymath}

\begin{displaymath}\tilde{V}(2,r) = 2r \end{displaymath}

One can see that for $r = 0$ the approximation recovers the true value function exactly. At each iteration, given the current parameter $r_k$, we compute the squared error between $\tilde{V}(\cdot,r)$ and the backed-up values, and choose the $r$ that minimizes it:

\begin{displaymath}\min_r\left\{ [\tilde{V}(1,r)- \lambda\tilde{V}(2,r_k)]^2 +[\tilde{V}(2,r)- \lambda\tilde{V}(2,r_k)]^2 \right\} \end{displaymath}

In this simple case the minimum is easily computed. Substituting the definitions of $\tilde{V}$, we have

\begin{displaymath}\min_r[(r-2\lambda r_k)^2+(2r-2\lambda r_k)^2]\end{displaymath}

Setting the derivative with respect to $r$ to zero,

\begin{displaymath}2(r - 2\lambda r_k) + 4(2r-2\lambda r_k) = 0\end{displaymath}

Hence, the minimum is at

\begin{displaymath}r_{k+1} = \frac{6}{5}\lambda r_k\end{displaymath}
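As a sanity check, the closed-form minimizer can be confirmed numerically. The sketch below (not part of the original derivation; the values of $\lambda$ and $r_k$ are chosen arbitrarily) minimizes the squared error by a simple grid search and compares the result against $\frac{6}{5}\lambda r_k$:

```python
# Numerical check that argmin_r (r - 2*lam*rk)^2 + (2r - 2*lam*rk)^2
# equals (6/5)*lam*rk.  lam and rk are arbitrary illustration values.

lam, rk = 0.9, 1.0
target = 2 * lam * rk          # backed-up value: lambda * V~(2, r_k)

def squared_error(r):
    return (r - target) ** 2 + (2 * r - target) ** 2

# grid search over r in [-5, 5] with step 1e-4
best = min((squared_error(i / 10000.0), i / 10000.0)
           for i in range(-50000, 50001))[1]

print(best)                    # ~= (6/5) * 0.9 * 1.0 = 1.08
print(6 / 5 * lam * rk)
```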

Since $r_k = (\frac{6}{5}\lambda)^k r_0$, for $\lambda > \frac{5}{6}$ (and any $r_0 \neq 0$) the sequence $r_k$ diverges. We have thus shown an example in which the approximate value function does not converge. We now check which of our assumptions fails to hold.

\begin{displaymath}\Vert \tilde{V}_{k+1} - L\tilde{V}_k\Vert_\infty = \max\left\{ \vert r_{k+1} -2\lambda r_k\vert ,\ \vert 2r_{k+1} -2\lambda r_k\vert \right\} = \max\left\{ \frac{4}{5}\lambda r_k,\ \frac{2}{5}\lambda r_k \right\} = \frac{4}{5}\lambda r_k \end{displaymath}

The error is proportional to $r_k$, which diverges, so it admits no uniform upper bound; the bounded-error assumption is therefore not satisfied.
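The divergence is easy to reproduce numerically. The sketch below (an illustrative simulation, assuming the two-state MDP above with zero rewards) simply iterates the fitted update $r_{k+1} = \frac{6}{5}\lambda r_k$ derived from the least-squares step:

```python
# Iterate the fitted parameter update r_{k+1} = (6/5) * lam * r_k.
# For lam > 5/6 the parameter diverges; for lam < 5/6 it shrinks to 0,
# consistent with the true value function V = 0.

def run(lam, r0=1.0, steps=100):
    r = r0
    for _ in range(steps):
        r = (6 / 5) * lam * r
    return r

print(run(lam=0.9))   # lambda > 5/6: r_k blows up
print(run(lam=0.5))   # lambda < 5/6: r_k -> 0
```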

Yishay Mansour