

Figure: Example 2 Diagram

We will show an MDP for which approximate value iteration does not converge. The MDP has two states and all rewards equal zero, so the true value function is $V(1) = V(2) = 0$. Consider an approximation $\tilde{V}$, parameterized by a single scalar $r$, such that

\begin{displaymath}\tilde{V}(1,r) = r\end{displaymath}

\begin{displaymath}\tilde{V}(2,r) = 2r \end{displaymath}

One can see that for $r = 0$ the approximation recovers the true value function exactly. At each iteration, given the current parameter $r_k$, we compute the squared error between $\tilde{V}(\cdot,r)$ and the backed-up values, and choose the $r$ that minimizes it:

\begin{displaymath}\min_r\left\{ [\tilde{V}(1,r)- \lambda\tilde{V}(2,r_k)]^2 +[\tilde{V}(2,r)- \lambda\tilde{V}(2,r_k)]^2 \right\} \end{displaymath}

In this simple case the minimum is easily computed. Substituting the definitions of $\tilde{V}$, we have

\begin{displaymath}\min_r[(r-2\lambda r_k)^2+(2r-2\lambda r_k)^2]\end{displaymath}

Setting the derivative with respect to $r$ to zero,

\begin{displaymath}2(r - 2\lambda r_k) + 4(2r-2\lambda r_k) = 0\end{displaymath}

Hence, the minimum is at

\begin{displaymath}r_{k+1} = \frac{6}{5}\lambda r_k\end{displaymath}
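As a sanity check, the closed-form minimizer can be confirmed numerically. The sketch below (not part of the original derivation; the values of $\lambda$ and $r_k$ are chosen arbitrarily) minimizes the squared error by a simple grid search and compares the result against $\frac{6}{5}\lambda r_k$:

```python
# Numerical check that argmin_r (r - 2*lam*rk)^2 + (2r - 2*lam*rk)^2
# equals (6/5)*lam*rk.  lam and rk are arbitrary illustration values.

lam, rk = 0.9, 1.0
target = 2 * lam * rk          # backed-up value: lambda * V~(2, r_k)

def squared_error(r):
    return (r - target) ** 2 + (2 * r - target) ** 2

# grid search over r in [-5, 5] with step 1e-4
best = min((squared_error(i / 10000.0), i / 10000.0)
           for i in range(-50000, 50001))[1]

print(best)                    # ~= (6/5) * 0.9 * 1.0 = 1.08
print(6 / 5 * lam * rk)
```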

Since $r_k = (\frac{6}{5}\lambda)^k r_0$, for $\lambda > \frac{5}{6}$ (and any $r_0 \neq 0$) the sequence $r_k$ diverges. We have thus shown an example in which the approximate value function does not converge. We now check which of our assumptions fails to hold.

\begin{displaymath}\Vert \tilde{V}_{k+1} - L\tilde{V}_k\Vert_\infty = \max\left\{ \vert r_{k+1} -2\lambda r_k\vert ,\ \vert 2r_{k+1} -2\lambda r_k\vert \right\} = \max\left\{ \frac{4}{5}\lambda r_k,\ \frac{2}{5}\lambda r_k \right\} = \frac{4}{5}\lambda r_k \end{displaymath}

The error is proportional to $r_k$, which diverges, so it admits no uniform upper bound; the bounded-error assumption is therefore not satisfied.
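The divergence is easy to reproduce numerically. The sketch below (an illustrative simulation, assuming the two-state MDP above with zero rewards) simply iterates the fitted update $r_{k+1} = \frac{6}{5}\lambda r_k$ derived from the least-squares step:

```python
# Iterate the fitted parameter update r_{k+1} = (6/5) * lam * r_k.
# For lam > 5/6 the parameter diverges; for lam < 5/6 it shrinks to 0,
# consistent with the true value function V = 0.

def run(lam, r0=1.0, steps=100):
    r = r0
    for _ in range(steps):
        r = (6 / 5) * lam * r
    return r

print(run(lam=0.9))   # lambda > 5/6: r_k blows up
print(run(lam=0.5))   # lambda < 5/6: r_k -> 0
```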

Yishay Mansour