Example:

Using the same example, the optimal return values satisfy the optimality equations:

\begin{eqnarray*}V(S_{1})& = & \max\{5 + \lambda[\frac{1}{2}V(S_{1}) +
\frac{1}{2}V(S_{2})],\ 10 + \lambda V(S_{2})\} \\ V(S_{2})& = &-1 +
\lambda V(S_{2})
\end{eqnarray*}
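The optimality equations can also be solved numerically by value iteration, that is, by repeatedly applying their right-hand sides until the values settle; for $\lambda < 1$ this converges to the optimal values. The sketch below assumes the rewards and transitions read off the equations (a11: reward 5, then S1 or S2 with probability 1/2 each; a12: reward 10, then S2; S2: reward -1, then S2), taking a11 as the first branch of the maximum and a12 as the second. The function name `value_iteration` and its tolerance are illustrative choices, not part of the text.

```python
def value_iteration(lam, tol=1e-12, max_iter=100_000):
    """Apply the optimality equations of the two-state example until the values settle.

    Hypothetical helper; rewards/transitions are assumed from the equations:
      S1, a11: reward 5, then S1 or S2 with probability 1/2 each
      S1, a12: reward 10, then S2
      S2:      reward -1, then S2
    """
    v1, v2 = 0.0, 0.0
    for _ in range(max_iter):
        new_v1 = max(5 + lam * (0.5 * v1 + 0.5 * v2),  # branch a11
                     10 + lam * v2)                     # branch a12
        new_v2 = -1 + lam * v2
        converged = max(abs(new_v1 - v1), abs(new_v2 - v2)) < tol
        v1, v2 = new_v1, new_v2
        if converged:
            break
    # Greedy action at S1 with respect to the (approximately) optimal values.
    q11 = 5 + lam * (0.5 * v1 + 0.5 * v2)
    q12 = 10 + lam * v2
    return v1, v2, ("a11" if q11 > q12 else "a12")


for lam in (0.0, 0.9):
    print(lam, value_iteration(lam))
```

Each sweep is a contraction by a factor $\lambda$, so larger discount factors need more iterations to reach the same tolerance.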


Solving the second equation for $V(S_{2})$ and substituting it into the first gives

\begin{eqnarray*}V(S_{2}) & = & -\frac{1}{1 - \lambda}\\ V(S_{1})& = & \max\{5 +
\lambda[\frac{1}{2}V(S_{1}) - \frac{1}{2}\frac{1}{1 - \lambda}],\
10 - \lambda \frac{1}{1 - \lambda}\}
\end{eqnarray*}
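Since S1 involves a single decision, each branch of the maximum can be solved for $V(S_{1})$ on its own. A worked step, assuming a11 labels the first branch (reward 5) and a12 the second (reward 10):

\begin{eqnarray*}a_{11}:\ V(S_{1}) & = & 5 + \frac{\lambda}{2}V(S_{1}) - \frac{\lambda}{2(1-\lambda)}
\ \Rightarrow\ V(S_{1}) = \frac{10 - 11\lambda}{(1-\lambda)(2-\lambda)}\\
a_{12}:\ V(S_{1}) & = & 10 - \frac{\lambda}{1-\lambda} = \frac{10 - 11\lambda}{1-\lambda}
\end{eqnarray*}

The two expressions differ only by the factor $2 - \lambda > 1$ in the denominator, so for $0 \le \lambda < 1$ a12 is better while $10 - 11\lambda > 0$, and a11 is better once $\lambda > \frac{10}{11}$.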


If we examine different values of $\lambda$, we get different optimal values and, for large enough $\lambda$, a different optimal action in S1.
For example:

\begin{eqnarray*}\lambda = 0 \ &:&\ V(S_{1})^{*} = 10 \ \ \ V(S_{2})^{*} = -1\\
\lambda = \frac{9}{10} \ &:&\ V(S_{1})^{*} = 1 \ \ \ V(S_{2})^{*} = -10
\end{eqnarray*}
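These values can be checked with exact rational arithmetic. The sketch below uses Python's `fractions.Fraction`; `optimal_values` is a hypothetical helper, and it assumes a11 is the reward-5 branch and a12 the reward-10 branch of the maximum.

```python
from fractions import Fraction


def optimal_values(lam):
    """Exact optimal values of the two-state example for a discount factor lam < 1.

    Hypothetical helper: each branch of the max at S1 is solved as a fixed point.
    """
    lam = Fraction(lam)
    v2 = -1 / (1 - lam)                              # V(S2) = -1 / (1 - lambda)
    q12 = 10 + lam * v2                              # a12: 10 + lambda V(S2)
    # a11: V = 5 + (lambda/2) V + (lambda/2) V(S2), solved for V.
    q11 = (5 + lam / 2 * v2) / (1 - lam / 2)
    if q11 > q12:
        return q11, v2, "a11"
    return q12, v2, "a12"


for lam in (Fraction(0), Fraction(9, 10)):
    v1, v2, action = optimal_values(lam)
    print(f"lambda = {lam}: V(S1)* = {v1}, V(S2)* = {v2}, action {action}")
```

This prints $V(S_{1})^{*} = 10$, $V(S_{2})^{*} = -1$ for $\lambda = 0$ and $V(S_{1})^{*} = 1$, $V(S_{2})^{*} = -10$ for $\lambda = \frac{9}{10}$, with a12 chosen in both cases.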


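Note that as $\lambda$ increases, the optimal action at S1 changes from a12 to a11. At $\lambda = 0$ and $\lambda = \frac{9}{10}$ the optimum is still a12 ($V(S_{1})^{*} = 10$ and $1$, both from the second branch); the switch happens at $\lambda = \frac{10}{11}$, where both branches give $V(S_{1}) = 0$, and a11 is optimal for larger $\lambda$.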
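To locate where the optimal action at S1 changes, one can bisect on the difference between the two branch values. This is a floating-point sketch with hypothetical helper names (`action_gap`, `switch_point`), again taking a11 as the reward-5 branch and a12 as the reward-10 branch.

```python
def action_gap(lam):
    """Value of a12 minus value of a11 at S1 (hypothetical helper, 0 <= lam < 1)."""
    v2 = -1.0 / (1.0 - lam)                           # V(S2)
    q12 = 10.0 + lam * v2                             # a12 branch
    q11 = (5.0 + 0.5 * lam * v2) / (1.0 - 0.5 * lam)  # a11 branch, solved for V(S1)
    return q12 - q11


def switch_point(lo=0.0, hi=0.99, iters=100):
    """Bisection for the discount factor at which a11 overtakes a12."""
    if not action_gap(lo) > 0 > action_gap(hi):
        raise ValueError("the gap must change sign on [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if action_gap(mid) > 0:
            lo = mid  # a12 still strictly better
        else:
            hi = mid  # a11 at least as good
    return 0.5 * (lo + hi)


print(switch_point())
```

The result should be close to $\frac{10}{11} \approx 0.909$: a12 is optimal below that discount factor and a11 above it.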

Yishay Mansour
1999-11-24