Example:

Using the same example, the optimal return satisfies the following optimality equations:

$\begin{eqnarray*}V(S_{1})& = & \max\{5 + \lambda[\frac{1}{2}V(S_{1}) + \frac{1}{2}V(S_{2})],\ 10 + \lambda V(S_{2})\} \\ V(S_{2})& = &-1 + \lambda V(S_{2}) \end{eqnarray*}$

Thus

$\begin{eqnarray*}V(S_{2}) & = & -\frac{1}{1 - \lambda}\\ V(S_{1})& = & \max\{5 + \lambda[\frac{1}{2}V(S_{1}) - \frac{1}{2}\frac{1}{1 - \lambda}],\ 10 - \lambda \frac{1}{1 - \lambda}\} \end{eqnarray*}$
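
As a worked step, each term of the maximum can be evaluated separately by fixing one action at S₁ for all time. The sketch below assumes the transition probabilities read off the equations above (a₁₁ moves to S₁ or S₂ with probability $\frac{1}{2}$ each, a₁₂ moves to S₂ with certainty), and that the first term of the maximum belongs to a₁₁ and the second to a₁₂. The symbols $V_{a_{11}}$ and $V_{a_{12}}$ are introduced here only for illustration.

$\begin{eqnarray*}V_{a_{11}}(S_{1}) & = & 5 + \lambda[\frac{1}{2}V_{a_{11}}(S_{1}) - \frac{1}{2}\frac{1}{1 - \lambda}] \ \Rightarrow\ V_{a_{11}}(S_{1}) = \frac{10 - 11\lambda}{(1 - \lambda)(2 - \lambda)}\\ V_{a_{12}}(S_{1}) & = & 10 - \lambda\frac{1}{1 - \lambda} = \frac{10 - 11\lambda}{1 - \lambda}\\ V_{a_{11}}(S_{1}) - V_{a_{12}}(S_{1}) & = & \frac{11\lambda - 10}{2 - \lambda} \end{eqnarray*}$

Under these assumptions the difference is positive exactly when $\lambda > \frac{10}{11}$, so a₁₂ is preferred for smaller discount factors and a₁₁ for larger ones.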

If we examine different values of $\lambda$ we get different optimal actions in S₁.
For example:

$\begin{eqnarray*}\lambda = 0 \ &:&\ V(S_{1})^{*} = 10 \ \ \ V(S_{2})^{*} = -1\\ &\vdots& \\ \lambda = \frac{9}{10} \ &:&\ V(S_{1})^{*} = 1 \ \ \ V(S_{2})^{*} = -10 \end{eqnarray*}$

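Note that as $\lambda$ increases, the optimal action at S₁ changes from a₁₂ to a₁₁: a larger discount factor gives more weight to the future cost of reaching S₂, where every step pays $-1$.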
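
The dependence on $\lambda$ can also be checked numerically. The following is a minimal Python sketch, not part of the original notes: it assumes the two-state example as read from the equations above (a₁₁ pays 5 and moves to S₁ or S₂ with probability 1/2 each, a₁₂ pays 10 and moves to S₂, and S₂ pays -1 forever). The function name `state_values` is hypothetical.

```python
def state_values(lam):
    """Return (V(S1), V(S2), best action at S1) for a discount factor 0 <= lam < 1.

    Assumed model (read off the optimality equations, not stated elsewhere):
      a11: reward 5, next state S1 or S2 with probability 1/2 each
      a12: reward 10, next state S2 with probability 1
      S2 : reward -1, stays in S2
    """
    # V(S2) = -1 + lam * V(S2)
    v2 = -1.0 / (1.0 - lam)
    # Always playing a11 at S1: V = 5 + lam * (V / 2 + v2 / 2), solved for V
    v_a11 = (5.0 + 0.5 * lam * v2) / (1.0 - 0.5 * lam)
    # Always playing a12 at S1: V = 10 + lam * v2
    v_a12 = 10.0 + lam * v2
    # S2 has a single action, so the optimum at S1 is the better of the two
    if v_a11 > v_a12:
        return v_a11, v2, "a11"
    return v_a12, v2, "a12"


for lam in (0.0, 0.5, 0.9, 0.95):
    v1, v2, action = state_values(lam)
    print(f"lambda={lam:.2f}  V(S1)={v1:.3f}  V(S2)={v2:.3f}  action={action}")
```

With these assumptions the sketch selects a₁₂ for $\lambda = 0$ and $\lambda = \frac{9}{10}$, and a₁₁ for $\lambda = 0.95$.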

Yishay Mansour
1999-11-24