
   
The Expected Discounted Sum Return Function

Here are some possible interpretations of the $\lambda$ parameter.
1.
In economic problems the $\lambda$ parameter may be interpreted in terms of the interest rate: one unit of reward received at time $t+1$ is worth only $\lambda$ units at time $t$ (with interest rate $\rho$, this corresponds to $\lambda = 1/(1+\rho)$).
2.
Consider a finite horizon problem where the horizon is random, i.e.

\begin{displaymath}V^{\pi}_N (s) = E^{\pi}_s E_N \left[\sum_{t=1}^N r(X_t,Y_t)\right]\end{displaymath}

assuming that the final value of all the states is equal to 0.

Let $N$ be distributed geometrically with parameter $\lambda$, so that the probability of stopping at the $n$th step is
$Prob[N=n] = (1 - \lambda)\lambda^{n-1}$.
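
The tail probabilities of this distribution have a simple closed form, which is the key identity in the proof below:

\begin{displaymath}Prob[N \geq t] = \sum_{n=t}^\infty (1-\lambda)\lambda^{n-1} = (1-\lambda)\lambda^{t-1} \sum_{k=0}^\infty \lambda^k = (1-\lambda)\lambda^{t-1} \cdot \frac{1}{1-\lambda} = \lambda^{t-1}\end{displaymath}

In words: the reward at time $t$ is collected only if the process has not yet stopped, which happens with probability $\lambda^{t-1}$.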

Lemma 4.2   Under the assumption that the rewards are bounded, $\vert r(\cdot , \cdot)\vert < M$, we have $V^{\pi}_N (s) = V^{\pi}_{\lambda} (s)$.

Proof:

\begin{eqnarray*}V^{\pi}_N (s) & = & E^{\pi}_s \{\sum_{n=1}^\infty Prob[N=n] [\sum_{t=1}^n r(X_t,Y_t)]\}\\
& = & E^{\pi}_s [\sum_{n=1}^\infty (1-\lambda)\lambda^{n-1} \sum_{t=1}^n r(X_t,Y_t)]\\
& = & E^{\pi}_s [\sum_{t=1}^\infty r(X_t,Y_t) \sum_{n=t}^\infty (1-\lambda)\lambda^{n-1}]\\
& = & E^{\pi}_s [\sum_{t=1}^\infty \lambda^{t-1} r(X_t,Y_t)]\\
& = & V^{\pi}_{\lambda} (s)
\end{eqnarray*}

The change in the order of summation is justified by the boundedness assumption $\vert r(\cdot , \cdot)\vert < M$, and the inner sum collapses by the tail identity $\sum_{n=t}^\infty (1-\lambda)\lambda^{n-1} = \lambda^{t-1}$ derived above.


$\Box$

If we look back at Example 1, we could add to it an additional state, $\Lambda$, that behaves as a 'black hole'; see the figure below. Once the system reaches this state it stays there forever, getting an immediate reward of 0.
The probability of moving into state $\Lambda$ is $(1-\lambda)$ from any state. All other transition probabilities given in the original example are multiplied by $\lambda$.
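
Formally, if $P$ and $r$ denote the transition probabilities and rewards of the original model, the augmented model described above (the notation $P'$, $r'$ is ours) is:

\begin{eqnarray*}
P'(s' \vert s,a) & = & \lambda P(s' \vert s,a), \qquad s' \neq \Lambda\\
P'(\Lambda \vert s,a) & = & 1-\lambda\\
P'(\Lambda \vert \Lambda,a) & = & 1, \qquad r'(\Lambda,a) = 0
\end{eqnarray*}

Note that the outgoing probabilities of each state still sum to one: $\lambda \sum_{s'} P(s' \vert s,a) + (1-\lambda) = 1$.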

  
Figure: Example 1 augmented with the absorbing state $\Lambda$

The expected (undiscounted) sum of the immediate rewards in the new model is equal to the expected discounted sum of the immediate rewards in the original model.
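
This equivalence can be checked numerically. Below is a minimal Monte Carlo sketch in Python (not part of the original notes); the two-state chain, its rewards, and all variable names are illustrative assumptions rather than Example 1 itself. It compares the exact discounted value of the original chain against the undiscounted return of the augmented chain with the absorbing state $\Lambda$.

import numpy as np

# Hypothetical 2-state Markov chain under a fixed policy
# (an illustrative example, not Example 1 from the notes).
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])   # P[s, s'] = transition probability
r = np.array([1.0, 2.0])     # r[s] = immediate reward in state s
lam = 0.9                    # discount factor lambda

# Exact discounted value: V = r + lam * P V, i.e. V = (I - lam*P)^{-1} r.
V_exact = np.linalg.solve(np.eye(2) - lam * P, r)

rng = np.random.default_rng(0)

def augmented_return(s):
    """Undiscounted return of one run in the augmented model."""
    total = 0.0
    while True:
        total += r[s]                 # collect the reward, then transition
        if rng.random() < 1 - lam:    # fall into the black hole Lambda:
            return total              # reward 0 forever afterwards
        s = rng.choice(2, p=P[s])     # otherwise, an original transition

n_runs = 100_000
for s0 in (0, 1):
    est = np.mean([augmented_return(s0) for _ in range(n_runs)])
    print(f"state {s0}: exact {V_exact[s0]:.3f}, Monte Carlo {est:.3f}")

With enough runs the two columns agree up to Monte Carlo error, illustrating both Lemma 4.2 and the equivalence stated above.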

Yishay Mansour
1999-11-18