
   
The Expected Discounted Sum Return Function

Here are some possible interpretations of the $\lambda$ parameter.
1.
In economic problems the $\lambda$ parameter may be interpreted in terms of the interest rate: one unit of reward received at time $t+1$ is worth only $\lambda$ units at time $t$ (with interest rate $\rho$, this corresponds to $\lambda = 1/(1+\rho)$).
2.
Consider a finite horizon problem where the horizon is random, i.e.

\begin{displaymath}V^{\pi}_N (s) = E^{\pi}_s E_N \left[\sum_{t=1}^N r(X_t,Y_t)\right]\end{displaymath}

assuming that the final value of all the states is equal to 0.

Let $N$ be distributed geometrically with parameter $\lambda$, so that the probability of stopping at the $n$th step is
$Prob[N=n] = (1 - \lambda)\lambda^{n-1}$.
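
The tail probabilities of this distribution have a simple closed form, which is the key identity in the proof below:

\begin{displaymath}Prob[N \geq t] = \sum_{n=t}^\infty (1-\lambda)\lambda^{n-1} = (1-\lambda)\lambda^{t-1} \sum_{k=0}^\infty \lambda^k = (1-\lambda)\lambda^{t-1} \cdot \frac{1}{1-\lambda} = \lambda^{t-1}\end{displaymath}

In words: the reward at time $t$ is collected only if the process has not yet stopped, which happens with probability $\lambda^{t-1}$.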

Lemma 4.2   Under the assumption that the rewards are bounded, $\vert r(\cdot , \cdot)\vert < M$, we have $V^{\pi}_N (s) = V^{\pi}_{\lambda} (s)$.

Proof:

\begin{eqnarray*}V^{\pi}_N (s) & = & E^{\pi}_s \{\sum_{n=1}^\infty Prob[N=n] [\sum_{t=1}^n r(X_t,Y_t)]\}\\
& = & E^{\pi}_s [\sum_{n=1}^\infty (1-\lambda)\lambda^{n-1} \sum_{t=1}^n r(X_t,Y_t)]\\
& = & E^{\pi}_s [\sum_{t=1}^\infty r(X_t,Y_t) \sum_{n=t}^\infty (1-\lambda)\lambda^{n-1}]\\
& = & E^{\pi}_s [\sum_{t=1}^\infty \lambda^{t-1} r(X_t,Y_t)]\\
& = & V^{\pi}_{\lambda} (s)
\end{eqnarray*}

The change in the order of summation is justified by the boundedness assumption $\vert r(\cdot , \cdot)\vert < M$, and the inner sum collapses by the tail identity $\sum_{n=t}^\infty (1-\lambda)\lambda^{n-1} = \lambda^{t-1}$ derived above.


$\Box$

If we look back at Example 1, we could add to it an additional state, $\Lambda$, that behaves as a 'black hole'; see the figure below. Once the system reaches this state it stays there forever, getting an immediate reward of 0.
The probability of moving into state $\Lambda$ is $(1-\lambda)$ from any state. All other transition probabilities given in the original example are multiplied by $\lambda$.
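
Formally, if $P$ and $r$ denote the transition probabilities and rewards of the original model, the augmented model described above (the notation $P'$, $r'$ is ours) is:

\begin{eqnarray*}
P'(s' \vert s,a) & = & \lambda P(s' \vert s,a), \qquad s' \neq \Lambda\\
P'(\Lambda \vert s,a) & = & 1-\lambda\\
P'(\Lambda \vert \Lambda,a) & = & 1, \qquad r'(\Lambda,a) = 0
\end{eqnarray*}

Note that the outgoing probabilities of each state still sum to one: $\lambda \sum_{s'} P(s' \vert s,a) + (1-\lambda) = 1$.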

  
Figure: Example 1 augmented with the absorbing state $\Lambda$

The expected (undiscounted) sum of the immediate rewards in the new model is equal to the expected discounted sum of the immediate rewards in the original model.
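
This equivalence can be checked numerically. Below is a minimal Monte Carlo sketch in Python (not part of the original notes); the two-state chain, its rewards, and all variable names are illustrative assumptions rather than Example 1 itself. It compares the exact discounted value of the original chain against the undiscounted return of the augmented chain with the absorbing state $\Lambda$.

import numpy as np

# Hypothetical 2-state Markov chain under a fixed policy
# (an illustrative example, not Example 1 from the notes).
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])   # P[s, s'] = transition probability
r = np.array([1.0, 2.0])     # r[s] = immediate reward in state s
lam = 0.9                    # discount factor lambda

# Exact discounted value: V = r + lam * P V, i.e. V = (I - lam*P)^{-1} r.
V_exact = np.linalg.solve(np.eye(2) - lam * P, r)

rng = np.random.default_rng(0)

def augmented_return(s):
    """Undiscounted return of one run in the augmented model."""
    total = 0.0
    while True:
        total += r[s]                 # collect the reward, then transition
        if rng.random() < 1 - lam:    # fall into the black hole Lambda:
            return total              # reward 0 forever afterwards
        s = rng.choice(2, p=P[s])     # otherwise, an original transition

n_runs = 100_000
for s0 in (0, 1):
    est = np.mean([augmented_return(s0) for _ in range(n_runs)])
    print(f"state {s0}: exact {V_exact[s0]:.3f}, Monte Carlo {est:.3f}")

With enough runs the two columns agree up to Monte Carlo error, illustrating both Lemma 4.2 and the equivalence stated above.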

Yishay Mansour
1999-11-18