The Return Function

$V_N^{\pi}(s) = \sum_{t=1}^{N-1}\sum_{j \in S}\sum_{a \in A_j} r_t(j,a)\cdot \mathrm{Prob}[X_t=j, Y_t=a \vert X_1=s] + \sum_{j \in S}\sum_{a \in A_j} r_N(j)\cdot \mathrm{Prob}[X_N=j, Y_N=a \vert X_1=s]$

Therefore:

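1. $\forall N\; V_N^{\pi}(s) = V_N^{\pi^{'}}(s)$
Since we proved in the last theorem that the distribution function is equal for $\pi$ and $\pi^{'}$.
2. $g^{\pi}(s) = g^{\pi^{'}}(s)$
Since (1) is true for all $N$.
3. $V_{\lambda}^{\pi}(s) = V_{\lambda}^{\pi^{'}}(s)$
Since the discounted return is determined by the same distributions $\mathrm{Prob}[X_t=j, Y_t=a \vert X_1=s]$, which are equal for $\pi$ and $\pi^{'}$.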
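
Item 1 can be checked numerically on a toy problem. The Python sketch below is only an illustration under stated assumptions: the two-state MDP, its transition probabilities, its rewards, the horizon $N=3$ and the names `pi_history`, `make_markov`, `marginals` and `value` are all hypothetical and do not come from the lecture. It also assumes that the Markovian randomized policy is built as $\pi^{'}(a \vert j, t) = \mathrm{Prob}^{\pi}[Y_t=a \vert X_t=j, X_1=s]$. The sketch enumerates all histories exactly (using rational arithmetic), computes the distributions $\mathrm{Prob}[X_t=j, Y_t=a \vert X_1=s]$ under both policies, and evaluates the finite-horizon return from them.

```python
from collections import defaultdict
from fractions import Fraction as F

N = 3  # horizon: decisions at t = 1..N-1, terminal reward at t = N

# Hypothetical toy MDP. P[j][a][j2] = probability of moving from j to j2 under a.
P = {0: {0: {0: F(3, 4), 1: F(1, 4)}, 1: {0: F(1, 3), 1: F(2, 3)}},
     1: {0: {0: F(1, 2), 1: F(1, 2)}, 1: {0: F(1, 5), 1: F(4, 5)}}}
r = {(0, 0): F(1), (0, 1): F(0), (1, 0): F(2), (1, 1): F(5)}  # r(j, a)
r_N = {0: F(0), 1: F(3)}                                      # terminal r_N(j)


def pi_history(h):
    """History-dependent randomized policy; h = (s_1, a_1, s_2, ..., s_t)."""
    if len(h) == 1:
        return {0: F(1, 2), 1: F(1, 2)}
    prev_state, prev_action, cur_state = h[-3], h[-2], h[-1]
    if cur_state != prev_state:
        return {prev_action: F(1)}   # state changed: repeat the last action
    return {0: F(1, 4), 1: F(3, 4)}  # state unchanged: randomize


def marginals(policy, s):
    """Return ([Prob[X_t=j, Y_t=a | X_1=s] for t=1..N-1], Prob[X_N=j | X_1=s])."""
    hist = {(s,): F(1)}  # probability of each history reached so far
    joint = []
    for _ in range(1, N):
        m, nxt = defaultdict(F), defaultdict(F)
        for h, p in hist.items():
            j = h[-1]
            for a, pa in policy(h).items():
                m[(j, a)] += p * pa
                for j2, pj in P[j][a].items():
                    nxt[h + (a, j2)] += p * pa * pj
        joint.append(dict(m))
        hist = nxt
    final = defaultdict(F)
    for h, p in hist.items():
        final[h[-1]] += p
    return joint, dict(final)


def make_markov(joint):
    """Markovian randomized policy pi'(a | j, t) = Prob[Y_t=a | X_t=j, X_1=s]."""
    def policy(h):
        t = (len(h) + 1) // 2  # a history ending in s_t has length 2t - 1
        j = h[-1]              # only the current state (and t) is used
        m = joint[t - 1]
        z = sum(p for (jj, _), p in m.items() if jj == j)
        return {a: p / z for (jj, a), p in m.items() if jj == j}
    return policy


def value(joint, final):
    """Finite-horizon return computed from the marginal distributions only."""
    running = sum(r[ja] * p for m in joint for ja, p in m.items())
    return running + sum(r_N[j] * p for j, p in final.items())


joint_h, final_h = marginals(pi_history, 0)
joint_m, final_m = marginals(make_markov(joint_h), 0)
print(value(joint_h, final_h), value(joint_m, final_m))
```

Because `value` reads nothing but the marginal distributions, two policies that induce the same distributions necessarily get the same return. Note that the derived Markovian policy depends on the starting state $s$, so the comparison is made for one fixed $s$ at a time.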

One should note that it is impossible to prove the theorem for history-dependent and Markovian deterministic policies, since it is the random property of the policy that allows the modeling of all the histories under one state.

Yishay Mansour
1999-11-18