The Return Function
Therefore:
 1.

Since we proved in the last theorem that the distribution function is equal for the two policies.
 2.

Since (1) is true for all N.
 3.

One should note that it is impossible to prove the theorem for history-dependent and Markovian deterministic policies, since it is the randomness of the policy that allows all the histories to be modeled under one state.
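
The argument above can be checked numerically on a toy problem. The following is only a minimal sketch, assuming the preceding theorem says that a Markovian stochastic policy can reproduce the state-action distribution of a history-dependent policy. The two-state MDP, its numbers, the horizon N = 3, and all function names are hypothetical and chosen for illustration; the return is taken to be the sum of expected immediate rewards with no terminal reward.

```python
# Hypothetical 2-state, 2-action MDP; every number here is an illustrative assumption.
STATES, ACTIONS, N, START = (0, 1), (0, 1), 3, 0
# P[s][a][s2] = transition probability, R[s][a] = immediate reward.
P = {0: {0: (0.9, 0.1), 1: (0.2, 0.8)},
     1: {0: (0.5, 0.5), 1: (0.3, 0.7)}}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 2.0, 1: 5.0}}


def history_policy(states):
    """History-dependent deterministic rule (assumed for the example):
    play action 1 iff state 1 has been visited an odd number of times so far."""
    return sum(states) % 2


def marginals_history():
    """Exact P(X_t = s, Y_t = a) under history_policy, by enumerating state histories."""
    occ = [dict() for _ in range(N)]
    frontier = [((START,), 1.0)]  # (states visited so far, probability of that history)
    for t in range(N):
        nxt = []
        for states, p in frontier:
            s, a = states[-1], history_policy(states)
            occ[t][(s, a)] = occ[t].get((s, a), 0.0) + p
            for s2 in STATES:
                nxt.append((states + (s2,), p * P[s][a][s2]))
        frontier = nxt
    return occ


def markov_policy_from(occ):
    """Markovian stochastic policy pi'_t(a | s) = P(Y_t = a | X_t = s)."""
    pi = []
    for t in range(N):
        rule = {}
        for s in STATES:
            mass = sum(occ[t].get((s, a), 0.0) for a in ACTIONS)
            # States that are never reached at time t get an arbitrary uniform rule.
            rule[s] = tuple(
                occ[t].get((s, a), 0.0) / mass if mass > 0 else 1.0 / len(ACTIONS)
                for a in ACTIONS
            )
        pi.append(rule)
    return pi


def marginals_markov(pi):
    """P(X_t = s, Y_t = a) under the Markov policy, by a forward recursion on states."""
    occ = []
    dist = {s: (1.0 if s == START else 0.0) for s in STATES}
    for t in range(N):
        occ_t = {(s, a): dist[s] * pi[t][s][a] for s in STATES for a in ACTIONS}
        occ.append(occ_t)
        dist = {s2: sum(q * P[s][a][s2] for (s, a), q in occ_t.items()) for s2 in STATES}
    return occ


def expected_return(occ):
    """N-step expected return: sum over t of E[R(X_t, Y_t)]."""
    return sum(q * R[s][a] for occ_t in occ for (s, a), q in occ_t.items())


occ_h = marginals_history()
pi_markov = markov_policy_from(occ_h)
occ_m = marginals_markov(pi_markov)
v_h, v_m = expected_return(occ_h), expected_return(occ_m)
print(v_h, v_m)        # the two returns should agree up to rounding
print(pi_markov[2][0])  # action probabilities in state 0 at the last step
```

In this sketch the constructed rule for state 0 at the last step mixes both actions, because different histories that end in state 0 prescribe different actions. A deterministic Markovian rule would have to commit to one of them, which is consistent with the remark above about why the randomness is needed.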
Yishay Mansour
1999-11-18