Assumptions


In this section we make the following simplifying assumptions.

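1. The immediate reward and the transition probabilities are stationary: the functions $r(s,a)$ and $p(j|s,a)$ are the same at every time step. One benefit is that the model can be given to an algorithm as a finite input, namely one reward function and one transition function rather than one of each per time step.
2. The immediate reward is bounded: $|r(s,a)| < M$ for some constant $M$ and all states $s$ and actions $a$.
3. The discount parameter satisfies $0 \leq \lambda < 1$.
4. The number of states and actions is finite.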
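The assumptions can be made concrete with a small Python sketch. It assumes a tabular representation in which `rewards[s][a]` holds $r(s,a)$ and `transitions[s][a][j]` holds $p(j|s,a)$; the class `FiniteStationaryMDP` and the helpers `return_bound` and `discounted_return` are hypothetical names chosen for illustration, not part of these notes. Stationarity (assumption 1) and finiteness (assumption 4) are what let a single finite pair of tables describe the process for all time. Assumptions 2 and 3 together bound any discounted return via the geometric series: $|\sum_t \lambda^t r_t| \leq \sum_t \lambda^t |r_t| < M \sum_t \lambda^t = M/(1-\lambda)$.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class FiniteStationaryMDP:
    """Hypothetical container for a finite, stationary MDP.

    Because r(s, a) and p(j | s, a) do not depend on the time step,
    one finite table of each describes the process at every step.
    """
    rewards: List[List[float]]            # rewards[s][a] = r(s, a)
    transitions: List[List[List[float]]]  # transitions[s][a][j] = p(j | s, a)
    discount: float                       # the discount parameter lambda

    def check_assumptions(self, M: float, tol: float = 1e-9) -> None:
        """Raise ValueError if assumptions 2-4 fail (1 holds by construction)."""
        n_states = len(self.rewards)
        if n_states == 0 or len(self.transitions) != n_states:
            raise ValueError("need a finite, nonempty, consistent state set")
        if not 0.0 <= self.discount < 1.0:
            raise ValueError("discount must satisfy 0 <= lambda < 1")
        for s in range(n_states):
            n_actions = len(self.rewards[s])
            if n_actions == 0 or len(self.transitions[s]) != n_actions:
                raise ValueError(f"state {s}: inconsistent action set")
            for a in range(n_actions):
                # Assumption 2: strict bound |r(s, a)| < M.
                if not abs(self.rewards[s][a]) < M:
                    raise ValueError(f"|r({s},{a})| is not below M")
                # Each p(. | s, a) must be a probability distribution over states.
                row = self.transitions[s][a]
                if len(row) != n_states or any(p < 0 for p in row):
                    raise ValueError(f"p(.|{s},{a}) is not a distribution")
                if not math.isclose(sum(row), 1.0, abs_tol=tol):
                    raise ValueError(f"p(.|{s},{a}) does not sum to 1")


def return_bound(M: float, lam: float) -> float:
    """Bound on |sum_t lam^t r_t| when every |r_t| < M (geometric series)."""
    return M / (1.0 - lam)


def discounted_return(rewards: List[float], lam: float) -> float:
    """Discounted sum r_0 + lam * r_1 + lam^2 * r_2 + ... of a finite sequence."""
    return sum(lam ** t * r for t, r in enumerate(rewards))
```

For example, a two-state, two-action model with $\lambda = 0.9$ and rewards bounded by $M = 2$ passes the checks, and every discounted return it can produce stays below $2/(1-0.9) = 20$:

```python
mdp = FiniteStationaryMDP(
    rewards=[[1.0, 0.0], [0.5, -1.0]],
    transitions=[[[0.9, 0.1], [0.0, 1.0]], [[0.5, 0.5], [1.0, 0.0]]],
    discount=0.9,
)
mdp.check_assumptions(M=2.0)
bound = return_bound(2.0, mdp.discount)
```

Setting `discount=1.0` makes `check_assumptions` raise, reflecting that with $\lambda = 1$ the geometric series, and hence the bound, no longer converges.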

Yishay Mansour
1999-11-24