Assumptions
In this section we make the following
simplifying assumptions.
1. The immediate reward and the transition probability are stationary. Hence the functions r(s,a) and p(j|s,a) are identical at every time step. One benefit is that the model has a time-independent description, so the algorithm can have a finite input.
2. The immediate reward is bounded: |r(s,a)| < M.
3. The discounted parameter is
4. The number of states and actions is finite.
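As an illustration of how these assumptions look in a concrete representation, here is a minimal sketch in Python. It assumes a tabular model stored in NumPy arrays: the function name check_assumptions, the array layout r[s, a] and p[s, a, j], and the requirement that the discount parameter lie in [0, 1) are all assumptions made for this example and are not taken from the text above. Stationarity is reflected in the arrays having no time index, and finiteness in their finite shapes.

```python
import numpy as np


def check_assumptions(r, p, M, discount):
    """Return True if the tabular model (r, p) satisfies the listed assumptions.

    r[s, a]    : immediate reward r(s,a), shape (S, A); no time index (stationary)
    p[s, a, j] : transition probability p(j|s,a), shape (S, A, S)
    M          : reward bound
    discount   : discount parameter, ASSUMED here to lie in [0, 1)
    """
    r = np.asarray(r, dtype=float)
    p = np.asarray(p, dtype=float)
    S, A = r.shape
    # Finite state and action sets: the arrays have finite, matching shapes.
    shapes_ok = p.shape == (S, A, S)
    # Each p(.|s,a) must be a probability distribution over next states.
    probs_ok = shapes_ok and bool(
        np.all(p >= 0) and np.allclose(p.sum(axis=2), 1.0)
    )
    # Bounded reward: |r(s,a)| < M for every state-action pair.
    bounded_ok = bool(np.all(np.abs(r) < M))
    # Hypothetical range for the discount parameter (not stated in the text).
    discount_ok = 0.0 <= discount < 1.0
    return shapes_ok and probs_ok and bounded_ok and discount_ok


# A toy model with 2 states and 2 actions (values chosen only for illustration).
r_example = np.array([[1.0, 0.0],
                      [-0.5, 2.0]])
p_example = np.array([[[0.9, 0.1], [0.5, 0.5]],
                      [[0.0, 1.0], [0.3, 0.7]]])
```

With these toy arrays, check_assumptions(r_example, p_example, M=3.0, discount=0.9) accepts the model, while lowering the bound to M=2.0 rejects it because one reward has absolute value exactly 2 and the bound is strict.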
Yishay Mansour
1999-11-24