In this section we make the following assumptions:
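- The immediate reward and the transition probabilities are stationary: the functions r(s,a) and p(j|s,a) are the same at every time step. One benefit is that the model can be given to an algorithm as a finite input, a single table of rewards and transition probabilities rather than one per time step.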
- The immediate reward is bounded: $|r(s,a)| \le R_{\max}$ for some constant $R_{\max}$ and all states $s$ and actions $a$.
- The discount factor $\lambda$ satisfies $0 \le \lambda < 1$.
- The number of states and actions is finite.
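As a minimal sketch (an illustration, not part of the original notes), the four assumptions can be checked mechanically on a tabular model. The class name `FiniteDiscountedMDP`, the list-of-lists layout `transitions[s][a][j]`, and the tolerance `1e-9` are hypothetical choices. Stationarity shows up in the fact that the model stores one reward table and one transition table with no time index. The method `return_bound` uses the standard geometric-series argument: if every reward satisfies $|r(s,a)| \le R_{\max}$ and $0 \le \lambda < 1$, then any discounted return $\sum_t \lambda^t r_t$ is at most $R_{\max}/(1-\lambda)$ in absolute value.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FiniteDiscountedMDP:
    """Hypothetical tabular model of a stationary, finite, discounted MDP.

    The tables carry no time index, so the same r(s, a) and p(j | s, a)
    are used at every time step (stationarity).
    """
    rewards: List[List[float]]            # r(s, a), indexed [s][a]
    transitions: List[List[List[float]]]  # p(j | s, a), indexed [s][a][j]
    discount: float                       # lambda, must satisfy 0 <= lambda < 1

    def __post_init__(self) -> None:
        n_states = len(self.rewards)
        if n_states == 0:
            raise ValueError("need at least one state")
        if not 0.0 <= self.discount < 1.0:
            raise ValueError("discount factor must satisfy 0 <= lambda < 1")
        if len(self.transitions) != n_states:
            raise ValueError("transitions must have one entry per state")
        for s in range(n_states):
            if len(self.transitions[s]) != len(self.rewards[s]):
                raise ValueError(f"state {s}: rewards and transitions disagree on actions")
            for a, dist in enumerate(self.transitions[s]):
                # Each p(. | s, a) must be a probability distribution over the states.
                if len(dist) != n_states:
                    raise ValueError(f"p(.|{s},{a}) must have one entry per state")
                if any(q < 0.0 for q in dist) or abs(sum(dist) - 1.0) > 1e-9:
                    raise ValueError(f"p(.|{s},{a}) is not a probability distribution")

    @property
    def reward_bound(self) -> float:
        """R_max = max |r(s, a)|; it exists because the tables are finite."""
        return max(abs(r) for row in self.rewards for r in row)

    def return_bound(self) -> float:
        """Upper bound R_max / (1 - lambda) on |sum_t lambda^t r_t|."""
        return self.reward_bound / (1.0 - self.discount)


mdp = FiniteDiscountedMDP(
    rewards=[[1.0, 0.0], [2.0, -1.0]],
    transitions=[[[0.5, 0.5], [1.0, 0.0]], [[0.0, 1.0], [0.25, 0.75]]],
    discount=0.5,
)
print(mdp.reward_bound, mdp.return_bound())
```

Rejecting a discount factor of 1 in the constructor reflects why the assumption matters: with $\lambda = 1$ the geometric series diverges and the bound no longer exists.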