MDP - Return function.
Combining all the immediate rewards into a single value.
Issues:
Are early rewards more valuable than later rewards?
Is the system “terminating” (episodic) or continuing?
Usually the return is linear in the rewards, as in the discounted sum below.
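For example, a standard discounted-return formulation (a sketch assuming rewards r_{t+1}, r_{t+2}, ... and a discount factor gamma) is:

\[
G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \qquad 0 \le \gamma \le 1
\]

Choosing gamma < 1 weights early rewards more heavily than later ones; for terminating (episodic) systems the sum is finite, so gamma = 1 is also admissible.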