Example 1
This example is an expansion of Example 2 given in Lecture 3.
We will first examine the value gathered from different return functions using two specific policies:
 1. The first policy always chooses a_{11} when in state s_{1}.
 2. The second policy always chooses a_{12} when in state s_{1}.
Figure: Infinite Horizon Example

Let us start by calculating
:
For
the gap between the two policies goes to 1 in favor of
The three suggested return functions evaluate to:
 1. The expected sum of the immediate rewards:
 2. Expected average reward:
 3. Expected discounted sum:
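
As a rough illustration of how the three return functions listed above differ, the following Python sketch evaluates each of them on a fixed reward sequence. The function names, the `discount` parameter, and any sample rewards are assumptions made for illustration only; they are not taken from the example's figure. A deterministic reward stream is assumed, so the expectation is trivial.

```python
from typing import Sequence


def total_reward(rewards: Sequence[float]) -> float:
    """Sum of the immediate rewards over a finite horizon."""
    return float(sum(rewards))


def average_reward(rewards: Sequence[float]) -> float:
    """Average reward per step over the given (non-empty) horizon."""
    if len(rewards) == 0:
        raise ValueError("rewards must be non-empty")
    return total_reward(rewards) / len(rewards)


def discounted_reward(rewards: Sequence[float], discount: float) -> float:
    """Discounted sum: sum over t of discount**t * r_t, with 0 <= discount < 1."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    return float(sum(discount ** t * r for t, r in enumerate(rewards)))
```

Under these assumptions, `discounted_reward([1.0, 1.0, 1.0], 0.5)` gives 1.75, while the total and the average of the same sequence are 3.0 and 1.0. Two policies whose reward streams differ only in the first few steps can therefore have the same average reward and still differ under the other two criteria.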
Yishay Mansour
1999-11-18