- To calculate the ratio we need no knowledge of the model,
only of the two policies we use.
- The ratio is used as a weight on the samples,
as is the case in Importance Sampling.
- Points 1 and 2 together imply that samples drawn under one policy can be used to estimate quantities under another policy.
- Conclusion 3 explains why Q-learning can work: it learns about one policy while the samples are generated by another.
- The variance of the ratio must be bounded; otherwise the weighted estimates become unreliable.
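
The points above can be illustrated with a minimal sketch. It assumes a toy setting that is not part of these notes: two hypothetical state-independent policies, `BEHAVIOR` and `TARGET`, and a hypothetical environment function `step`. The ratio for a whole trajectory is the product of per-step action-probability ratios, because the transition probabilities appear in both numerator and denominator and cancel. The estimator therefore never looks inside `step`.

```python
import random

# Hypothetical policies over two actions (state-independent for brevity).
BEHAVIOR = {"a": 0.5, "b": 0.5}  # policy that generates the samples
TARGET = {"a": 0.9, "b": 0.1}    # policy whose value we want to estimate


def step(state, action, rng):
    """Hypothetical environment model; the estimator never inspects it."""
    next_state = (state + rng.randint(0, 1) + (1 if action == "a" else 2)) % 3
    reward = 1.0 if action == "a" else 0.0
    return next_state, reward


def sample_episode(policy, horizon, rng):
    """Follow `policy` for `horizon` steps; return the actions and total reward."""
    state, actions, total = 0, [], 0.0
    for _ in range(horizon):
        action = "a" if rng.random() < policy["a"] else "b"
        state, reward = step(state, action, rng)
        actions.append(action)
        total += reward
    return actions, total


def trajectory_ratio(actions, target, behavior):
    """Product of per-step action-probability ratios; transition terms cancel."""
    ratio = 1.0
    for action in actions:
        ratio *= target[action] / behavior[action]
    return ratio


def estimate_value(num_episodes, horizon, seed=0):
    """Return (importance-weighted estimate, unweighted average) of the return."""
    rng = random.Random(seed)
    weighted, unweighted = 0.0, 0.0
    for _ in range(num_episodes):
        actions, total = sample_episode(BEHAVIOR, horizon, rng)
        weighted += trajectory_ratio(actions, TARGET, BEHAVIOR) * total
        unweighted += total
    return weighted / num_episodes, unweighted / num_episodes


if __name__ == "__main__":
    est, naive = estimate_value(20000, horizon=3)
    print(f"importance-weighted estimate: {est:.3f}")  # true value under TARGET is 2.7
    print(f"unweighted average: {naive:.3f}")          # value under BEHAVIOR is 1.5
```

In this toy setting the largest possible ratio is 1.8^3, about 5.83, and it grows exponentially with the horizon. That is one way to see why the variance has to be kept bounded: a few rare trajectories with very large weights can dominate the estimate.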