Contrast with Supervised Learning

Fixed distribution on examples.

The distribution is policy dependent!!!

A small local change in the policy can make a huge global change in the return.

Previous slide Next slide Back to first slide View graphic version