Learning - Model Free
Monte Carlo - Policy Evaluation
Given a policy π:
Generate m trajectories by following π.
For each state s, let T(s) be the set of all trajectory suffixes that start at s (one suffix for every visit to s).
Let V^π(s) be the average return over the suffixes in T(s), where the return of a suffix is the (possibly discounted) sum of its rewards.
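The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the slide author's implementation: a trajectory is assumed to be a list of (state, reward) pairs, where the reward is the one received after leaving that state, and the discount factor `gamma` defaults to 1. Because T(s) collects a suffix for every visit to s, this is the every-visit variant. The helper names `mc_policy_evaluation` and `random_walk_episode`, and the random-walk task itself, are hypothetical and exist only for illustration.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Assumed representation: a trajectory is a list of (state, reward) steps,
# where reward is what the agent receives after leaving that state.
Trajectory = List[Tuple[int, float]]


def mc_policy_evaluation(
    generate_trajectory: Callable[[], Trajectory],
    m: int,
    gamma: float = 1.0,
) -> Dict[int, float]:
    """Every-visit Monte Carlo estimate of V^pi from m sampled trajectories.

    generate_trajectory is a hypothetical callable that samples one episode
    by following the policy pi being evaluated.
    """
    # returns[s] plays the role of T(s): one return per suffix starting at s.
    returns: Dict[int, List[float]] = defaultdict(list)
    for _ in range(m):
        trajectory = generate_trajectory()
        g = 0.0
        # Walk backwards so that g is the discounted return of the suffix
        # that starts at the current step.
        for state, reward in reversed(trajectory):
            g = reward + gamma * g
            returns[state].append(g)
    # V^pi(s) is the average return over all suffixes in T(s).
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}


def random_walk_episode(rng: random.Random) -> Trajectory:
    """Hypothetical toy task: states 0..4, start at 2, terminals 0 and 4.

    The policy pi moves left or right with probability 1/2 each;
    the transition into state 4 pays reward 1, every other transition pays 0.
    """
    state, steps = 2, []
    while state not in (0, 4):
        nxt = state + rng.choice((-1, 1))
        steps.append((state, 1.0 if nxt == 4 else 0.0))
        state = nxt
    return steps


if __name__ == "__main__":
    rng = random.Random(0)
    V = mc_policy_evaluation(lambda: random_walk_episode(rng), m=5000)
    for s in sorted(V):
        print(s, round(V[s], 3))
```

For this random walk the undiscounted true values are 0.25, 0.5 and 0.75 for states 1, 2 and 3, so the printed estimates should approach those numbers as m grows. Computing returns backwards lets each trajectory be processed in a single pass instead of re-summing every suffix separately.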