Planning - Value Functions
V^π(s): The expected value starting at state s and following policy π.
Q^π(s,a): The expected value starting at state s with action a and then following policy π.
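The two definitions above can be made concrete with a small numerical sketch. Everything in it is an assumption for illustration and not from the slide: the two-state MDP in TRANSITIONS, the action names "stay" and "go", the discount factor GAMMA = 0.9, and the reading of "expected value" as expected discounted reward. The policy is written π here. The value of the fixed policy is found by repeatedly applying its one-step expectation until the numbers stop changing.

```python
GAMMA = 0.9  # assumed discount factor; the slide does not specify one

# Hypothetical MDP: TRANSITIONS[s][a] is a list of
# (probability, next_state, reward) outcomes for action a in state s.
TRANSITIONS = {
    0: {"stay": [(1.0, 0, 0.0)],
        "go":   [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 2.0)],
        "go":   [(1.0, 0, 0.0)]},
}


def backup(v, s, a):
    """Expected value of taking action a in state s, then collecting v."""
    return sum(p * (r + GAMMA * v[s2]) for p, s2, r in TRANSITIONS[s][a])


def evaluate_policy(policy, tol=1e-10):
    """Iterate v(s) <- backup(v, s, policy[s]) until the largest change is below tol."""
    v = {s: 0.0 for s in TRANSITIONS}
    while True:
        delta = 0.0
        for s in TRANSITIONS:
            new_value = backup(v, s, policy[s])
            delta = max(delta, abs(new_value - v[s]))
            v[s] = new_value
        if delta < tol:
            return v


# A fixed policy π: one action per state.
policy = {0: "go", 1: "stay"}

# V^π(s): value of starting in s and following π.
v_pi = evaluate_policy(policy)

# Q^π(s,a): value of taking a first, then following π.
q_pi = {(s, a): backup(v_pi, s, a) for s in TRANSITIONS for a in TRANSITIONS[s]}

print(v_pi)
print(q_pi)
```

In this sketch Q^π(s,a) agrees with V^π(s) whenever a is the action π picks in s, which is a quick consistency check on the two definitions.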
V*(s) and Q*(s,a) are defined using an optimal policy π*.
V*(s) = max_π V^π(s)
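The maximum over policies can be taken literally on a small enough problem. The sketch below enumerates every deterministic policy of a toy MDP, evaluates each one, and keeps the best value per state. The MDP in TRANSITIONS, the action names, the discount factor GAMMA = 0.9, and the restriction to deterministic policies are all assumptions made for illustration. Brute-force enumeration is used only because it mirrors the formula; it is not how V* would be computed on a problem of realistic size.

```python
from itertools import product

GAMMA = 0.9  # assumed discount factor; the slide does not specify one

# Hypothetical MDP: TRANSITIONS[s][a] is a list of
# (probability, next_state, reward) outcomes for action a in state s.
TRANSITIONS = {
    0: {"stay": [(1.0, 0, 0.0)],
        "go":   [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 2.0)],
        "go":   [(1.0, 0, 0.0)]},
}


def backup(v, s, a):
    """Expected value of taking action a in state s, then collecting v."""
    return sum(p * (r + GAMMA * v[s2]) for p, s2, r in TRANSITIONS[s][a])


def evaluate_policy(policy, tol=1e-10):
    """Value of a fixed policy, found by iterating its one-step expectation."""
    v = {s: 0.0 for s in TRANSITIONS}
    while True:
        delta = 0.0
        for s in TRANSITIONS:
            new_value = backup(v, s, policy[s])
            delta = max(delta, abs(new_value - v[s]))
            v[s] = new_value
        if delta < tol:
            return v


states = list(TRANSITIONS)

# Every deterministic policy: one action chosen per state.
action_choices = [list(TRANSITIONS[s]) for s in states]
all_policies = [dict(zip(states, choice)) for choice in product(*action_choices)]

# The value of each policy, one dict per policy.
policy_values = [evaluate_policy(pi) for pi in all_policies]

# V*(s): the best value any of the enumerated policies achieves from s.
v_star = {s: max(v[s] for v in policy_values) for s in states}

# Q*(s,a): take a first, then behave optimally afterwards.
q_star = {(s, a): backup(v_star, s, a) for s in states for a in TRANSITIONS[s]}

print(v_star)
print(q_star)
```

In this toy problem a single policy ("go" in state 0, "stay" in state 1) attains the maximum in both states at once, so it serves as π*, and V*(s) equals the largest Q*(s,a) over actions a.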