Algorithms - optimal control
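State-Action Value function:
$Q^{\pi}(s,a) = \mathbb{E}\left[R(s,a) + \gamma V^{\pi}(s')\right]$
where $s'$ is the successor state reached by taking action $a$ in state $s$, $\gamma$ is the discount factor, and the expectation is taken over $s'$.
Note: $\pi$ is deterministic, so $V^{\pi}(s) = Q^{\pi}(s, \pi(s))$.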
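The relationship between $Q^{\pi}$ and $V^{\pi}$ can be sketched in pure Python for a small tabular MDP. This is a minimal illustration under stated assumptions, not the method from the slides. The transition table `P`, rewards `R`, the deterministic `policy`, and the helper names `evaluate_policy` and `q_from_v` are all hypothetical. $V^{\pi}$ is first obtained by iterative policy evaluation. Then $Q^{\pi}(s,a)$ is computed as $R(s,a)$ plus $\gamma$ times the expected $V^{\pi}(s')$ under the assumed transition probabilities.

```python
# Hypothetical tabular MDP: P[s][a] is a list of (probability, next_state) pairs,
# R[s][a] is the immediate reward for taking action a in state s.


def evaluate_policy(P, R, policy, gamma, tol=1e-10):
    """Iterative (in-place) policy evaluation for a deterministic policy.

    policy[s] is the single action the policy takes in state s.
    """
    n = len(P)
    V = [0.0] * n
    while True:
        delta = 0.0
        for s in range(n):
            a = policy[s]
            # Bellman expectation backup for the action chosen by pi
            v_new = R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def q_from_v(P, R, V, gamma):
    """Q(s, a) = R(s, a) + gamma * E[V(s')], with s' ~ P(. | s, a)."""
    return [
        [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a]) for a in range(len(P[s]))]
        for s in range(len(P))
    ]


# Hypothetical 2-state, 2-action example with deterministic transitions:
# action 0 moves to state 0, action 1 moves to state 1.
P = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 0)], [(1.0, 1)]],
]
R = [[0.0, 1.0], [2.0, 0.0]]
policy = [1, 0]  # deterministic pi: state 0 -> action 1, state 1 -> action 0
gamma = 0.5

V = evaluate_policy(P, R, policy, gamma)  # approximately [8/3, 10/3]
Q = q_from_v(P, R, V, gamma)

# Because pi is deterministic, V(s) equals Q(s, pi(s)) for every state.
for s in range(len(P)):
    print(s, V[s], Q[s][policy[s]])
```

Because the policy picks a single action per state, no averaging over actions is needed. $V^{\pi}(s)$ is simply the entry $Q^{\pi}(s, \pi(s))$. A stochastic policy would instead weight each $Q^{\pi}(s,a)$ by $\pi(a \mid s)$.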