Algorithms - optimal control
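The greedy policy with respect to $Q^\pi(s,a)$ is
$\pi(s) = \arg\max_a Q^\pi(s,a)$
The $\epsilon$-greedy policy with respect to $Q^\pi(s,a)$ is
$\pi(s) = \arg\max_a Q^\pi(s,a)$ with probability $1-\epsilon$, and
$\pi(s) = a$, an action drawn uniformly at random from $A$, with probability $\epsilon$ (so each action is picked this way with probability $\epsilon/|A|$)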
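The two policies can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: it assumes the action values for a single state are stored in a list indexed by action, and the function names are hypothetical.

```python
import random


def greedy_action(q_values):
    """Return the index of the action with the largest value.

    q_values is assumed to hold Q(s, a) for one fixed state s,
    indexed by action. Ties go to the lowest index.
    """
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Sample an action from the epsilon-greedy policy.

    With probability epsilon, pick uniformly among all actions;
    otherwise pick the greedy action.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy_action(q_values)


def epsilon_greedy_probabilities(q_values, epsilon):
    """Return the selection probability of each action.

    Every action gets epsilon / |A| from the uniform draw, and the
    greedy action gets an additional 1 - epsilon.
    """
    n = len(q_values)
    probs = [epsilon / n] * n
    probs[greedy_action(q_values)] += 1.0 - epsilon
    return probs
```

Because the uniform draw can also land on the greedy action, that action is selected with total probability $1 - \epsilon + \epsilon/|A|$, while every other action is selected with probability $\epsilon/|A|$.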