Algorithms - Optimal Control: Example
A = {+1, -1}
γ = 1/2
δ(s_i, a) = s_{i+a}
π random
R(s_i, a) = i
[Figure: the four states s0, s1, s2, s3, labeled with their rewards 0, 1, 2, 3.]
Changing the policy using the state-action value function.
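
The improvement step named above can be sketched in Python. This is only an illustration under stated assumptions: the slide does not say how the index i+a behaves at the ends, so the sketch wraps it modulo 4; "random" is read as choosing +1 or -1 with equal probability; and all function names (step, reward, evaluate_random_policy, q_values, greedy_policy) are hypothetical. It evaluates the random policy, computes Q(s_i, a) = R(s_i, a) + γ V(δ(s_i, a)), and then picks the action with the larger Q in each state.

```python
GAMMA = 0.5          # discount factor γ from the slide
ACTIONS = (+1, -1)   # action set A from the slide
N_STATES = 4         # states s0..s3


def step(i, a):
    # Transition δ(s_i, a) = s_{i+a}; the wrap-around (mod 4) is an assumption.
    return (i + a) % N_STATES


def reward(i, a):
    # R(s_i, a) = i, independent of the action.
    return i


def evaluate_random_policy(sweeps=100):
    # Iterative policy evaluation for the policy assumed to pick
    # each action with probability 1/2.
    v = [0.0] * N_STATES
    for _ in range(sweeps):
        v = [
            sum(0.5 * (reward(i, a) + GAMMA * v[step(i, a)]) for a in ACTIONS)
            for i in range(N_STATES)
        ]
    return v


def q_values(v):
    # State-action values of the evaluated policy.
    return {
        (i, a): reward(i, a) + GAMMA * v[step(i, a)]
        for i in range(N_STATES)
        for a in ACTIONS
    }


def greedy_policy(q):
    # Changed policy: in each state take the action with the larger Q.
    return [max(ACTIONS, key=lambda a: q[(i, a)]) for i in range(N_STATES)]


v = evaluate_random_policy()
q = q_values(v)
policy = greedy_policy(q)
print(v)
print(policy)
```

Under these assumptions the values of the random policy come out as approximately 5/3, 7/3, 11/3, 13/3 for s0 through s3, and the greedy policy is -1, +1, +1, -1: every state moves toward s3, the state with the highest reward.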