Algorithms - Policy Iteration
Q_i(s,a) := Q^{π_i}(s,a)
π_i(s) := argmax_{a ∈ A} Q_{i-1}(s,a)
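The two update rules above can be sketched in a few lines of Python. This is a minimal illustration, not the slide author's implementation: the discount factor `gamma`, the iterative policy evaluation, the tie-breaking rule, and the tiny two-state MDP (`P`, `R`) are all assumptions introduced here for the example.

```python
def q_of_policy(P, R, policy, gamma=0.9, tol=1e-10):
    """Return Q^pi as a table Q[s][a] for a fixed deterministic policy.

    P[s][a][t] is the probability of moving from s to t under action a,
    R[s][a] is the immediate reward (hypothetical layout for this sketch).
    """
    n_states, n_actions = len(R), len(R[0])
    V = [0.0] * n_states
    while True:
        # Bellman expectation backup for the fixed policy.
        V_new = [
            R[s][policy[s]]
            + gamma * sum(P[s][policy[s]][t] * V[t] for t in range(n_states))
            for s in range(n_states)
        ]
        delta = max(abs(x - y) for x, y in zip(V, V_new))
        V = V_new
        if delta < tol:
            break
    return [
        [R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_states))
         for a in range(n_actions)]
        for s in range(n_states)
    ]


def policy_iteration(P, R, gamma=0.9):
    """Alternate Q_i := Q^{pi_i} and pi_i(s) := argmax_a Q_{i-1}(s, a)."""
    n_states, n_actions = len(R), len(R[0])
    policy = [0] * n_states          # arbitrary starting policy
    iterations = 0
    while True:
        Q = q_of_policy(P, R, policy, gamma)
        new_policy = []
        for s in range(n_states):
            # Keep the current action on (near) ties so the loop cannot
            # bounce between equally good policies.
            if Q[s][policy[s]] >= max(Q[s]) - 1e-9:
                new_policy.append(policy[s])
            else:
                new_policy.append(max(range(n_actions), key=lambda a: Q[s][a]))
        iterations += 1
        if new_policy == policy:     # greedy policy unchanged: stop
            return policy, Q, iterations
        policy = new_policy


# Hypothetical MDP: action 0 stays put, action 1 switches state;
# only staying in state 1 pays a reward of 1.
P = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
R = [
    [0.0, 0.0],
    [1.0, 0.0],
]

policy, Q, iterations = policy_iteration(P, R)
print(policy, iterations)
```

Stopping when the greedy policy no longer changes is one common termination test; solving the evaluation step exactly as a linear system would work equally well in place of the iterative backup used here.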
Convergence:
1. The number of iterations is less than for Value Iteration.
2. Never considers the same policy twice, so at most |A|^|S| iterations.
RECENT: the number of iterations is O(|A|^|S| / |S|) [MS, UAI99].
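To get a feel for the scale of these bounds, the short sketch below tabulates the number of deterministic policies, |A|^|S|, next to that count divided by |S|. The helper name `policy_count` and the chosen sizes are hypothetical, and since the O(.) bound hides a constant factor, the last column only indicates order of magnitude.

```python
def policy_count(n_states, n_actions):
    """Number of deterministic policies: one action choice per state."""
    return n_actions ** n_states


for n_states in (2, 5, 10, 20):
    total = policy_count(n_states, 2)   # two actions per state
    # Columns: |S|, |A|^|S|, |A|^|S| / |S| (rounded down)
    print(n_states, total, total // n_states)
```

Both quantities grow exponentially in |S|, so these are worst-case guarantees rather than predictions of typical running time.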