Algorithms - Policy Iteration

$\pi_i(s) := \arg\max_{a \in A} Q_{i-1}(s, a)$

1. Usually converges in fewer iterations than value iteration.

2. Never considers the same policy twice, so it terminates after at most $|A|^{|S|}$ iterations (the number of distinct deterministic policies).
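The greedy update and the termination bound can be sketched in Python. This is a minimal illustration, not the slide's own implementation. The array layout (`P` indexed as action, state, next state; `R` indexed as state, action), the discount factor `gamma`, the exact policy evaluation by a linear solve, and the two-state toy MDP are all assumptions made for the example.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Assumed layout: P[a, s, t] = Pr(t | s, a), R[s, a] = expected reward."""
    n_actions, n_states, _ = P.shape
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)  # arbitrary initial policy
    iterations = 0
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[policy, states]            # shape (S, S)
        R_pi = R[states, policy]            # shape (S,)
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)

        # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)

        # Greedy improvement; keep the current action on ties so the
        # loop cannot flip between equally good policies.
        best = Q.max(axis=1)
        current = Q[states, policy]
        new_policy = np.where(current >= best - 1e-12, policy, Q.argmax(axis=1))

        iterations += 1
        if np.array_equal(new_policy, policy):
            return policy, V, iterations
        policy = new_policy

# Hypothetical toy MDP: action 0 stays in place, action 1 switches state.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 1.0],
              [2.0, 0.0]])

pi_star, V_star, n_iter = policy_iteration(P, R, gamma=0.9)
print(pi_star, V_star, n_iter)
```

Because every improvement step yields a strictly better policy until the policy stops changing, `n_iter` can never exceed the number of deterministic policies, which is 2^2 = 4 for this toy problem.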
