Policy Iteration

In this section we present the Policy Iteration algorithm (PI) for finding an optimal policy in a discounted infinite-horizon problem. Unlike the Value Iteration algorithm, whose output is in general only an approximation of the optimal policy, PI outputs the optimal policy itself; for a finite set of states and actions it does so after a finite number of iterations.
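As a rough illustration of why PI can return an exact policy, the following is a minimal sketch of the standard scheme: evaluate the current policy exactly by solving a linear system, then improve it greedily, and stop when no state changes its action. It is not taken from these notes. The names policy_iteration, solve_linear, P, R and gamma, the two-state example, and the 1e-12 tie-breaking tolerance are all assumptions made for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def policy_iteration(P, R, gamma):
    """P[s][a][t] is a transition probability, R[s][a] an expected reward.

    Both layouts are assumptions of this sketch. Returns (policy, values).
    """
    n = len(P)
    policy = [0] * n  # arbitrary initial policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        a_mat = [[(1.0 if s == t else 0.0) - gamma * P[s][policy[s]][t]
                  for t in range(n)] for s in range(n)]
        r_vec = [R[s][policy[s]] for s in range(n)]
        v = solve_linear(a_mat, r_vec)

        # Policy improvement: switch only on a strict gain, so ties
        # cannot make the loop cycle (the tolerance is an assumption).
        stable = True
        for s in range(n):
            q = [R[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n))
                 for a in range(len(P[s]))]
            best = max(range(len(q)), key=q.__getitem__)
            if q[best] > q[policy[s]] + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, v


# Hypothetical two-state example: action 0 stays, action 1 switches state.
# Staying pays 1 in state 0 and 2 in state 1; switching pays 0.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[1.0, 0.0],
     [2.0, 0.0]]
policy, values = policy_iteration(P, R, 0.9)
print(policy, values)
```

In this example the loop stops after two evaluation steps with the policy "switch in state 0, stay in state 1". Because each evaluation is an exact linear solve rather than a truncated sequence of backups, the returned policy is optimal and not merely near-optimal, up to floating-point error in the solve.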

Policy Iteration Algorithm
Convergence of Policy Iteration Algorithm
Example: Running Policy Iteration Algorithm

Yishay Mansour
1999-12-18