Policy Iteration

In this section we present the Policy Iteration algorithm (PI) for finding an optimal policy in a discounted infinite-horizon problem. Unlike the Value Iteration algorithm, PI does not output an approximation of an optimal policy, but an optimal policy itself.
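To make the contrast with Value Iteration concrete, the following is a minimal sketch of PI, not a definitive implementation. It assumes a finite set of states and actions, which the text above does not state. The names `solve_linear`, `policy_iteration`, `P`, `R` and `gamma`, and the two-state example, are hypothetical and chosen only for illustration. The sketch alternates exact policy evaluation (solving a linear system) with greedy policy improvement, and stops when the policy no longer changes. Exact rational arithmetic is used so that no numerical tolerance is needed.

```python
from fractions import Fraction


def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination (exact for Fraction entries)."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        # Swap a row with a nonzero pivot into position, then normalize it.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pivot_val = M[col][col]
        M[col] = [x / pivot_val for x in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def policy_iteration(P, R, gamma):
    """P[s][a][t]: transition probability, R[s][a]: expected reward (assumed layout)."""
    n_states = len(P)
    policy = [0] * n_states  # arbitrary initial policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi for the current policy.
        A = [[(1 if s == t else 0) - gamma * P[s][policy[s]][t]
              for t in range(n_states)] for s in range(n_states)]
        b = [R[s][policy[s]] for s in range(n_states)]
        v = solve_linear(A, b)

        # Policy improvement: act greedily with respect to v.
        new_policy = []
        for s in range(n_states):
            q = [R[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n_states))
                 for a in range(len(P[s]))]
            best = max(range(len(q)), key=lambda a: q[a])
            # Keep the current action on ties so the loop cannot oscillate.
            new_policy.append(policy[s] if q[policy[s]] == q[best] else best)

        if new_policy == policy:  # no state can be improved: stop
            return policy, v
        policy = new_policy


# Hypothetical two-state example: staying in state 1 pays reward 1 per step.
gamma = Fraction(9, 10)
P = [[[1, 0], [0, 1]],   # state 0: action 0 stays, action 1 moves to state 1
     [[0, 1], [1, 0]]]   # state 1: action 0 stays, action 1 moves to state 0
R = [[0, 0],
     [1, 0]]
policy, v = policy_iteration(P, R, gamma)
```

In this example the loop stops after two evaluation steps, and the returned policy moves from state 0 to state 1 and then stays there. Because each evaluation step is solved exactly rather than iterated to a tolerance, the returned value function is the exact value of the returned policy.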


