Policy Iteration

In this section we present the Policy Iteration (PI) algorithm
for finding the optimal policy in a discounted infinite-horizon
problem. Unlike the Value Iteration algorithm, whose output only
approximates the optimal policy, PI outputs the optimal policy
itself.
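
To make the contrast with Value Iteration concrete, the following is a minimal sketch of one common formulation of PI for a small finite MDP. It is an illustration under stated assumptions, not necessarily the exact procedure developed in these notes: the names `policy_iteration`, `solve_linear`, `P`, `R`, and `gamma` are hypothetical, transitions are assumed to be stored as `P[a][s][t]` and expected immediate rewards as `R[s][a]`. Each round evaluates the current policy exactly by solving a linear system and then switches to any strictly better action; the loop stops when no state can be improved.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def policy_iteration(P, R, gamma):
    """Assumed layout: P[a][s][t] = Pr(t | s, a), R[s][a] = expected reward."""
    n_actions, n_states = len(P), len(P[0])
    policy = [0] * n_states  # arbitrary initial policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        A = [[(1.0 if s == t else 0.0) - gamma * P[policy[s]][s][t]
              for t in range(n_states)] for s in range(n_states)]
        b = [R[s][policy[s]] for s in range(n_states)]
        V = solve_linear(A, b)
        # Policy improvement: switch only to strictly better actions.
        stable = True
        for s in range(n_states):
            q = [R[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
                 for a in range(n_actions)]
            best = max(range(n_actions), key=lambda a: q[a])
            if q[best] > q[policy[s]] + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


# Toy two-state example (made up for illustration): action 0 stays,
# action 1 switches state; only state 1 pays a reward of 1 per step.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
R = [[0.0, 0.0], [1.0, 1.0]]
policy, V = policy_iteration(P, R, gamma=0.9)
```

In this toy example the returned policy switches out of state 0 and stays in state 1. Because there are only finitely many deterministic policies and the sketch changes the policy only on a strict improvement, the loop terminates after finitely many rounds rather than converging only in the limit.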

*Yishay Mansour*

*1999-12-18*