# Policy Iteration

In this section we present the Policy Iteration algorithm
(also referred to as PI) for finding the
optimal policy in a discounted infinite-horizon
problem. Unlike the Value Iteration algorithm, whose
output is only an approximation of the optimal policy,
PI outputs the optimal policy itself.
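
To make the idea concrete before the formal presentation, here is a minimal sketch of PI, assuming a finite MDP given as nested lists. The names `P`, `R`, `gamma`, `solve_linear` and `policy_iteration`, as well as the two-state example, are illustrative assumptions and do not come from these notes. The sketch alternates exact policy evaluation (solving a linear system) with greedy policy improvement, and stops when the policy no longer changes.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def policy_iteration(P, R, gamma):
    """P[s][a][t]: transition probability, R[s][a]: expected reward (assumed layout)."""
    n_states, n_actions = len(P), len(P[0])
    policy = [0] * n_states  # arbitrary initial policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        A = [[(1.0 if s == t else 0.0) - gamma * P[s][policy[s]][t]
              for t in range(n_states)] for s in range(n_states)]
        b = [R[s][policy[s]] for s in range(n_states)]
        V = solve_linear(A, b)

        # Policy improvement: act greedily with respect to V.
        new_policy = []
        for s in range(n_states):
            q = [R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_states))
                 for a in range(n_actions)]
            best = max(range(n_actions), key=lambda a: q[a])
            # Keep the current action on (near-)ties so the loop cannot cycle.
            if q[policy[s]] >= q[best] - 1e-12:
                best = policy[s]
            new_policy.append(best)

        if new_policy == policy:  # no state changed its action: stop
            return policy, V
        policy = new_policy


# Hypothetical two-state example: staying in state 1 pays reward 1 per step.
P = [[[1.0, 0.0], [0.0, 1.0]],   # state 0: action 0 stays, action 1 moves to state 1
     [[0.0, 1.0], [1.0, 0.0]]]   # state 1: action 0 stays, action 1 moves to state 0
R = [[0.0, 0.0],
     [1.0, 0.0]]
policy, V = policy_iteration(P, R, 0.5)
print(policy, V)
```

In this toy example the loop stops after two evaluation steps, with the policy that moves from state 0 to state 1 and then stays there. The stopping test compares policies rather than values, which is why the result is an actual policy and not a value-function approximation.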

*Yishay Mansour*

*1999-12-18*