The algorithm using the Monte Carlo method
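Since there are too many states to evaluate every one of them, we take only a subset of the states, $\tilde{S}$. For each state $s \in \tilde{S}$ there are $M(s)$ simulation runs starting from $s$, with observed costs
$c(s,1), \dots, c(s,M(s))$.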
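The sampling step can be sketched in a few lines. This is a minimal illustration under assumptions that are not in the notes: a hypothetical toy chain with states 0..N, a terminal state 0, a fixed policy that moves left with probability `P_LEFT`, and a cost of 1 per step. The names `policy_step`, `run_cost`, `subset`, `M` and `costs` are all hypothetical; `costs[s][m-1]` plays the role of $c(s,m)$.

```python
import random

# Hypothetical toy chain (an assumption for illustration, not from the notes):
# states 0..N, state 0 is terminal, and the fixed policy moves left with
# probability P_LEFT, otherwise right (staying put at the right edge N).
N = 20
P_LEFT = 0.7


def policy_step(s, rng):
    """One transition under the fixed policy being evaluated."""
    return s - 1 if rng.random() < P_LEFT else min(s + 1, N)


def run_cost(start, rng, max_steps=10_000):
    """Total cost of one simulated run from `start`: cost 1 per step until state 0."""
    s, total = start, 0
    for _ in range(max_steps):
        if s == 0:
            break
        s = policy_step(s, rng)
        total += 1
    return total


rng = random.Random(0)                            # fixed seed for reproducibility
subset = sorted(rng.sample(range(1, N + 1), 5))   # the sampled subset of states
M = {s: 30 for s in subset}                       # M(s): number of runs from s
# costs[s][m - 1] corresponds to c(s, m), the cost of the m-th run from s
costs = {s: [run_cost(s, rng) for _ in range(M[s])] for s in subset}

for s in subset:
    print(s, sum(costs[s]) / M[s])  # sample mean of the runs from s
```

Since each step moves at most one state to the left, every run from $s$ costs at least $s$, which is a quick sanity check on the simulated values.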
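We look for the parameter vector $r$ that minimizes the squared error between the approximation and the observed costs:
$$\min_r \sum_{s \in \tilde{S}} \sum_{m=1}^{M(s)} \left( \tilde{J}(s,r) - c(s,m) \right)^2,$$
where $\tilde{S}$ is the sampled subset of states and $\tilde{J}(s,r)$ is the approximate cost of state $s$ under parameter $r$.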
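A minimal sketch of this least-squares fit, assuming a linear architecture $\tilde{J}(s,r) = \phi(s)^\top r$ with a hypothetical feature map `phi` (the architecture is an assumption made here for illustration). Setting the gradient of the objective to zero gives the normal equations $\sum_{s}\sum_{m} \phi(s)\phi(s)^\top r = \sum_{s}\sum_{m} \phi(s)\, c(s,m)$, which the code accumulates and solves by Gaussian elimination. The input `costs` is assumed to map each sampled state $s$ to the list of its $M(s)$ run costs; `solve` and `fit_r` are hypothetical helper names.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations: features are dependent on the sample")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            f = aug[i][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[i][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (aug[i][n] - sum(aug[i][k] * x[k] for k in range(i + 1, n))) / aug[i][i]
    return x


def fit_r(costs, phi):
    """Least-squares r for tilde J(s, r) = phi(s) . r over all runs c(s, m)."""
    d = len(phi(next(iter(costs))))
    A = [[0.0] * d for _ in range(d)]  # sum over s, m of phi(s) phi(s)^T
    b = [0.0] * d                      # sum over s, m of phi(s) c(s, m)
    for s, runs in costs.items():
        f = phi(s)
        for c in runs:
            for i in range(d):
                b[i] += f[i] * c
                for j in range(d):
                    A[i][j] += f[i] * f[j]
    return solve(A, b)


def phi(s):
    """Hypothetical features: tilde J(s, r) = r[0] + r[1] * s."""
    return [1.0, float(s)]


# Synthetic runs scattered symmetrically around 2 + 3s, so the fit recovers r = [2, 3].
costs = {s: [2 + 3 * s - 1, 2 + 3 * s + 1] for s in [1, 4, 7, 10]}
r = fit_r(costs, phi)
print(r)
```

Each state enters the sum once per run, so the fit is equivalent to a regression of the per-state sample means weighted by $M(s)$. With one indicator feature per sampled state (a tabular architecture) the solution reduces to the sample mean of each state's runs.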
Figure: Diagram of a mechanism that performs Approximate Policy Iteration

Yishay Mansour, 2000-01-11