Consider a set of representative states, and let M(s) be the number of cost samples available for state s. The mth such sample is denoted by c(s,m), and r is the parameter vector over which the following optimisation problem is solved.
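A least-squares problem consistent with these definitions can be written as below. This is a sketch under stated assumptions: the symbol S for the set of representative states and the symbol \tilde{J}(s, r) for the approximate cost of state s under parameter r are notation introduced here for illustration, not taken from the text.

```latex
\min_{r} \; \sum_{s \in S} \sum_{m=1}^{M(s)}
  \bigl( \tilde{J}(s, r) - c(s, m) \bigr)^{2}
```

Each term penalises the squared gap between the approximation at a representative state and one sampled cost from that state, so states with more samples carry more weight in the fit.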
Solving the Least-Squares Problem
The solution can be obtained by an incremental algorithm that takes gradient steps on the least-squares cost. For a given run, we have the following equation:
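As an illustration of the incremental idea, here is a minimal Python sketch. It assumes a linear approximation architecture, approximate cost = r · phi(s), with a made-up feature map phi(s) = (1, s); the names `features`, `approx_cost`, `incremental_least_squares`, the constant step size, and the sample data are all hypothetical and are not taken from the text. Each step processes one sample c(s,m) and moves r against the gradient of that sample's squared error.

```python
import random


def features(s):
    # Hypothetical feature map phi(s) = (1, s); an assumption for illustration.
    return [1.0, float(s)]


def approx_cost(s, r):
    # Assumed linear architecture: inner product of r and phi(s).
    return sum(ri * fi for ri, fi in zip(r, features(s)))


def incremental_least_squares(samples, r, step=0.05, sweeps=200, seed=0):
    """Incremental gradient method for the least-squares fit.

    samples maps each representative state s to its list of cost
    samples c(s, m). One sample is processed per update.
    """
    rng = random.Random(seed)
    pairs = [(s, c) for s, costs in samples.items() for c in costs]
    r = list(r)
    for _ in range(sweeps):
        rng.shuffle(pairs)  # visit the samples in random order each sweep
        for s, c in pairs:
            error = approx_cost(s, r) - c
            phi = features(s)
            # Gradient of 0.5 * error**2 with respect to r is error * phi(s)
            # for the linear architecture assumed above.
            r = [ri - step * error * fi for ri, fi in zip(r, phi)]
    return r


# Made-up, noise-free samples generated from c = 1 + 2 * s.
samples = {0: [1.0, 1.0], 1: [3.0, 3.0], 2: [5.0, 5.0]}
r = incremental_least_squares(samples, [0.0, 0.0])
```

A constant step size is used here only for simplicity; with noisy samples a diminishing step size is the more usual choice for this kind of iteration.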