
Solving the Least-Squares Problem

Let $\tilde{S}$ be a set of representative states and $M(s)$ the number of samples of a state $s \in \tilde{S}$. The $m$th such sample is denoted by $C(s,m)$, and $r$ is the parameter vector over which the following optimisation problem is solved.

\begin{displaymath}\min_{r}\sum_{s \in \tilde{S}}\sum_{m=1}^{M(s)}(\tilde{V}(s,r) - C(s,m) )^{2}\end{displaymath}
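
As an illustration, the objective can be evaluated directly once an approximation architecture is fixed. The sketch below assumes a linear architecture $\tilde{V}(s,r) = \phi(s)^{T} r$, which the text does not specify; the feature map, the dictionary layout and the numbers are hypothetical and chosen only to make the sums concrete.

```python
def v_tilde(phi_s, r):
    """Assumed linear approximation: V~(s, r) = phi(s) . r."""
    return sum(p * w for p, w in zip(phi_s, r))


def least_squares_objective(r, features, samples):
    """Sum over s in S~ and m = 1..M(s) of (V~(s, r) - C(s, m))^2.

    features: dict mapping state -> feature vector phi(s)  (hypothetical layout)
    samples:  dict mapping state -> list [C(s, 1), ..., C(s, M(s))]
    """
    total = 0.0
    for s, costs in samples.items():
        v = v_tilde(features[s], r)
        total += sum((v - c) ** 2 for c in costs)
    return total


# Toy data: two representative states with M(a) = 2 and M(b) = 1 samples.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
samples = {"a": [1.0, 3.0], "b": [4.0]}
r = [2.0, 4.0]
objective_value = least_squares_objective(r, features, samples)
```

With these indicator features each component of $r$ is the estimate for one state, so the minimising $r$ is simply the per-state sample mean.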

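The solution can be obtained by an incremental algorithm that takes steps along the negative gradient of the objective (the constant factor that comes from differentiating the square is absorbed into the step size $\alpha$). For a given run $(s_1, a_1, \ldots, s_n)$ the update is: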

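\begin{displaymath}\vec{r} = \vec{r} - \alpha\sum_{k=1}^{n}\nabla_{r}\tilde{V}(s_k,r)(\tilde{V}(s_k,r) - C(s_k,m_k) )\end{displaymath}

Here $s_k$ is the $k$th state visited in the run, and $C(s_k,m_k)$ is the sample of $s_k$ produced by that run, with $m_k$ its index among the $M(s_k)$ samples of $s_k$.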
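
The incremental update can be sketched in a few lines. This is a minimal illustration rather than the lecture's own implementation: it assumes a linear architecture $\tilde{V}(s,r) = \phi(s)^{T} r$, so that $\nabla_{r}\tilde{V}(s,r) = \phi(s)$, and it represents a run as a list of (state, sample) pairs. The function names, the data layout and the constant step size are all hypothetical.

```python
def v_tilde(phi_s, r):
    """Assumed linear approximation: V~(s, r) = phi(s) . r."""
    return sum(p * w for p, w in zip(phi_s, r))


def gradient_step(r, run, features, alpha):
    """Apply one update r <- r - alpha * sum_k grad V~(s_k, r) * (V~(s_k, r) - C_k).

    run: list of (state, sample) pairs for the states visited in the run
         (hypothetical layout). For linear V~, grad_r V~(s, r) = phi(s).
    """
    step = [0.0] * len(r)
    for s, c in run:
        phi = features[s]
        err = v_tilde(phi, r) - c          # V~(s_k, r) - C(s_k, m_k)
        for i, p in enumerate(phi):
            step[i] += p * err             # accumulate grad * error
    return [w - alpha * g for w, g in zip(r, step)]


# Toy data: two runs, each visiting states "a" and "b" once.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
runs = [[("a", 1.0), ("b", 4.0)], [("a", 3.0), ("b", 4.0)]]

r = [0.0, 0.0]
for sweep in range(200):                   # repeated passes over the runs
    for run in runs:
        r = gradient_step(r, run, features, alpha=0.1)
```

With a constant step size the estimate for state "a" keeps oscillating slightly around the least-squares solution (the sample mean 2), while the estimate for "b" settles at 4. A step size that decreases over time would be the usual way to remove the residual oscillation.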

Yishay Mansour