next up previous
Next: Example Up: Large State Space Previous: Evaluation Of Approximate Policy

Approximate Value Iteration

We will make the following assumption.

\begin{displaymath}\vert\vert V_{k+1} - LV_k\vert\vert \leq \epsilon\end{displaymath}

\begin{displaymath}LV_0 - \epsilon*\vec{1} \leq V_1 \leq LV_0 - \epsilon*\vec{1} \end{displaymath}

This implies that using L operator on the inequality

\begin{displaymath}LV_0^2 - \lambda \epsilon*\vec{1} \leq LV_1 \leq LV_0^2 -\lambda \epsilon*\vec{1} \end{displaymath}

We have also the next inequality

\begin{displaymath}LV_1 - \epsilon*\vec{1} \leq V_2 \leq LV_1 - \epsilon*\vec{1} \end{displaymath}

Using both inequalities

\begin{displaymath}LV_0^2 - (\lambda \epsilon + \epsilon)*\vec{1} \leq LV_1 \leq LV_0^2 -(\lambda \epsilon+\epsilon)*\vec{1} \end{displaymath}

Thus for each k we have

\begin{displaymath}\vert\vert V_k - L^kV_0\vert\vert \leq \epsilon \sum_{i=0}^{k-1}\lambda^i \leq \frac{\epsilon}{1-\lambda}\end{displaymath}

If we look at Therefore,

\begin{displaymath}\vert\vert\tilde{V} - V^*\vert\vert \leq \frac{\epsilon}{1-\lambda}.\end{displaymath}

Although calculations are much simpler than in PI. The method is less natural.

Yishay Mansour