Approximate Value Iteration

Next: Example Up: Large State Space Previous: Evaluation Of Approximate Policy

We will make the following assumption.

$\begin{displaymath}\vert\vert V_{k+1} - LV_k\vert\vert \leq \epsilon\end{displaymath}$

$\begin{displaymath}LV_0 - \epsilon*\vec{1} \leq V_1 \leq LV_0 - \epsilon*\vec{1} \end{displaymath}$

This implies that using L operator on the inequality

$\begin{displaymath}LV_0^2 - \lambda \epsilon*\vec{1} \leq LV_1 \leq LV_0^2 -\lambda \epsilon*\vec{1} \end{displaymath}$

We have also the next inequality

$\begin{displaymath}LV_1 - \epsilon*\vec{1} \leq V_2 \leq LV_1 - \epsilon*\vec{1} \end{displaymath}$

Using both inequalities

$\begin{displaymath}LV_0^2 - (\lambda \epsilon + \epsilon)*\vec{1} \leq LV_1 \leq LV_0^2 -(\lambda \epsilon+\epsilon)*\vec{1} \end{displaymath}$

Thus for each k we have

$\begin{displaymath}\vert\vert V_k - L^kV_0\vert\vert \leq \epsilon \sum_{i=0}^{k-1}\lambda^i \leq \frac{\epsilon}{1-\lambda}\end{displaymath}$

If we look at Therefore,

$\begin{displaymath}\vert\vert\tilde{V} - V^*\vert\vert \leq \frac{\epsilon}{1-\lambda}.\end{displaymath}$

Although calculations are much simpler than in PI. The method is less natural.

Yishay Mansour
2000-01-11