next up previous
Next: Evaluating Policy Reward Up: Stochastic Models Previous: Choosing

   
Evaluating the average

Let V1,...,Vn be i.i.d. random variables with expected value of $\bar{v} = E[V_{i}]$.
We would like to solve the system
E[V]=r (this is a special case of the earlier discussion in which g(V,r)=V). We will use a single sample Vn and get :

$r_{n+1} \leftarrow (1-\alpha_{n})r_{n} + \alpha_{n} V_{n}$

if
$r_{n}\longrightarrow r^{*}$ then $\bar{v} = E[V] = r^{*}$.

We compute the variance as follows :

$Var(r_{n+1}) = (1-\alpha_{n})^{2}Var(r_{n}) + \alpha_{n}^{2}Var(V_{n})$.

We show that if
$\sum_{n} \alpha_{n} = \infty$ and $\alpha_{n}
\longrightarrow 0$ then $Var(r_{n+1}) \longrightarrow 0$. The following is a general Lemma that is used and is shown later.

Lemma 7.1   Let $\delta_{t}$ be a series such that $\delta_{t} \longrightarrow 0, %
\sum_{t} \delta_{t} = \infty, 0 \leq \delta_{t} \leq 1$. and Let et be a series such that $e_{t} \geq 0$ and : $e_{t+1} \leq (1-\delta_{t})e_{t} + c\delta_{t}^{2}$. Then $e_{t} \longrightarrow 0$.

Proof:step 1: we show that for every constant $\epsilon$, $e_{t}<\epsilon$ for an infinite number of t's.
Let us assume (by contradiction) that there exists
T such that for all t>T, $e_{t} > \epsilon$ and $c\delta_{t} \leq \epsilon/2$. (since $\delta_{t} \longrightarrow 0$). In such a case we have

$e_{t+1} \leq (1-\delta_{t})e_{t} + c\delta_{t}^{2} =
e_{t} - \delta_{t}e_{t} +...
..._{t} - \delta_{t}\epsilon + \delta_{t}\epsilon/2 =
e_{t} - \delta_{t}\epsilon/2$.

Hence :

$e_{m} \leq e_{T} - \epsilon/2 * \sum_{t=1}^{m-1}\delta_{t}$.

Since for any m we have
$e_{m} \geq 0$ ,then it has to be the case that $\sum_{i=0}^{\infty} \delta_{t} < \infty$, which is a contradiction to the assumption in the lemma.
Therefore, for any
$\epsilon > 0$, there exists T such that $\forall t>T :
c\delta_{t} < \epsilon$ and $e_{T} \leq \epsilon$.

step 2: we show that
$\forall t>T : e_{t} \leq \epsilon$.
$e_{T+1} \leq (1-\delta_{T})e_{T} + c\delta_{T}^{2} \leq
(1-\delta_{T})\epsilon + \epsilon \delta_{T} \leq
\epsilon$

Under the hypothesis that
$e_{T} \leq \epsilon$ we showed that $e_{T+1} \leq \epsilon$, this implies that $\forall t>T : e_{t} \leq \epsilon$.$\Box$

So the variance could be expressed as :

$Var(r_{n+1}) = (1-\alpha_{n})^{2}Var(r_{n}) + \alpha_{n}^{2}Var(V_{n}) \leq
(1-\alpha_{n})Var(r_{n}) + \alpha_{n}^{2}Var(V_{n})$

we can substitute :

  • $ \alpha_{n} \equiv \delta_{n}$
  • $C \equiv Var(V_{n})$
  • $e_{n} \equiv Var(r_{n})$

Since $\sum_{n} \alpha_{n} = \infty$ and $\alpha_{n}
\longrightarrow 0$, we have that $Var(V_{n}) \longrightarrow 0$.

Remark : the condition
$\sum_{n} \alpha_{n}^{2} \longrightarrow 0$ will insure convergence with probability one.

We say that
$\lim_{n \rightarrow \infty}X_{n}=X$ with probability one if and only if $lim_{n \rightarrow \infty}\sup_{n \leq m}\vert X_{m}=X\vert=0$.


next up previous
Next: Evaluating Policy Reward Up: Stochastic Models Previous: Choosing
Yishay Mansour
1999-12-16