
Choosing $\alpha_{n}(s)$

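Note that the total change in $r(s)$ is bounded by the sum of the step sizes:

$\vert r_{n}(s)-r_{0}(s)\vert \leq \sum_{i} \vert r_{i+1}(s)-r_{i}(s)\vert \leq \sum_{i} \alpha_{i}(s) \vert (Hr_{i})(s)-r_{i}(s)+W_{i}(s)\vert \leq A \sum_{i=1}^{n} \alpha_{i}(s)$

(where $A$ is a constant that bounds $\vert (Hr_{i})(s)-r_{i}(s)+W_{i}(s)\vert$).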

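We therefore need to require that
$\sum_{i} \alpha_{i}(s) = \infty$ for all $s \in S$. Otherwise, the iterate can move only a bounded distance away from $r_{0}$, and we would have to assume that the distance $\Vert r^{*} - r_{0}\Vert$ is bounded accordingly. The above condition ensures that the iteration can reach the optimal value from any starting point.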
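
To illustrate why the step sizes must sum to infinity, here is a minimal sketch for a single state with the noise switched off ($W_{i} = 0$). The operator `h_r = 0.5 * r + 0.5 * r_star` is a hypothetical contraction chosen only for this example (it is not taken from the notes), and the names `iterate`, `r0` and `r_star` are likewise assumptions. The update used is $r_{i+1} = r_{i} + \alpha_{i} ((Hr_{i}) - r_{i})$.

```python
def iterate(step_size, r0=0.0, r_star=10.0, n=10_000):
    """Run r_{i+1} = r_i + alpha_i * ((H r_i) - r_i) for one state, with W_i = 0."""
    r = r0
    for i in range(1, n + 1):
        # Hypothetical contraction H whose fixed point is r_star (an assumption).
        h_r = 0.5 * r + 0.5 * r_star
        r += step_size(i) * (h_r - r)
    return r


# Step sizes 1/i sum to infinity; step sizes 2^{-i} sum to 1.
harmonic = iterate(lambda i: 1.0 / i)
geometric = iterate(lambda i: 0.5 ** i)

print("alpha_i = 1/i    ->", harmonic)
print("alpha_i = 2^(-i) ->", geometric)
```

With the summable step sizes the iterate stalls well short of `r_star`, because the total distance it can travel is bounded; with $\alpha_{i} = 1/i$ it keeps approaching `r_star`.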

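We add a second condition to ensure that the step sizes shrink fast enough for the iterates to converge, so that the accumulated effect of the noise $W_{i}$ stays bounded:

$\forall s\in S$ : $\sum_{i} \alpha_{i}^{2}(s) < \infty $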

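One simple choice that satisfies both conditions is
$\alpha_{i}(s) = \frac {1}{i}$: the harmonic series $\sum_{i} \frac{1}{i}$ diverges, while $\sum_{i} \frac{1}{i^{2}}$ converges.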
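
As a quick numerical check of the choice $\alpha_{i} = 1/i$, the sketch below computes partial sums of the step sizes and of their squares. The function name `partial_sums` is only illustrative. A finite computation cannot prove divergence, but it shows the first sum still growing while the second settles near $\pi^{2}/6$.

```python
import math


def partial_sums(n):
    """Return (sum of 1/i, sum of 1/i^2) for i = 1..n."""
    s1 = sum(1.0 / i for i in range(1, n + 1))
    s2 = sum(1.0 / i ** 2 for i in range(1, n + 1))
    return s1, s2


for n in (10, 1_000, 100_000):
    s1, s2 = partial_sums(n)
    print(f"n={n:>7}  sum alpha_i={s1:.4f}  sum alpha_i^2={s2:.6f}")

print("pi^2 / 6 =", math.pi ** 2 / 6)
```

The first column grows roughly like $\ln n$, so it exceeds any fixed bound eventually, whereas the second column changes very little beyond the first few thousand terms.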

Yishay Mansour