
   
Direct Algorithm - Phased-Q-Learning

The Phased-Q-Learning algorithm is similar to the Q-Learning algorithm we've encountered in class, except that it works in phases. In each phase the algorithm makes $m_D$ calls to PS(M) (where $m_D$ is determined later by the analysis). It then uses the $m_D$ samples of every state-action pair collected by these calls to update the value function as follows:

\begin{eqnarray*}\forall s,a: \widehat{Q}_0(s,a) &=& \widehat{v}_0(s) = 0\\
\ldots\\
\widehat{v}_l(s) &=& \max_{a \in A} \left\{ \widehat{Q}_l(s,a) \right\}
\end{eqnarray*}
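The phase structure can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the middle line of the update above is elided, so the code assumes the standard phased update $\widehat{Q}_{l+1}(s,a) = \frac{1}{m_D}\sum_{i=1}^{m_D}\left(r_i + \gamma\,\widehat{v}_l(s'_i)\right)$, where $(r_i, s'_i)$ is the $i$-th sampled reward and next state for $(s,a)$ and $\gamma$ is the discount factor. The oracle `ps`, the helper `make_toy_ps`, and its two-state MDP are hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, List, Tuple

State = int
Action = int
Sample = Tuple[float, State]  # (reward, next_state)
# Hypothetical PS(M) oracle: one call returns one sample for every (s, a).
Oracle = Callable[[], Dict[Tuple[State, Action], Sample]]


def phased_q_learning(states: List[State], actions: List[Action], ps: Oracle,
                      m_d: int, l_d: int, gamma: float):
    # Initialization: Q_0(s, a) = v_0(s) = 0 for all s, a.
    q = {(s, a): 0.0 for s in states for a in actions}
    v = {s: 0.0 for s in states}
    for _ in range(l_d):
        # One phase: m_D calls to PS(M) give m_D samples of every (s, a).
        batches = [ps() for _ in range(m_d)]
        # Assumed update: average of r + gamma * v_l(s') over those samples,
        # using the v_l computed at the end of the previous phase.
        q = {(s, a): sum(b[(s, a)][0] + gamma * v[b[(s, a)][1]] for b in batches) / m_d
             for s in states for a in actions}
        # v_{l+1}(s) = max_a Q_{l+1}(s, a)
        v = {s: max(q[(s, a)] for a in actions) for s in states}
    return q, v


def make_toy_ps(counter: List[int]) -> Oracle:
    # Hypothetical deterministic 2-state MDP: action 0 stays, action 1 switches
    # state, and the reward is 1 in state 1 and 0 in state 0.
    def ps() -> Dict[Tuple[State, Action], Sample]:
        counter[0] += 1  # count calls to PS(M)
        return {(s, a): (1.0 if s == 1 else 0.0, s if a == 0 else 1 - s)
                for s in (0, 1) for a in (0, 1)}
    return ps


calls = [0]
q, v = phased_q_learning([0, 1], [0, 1], make_toy_ps(calls), m_d=4, l_d=3, gamma=0.5)
print(calls[0])  # 12, i.e. l_D * m_D
print(v)         # {0: 0.75, 1: 1.75}
```

Because the toy MDP is deterministic, every sample average is exact and the result coincides with three steps of value iteration; with a stochastic oracle the averages are only estimates of the expected backup.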


Note that the Phased-Q-Learning algorithm requires $l_D \times m_D$ calls to PS(M) in total, where $l_D$ is the number of phases performed.
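Since each call to PS(M) returns one sample of every state-action pair, the total sample cost follows directly. Writing $|S|$ and $|A|$ for the numbers of states and actions (notation assumed here), a short worked count is:

\begin{eqnarray*}
\mbox{calls to } PS(M) &=& l_D \times m_D\\
\mbox{sampled transitions} &=& l_D \times m_D \times |S| \times |A|
\end{eqnarray*}

For example, $l_D = 3$ phases with $m_D = 4$ on an MDP with $|S| = |A| = 2$ give $12$ calls to PS(M) and $48$ sampled transitions.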

Yishay Mansour
2000-05-30