Direct Algorithm - Phased-Q-Learning

The Phased-Q-Learning algorithm is similar to the Q-Learning algorithm we've encountered in class, except that it works in phases. In each phase the algorithm makes m_D calls to PS(M), where m_D is determined later by the analysis. It then uses the m_D samples collected for every state-action pair by these calls to update the value function as follows:

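$\begin{eqnarray*}\forall s,a: \widehat{Q}_0(s,a) &=& \widehat{v}_0(s) = 0\\ \widehat{Q}_{l+1}(s,a) &=& R(s,a) + \gamma \frac{1}{m_D} \sum_{i=1}^{m_D} \widehat{v}_l\left(s_i^{(s,a)}\right)\\ \widehat{v}_l(s) &=& \max_{a \in A} \left\{ \widehat{Q}_l(s,a) \right\} \end{eqnarray*}$

Here $s_i^{(s,a)}$ is the next state sampled for the pair (s,a) by the i-th call to PS(M) in the current phase, $R(s,a)$ is the immediate reward and $\gamma$ is the discount factor.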

Note that the Phased-Q-Learning algorithm requires $l_D \times m_D$ calls to PS(M) in total, where l_D is the number of phases performed.
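The phase structure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes each phase averages the previous value estimate over the m_D sampled next states and adds a discounted copy to the immediate reward, and that one call to PS(M) returns one sampled next state for every state-action pair. The names `make_ps`, `phased_q_learning`, `P`, `R` and `gamma` are hypothetical and do not come from the notes, and the toy MDP is made up so that the result can be checked by hand.

```python
import random


def make_ps(P, rng):
    """Hypothetical stand-in for PS(M): each call returns one sampled
    next state for every state-action pair (s, a)."""
    def ps():
        return {
            sa: rng.choices(range(len(dist)), weights=dist)[0]
            for sa, dist in P.items()
        }
    return ps


def phased_q_learning(n_states, n_actions, R, ps, gamma, m_D, l_D):
    """Run l_D phases, each using m_D calls to ps()."""
    pairs = [(s, a) for s in range(n_states) for a in range(n_actions)]
    Q = {sa: 0.0 for sa in pairs}   # Q_0(s, a) = 0
    v = [0.0] * n_states            # v_0(s) = 0
    calls = 0
    for _ in range(l_D):
        # Accumulate v_l(next state) over the m_D samples of each pair.
        totals = {sa: 0.0 for sa in pairs}
        for _ in range(m_D):
            sample = ps()
            calls += 1
            for sa, s_next in sample.items():
                totals[sa] += v[s_next]
        # Assumed update: reward plus discounted empirical average.
        Q = {sa: R[sa] + gamma * totals[sa] / m_D for sa in pairs}
        v = [max(Q[(s, a)] for a in range(n_actions)) for s in range(n_states)]
    return Q, v, calls


# Toy deterministic MDP (made up for illustration): in state 0, action 0
# stays in state 0 with reward 1; every other pair leads to state 1, reward 0.
P = {(0, 0): [1.0, 0.0], (0, 1): [0.0, 1.0],
     (1, 0): [0.0, 1.0], (1, 1): [0.0, 1.0]}
R = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}

Q, v, calls = phased_q_learning(
    n_states=2, n_actions=2, R=R, ps=make_ps(P, random.Random(0)),
    gamma=0.5, m_D=4, l_D=3,
)
```

With deterministic transitions the estimate for the pair (0, 0) after three phases is 1 + 0.5 + 0.25 = 1.75, and the call counter equals l_D times m_D, matching the count in the note above.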

Yishay Mansour
2000-05-30