SARSA



SARSA is an on-line algorithm. It keeps track of two states: the current state and the next state.
Its name, "S($s$), A($a$), R($r$), S($s'$), A($a'$)", is derived from the fact that each update uses the current state ($s$), the current action ($a$), the current reward ($r$), the next state ($s'$) and the next action ($a'$). We update $Q(s,a)$ with the difference between the reward plus the discounted value at the next pair $(s',a')$ and the current value $Q(s,a)$.
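
Written out, one common form of this update is the following. The step size $\alpha$ and the discount factor $\gamma$ are notation assumed here for illustration; they are not taken from the text above.

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma\, Q(s',a') - Q(s,a) \right]$$

The bracketed term is the difference being described: the target $r + \gamma\, Q(s',a')$ uses the action $a'$ that is actually taken in $s'$, which is why all five quantities are needed.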

**Figure 9.2:** Algorithm for SARSA (boxed pseudocode; the body of the figure is not preserved here).
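
To make the procedure concrete, here is a minimal Python sketch of tabular SARSA as it is commonly stated. It is not a transcription of Figure 9.2. The epsilon-greedy action choice, the parameters `alpha`, `gamma` and `eps`, the `step(s, a)` interface, and the toy chain environment are all assumptions introduced for illustration.

```python
import random

def epsilon_greedy(Q, s, n_actions, eps, rng):
    """With probability eps pick a random action, else a greedy one (ties broken at random)."""
    if rng.random() < eps:
        return rng.randrange(n_actions)
    best = max(Q[s])
    return rng.choice([a for a in range(n_actions) if Q[s][a] == best])

def sarsa(step, n_states, n_actions, start, episodes=500,
          alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular SARSA. `step(s, a)` is assumed to return (reward, next_state, done)."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = start
        a = epsilon_greedy(Q, s, n_actions, eps, rng)
        done = False
        while not done:
            r, s2, done = step(s, a)                          # observe r and s'
            a2 = epsilon_greedy(Q, s2, n_actions, eps, rng)   # choose a' in s'
            # On a terminal transition there is no next value to bootstrap from.
            target = r if done else r + gamma * Q[s2][a2]
            Q[s][a] += alpha * (target - Q[s][a])             # update uses (s, a, r, s', a')
            s, a = s2, a2                                     # next pair becomes current pair
    return Q

# Hypothetical toy problem: a 4-state chain. Action 1 moves right, action 0 moves
# left (staying put at state 0). Reaching state 3 gives reward 1 and ends the episode.
def chain_step(s, a):
    s2 = min(s + 1, 3) if a == 1 else max(s - 1, 0)
    done = (s2 == 3)
    return (1.0 if done else 0.0), s2, done

Q = sarsa(chain_step, n_states=4, n_actions=2, start=0)
```

The action $a'$ selected for the update is the same action executed on the next step, so the table is learned for the policy that is actually being followed.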

Yishay Mansour
2000-01-07