

SARSA is an on-line algorithm. At each step it keeps track of two states: the current state and the next state.
Its name, "S(s),A(a),R(r),S(s'),A(a')", derives from the fact that each update uses the current state (s), the current action (a), the reward (r), the next state (s') and the next action (a'). We update Q(s,a) using the difference between the value of the next pair (s',a') in s', combined with the reward r, and the current value Q(s,a). Because a' is the action the algorithm actually takes next, SARSA evaluates the policy it is following.
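Written out, the update described above takes the following standard form. The step size α and the discount factor γ are notation assumed here, since this excerpt does not define them; the bracketed term is the difference between the reward-plus-next-value estimate and the current value.

```latex
Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma\, Q(s',a') - Q(s,a) \right]
```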

Figure 9.2: Algorithm for SARSA
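A minimal tabular sketch of SARSA in Python, under stated assumptions: the next action a' is chosen ε-greedily (the excerpt does not say how a' is selected), the environment is a hypothetical `step(s, a)` function returning `(reward, next_state, done)`, and the names `sarsa`, `epsilon_greedy` and `chain_step` are illustrative, not taken from the notes. It is not a transcription of Figure 9.2.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Assumed policy: random action w.p. epsilon, else a greedy one (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(Q[(state, a)] for a in actions)
    return rng.choice([a for a in actions if Q[(state, a)] == best])


def sarsa(step, start_state, actions, episodes=500, alpha=0.1, gamma=0.9,
          epsilon=0.1, max_steps=100, seed=0):
    """Tabular SARSA; step(s, a) -> (reward, next_state, done) is a hypothetical interface."""
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[(s, a)], initialised to 0
    for _ in range(episodes):
        s = start_state
        a = epsilon_greedy(Q, s, actions, epsilon, rng)
        for _ in range(max_steps):
            r, s_next, done = step(s, a)
            if done:
                # Terminal transition: nothing to bootstrap from.
                Q[(s, a)] += alpha * (r - Q[(s, a)])
                break
            # On-policy: choose a' with the same policy that will actually be followed.
            a_next = epsilon_greedy(Q, s_next, actions, epsilon, rng)
            target = r + gamma * Q[(s_next, a_next)]
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s, a = s_next, a_next  # (s', a') become the current pair
    return Q


def chain_step(s, a):
    """Hypothetical toy chain 0..4: actions -1/+1, reward 1 on reaching state 4."""
    s_next = min(max(s + a, 0), 4)
    if s_next == 4:
        return 1.0, s_next, True
    return 0.0, s_next, False


Q = sarsa(chain_step, start_state=0, actions=[-1, 1], episodes=1000)
print([max([-1, 1], key=lambda a: Q[(s, a)]) for s in range(4)])
```

On this chain the learned greedy action in every non-terminal state should be to move right (+1). Random tie-breaking is a design choice here: with all values initially zero, a fixed tie-break toward "left" would keep the agent from ever reaching the reward.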

Yishay Mansour