
## SARSA

SARSA is an on-line algorithm. At each step it keeps track of two states: the current state
and the next state.

Its name, S(*s*), A(*a*), R(*r*), S(*s'*), A(*a'*), is derived from the fact that each update
uses the current state (*s*), current action (*a*), current reward (*r*), next state (*s'*) and
next action (*a'*). We update Q(*s*, *a*) by the difference between a target, formed from the
reward *r* and the discounted estimate Q(*s'*, *a'*) at the next pair, and the current estimate
Q(*s*, *a*).
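The update described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a transcription of Figure 9.2: the step size `alpha`, the discount factor `gamma`, the epsilon-greedy action choice, and the small chain environment `chain_step` are all assumptions introduced here, and the notes may use different symbols for these quantities.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one.

    Epsilon-greedy is an assumed exploration rule, chosen for illustration.
    """
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma, terminal=False):
    """Apply one SARSA step to Q in place and return the difference used."""
    # Target uses the action a_next that will actually be taken in s_next.
    target = r if terminal else r + gamma * Q[(s_next, a_next)]
    td_error = target - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return td_error


def run_episode(Q, step, start, actions, alpha=0.5, gamma=0.9,
                epsilon=0.1, rng=None, max_steps=1000):
    """Run one episode, updating Q after every (s, a, r, s', a') tuple."""
    if rng is None:
        rng = random.Random(0)
    s = start
    a = epsilon_greedy(Q, s, actions, epsilon, rng)
    for _ in range(max_steps):
        s_next, r, done = step(s, a)
        a_next = epsilon_greedy(Q, s_next, actions, epsilon, rng)
        sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma, terminal=done)
        if done:
            break
        s, a = s_next, a_next  # the next pair becomes the current pair
    return Q


def chain_step(s, a):
    """Hypothetical toy environment: states 0..3, reward 1 on reaching 3."""
    s_next = min(max(s + a, 0), 3)
    done = s_next == 3
    return s_next, (1.0 if done else 0.0), done


if __name__ == "__main__":
    Q = defaultdict(float)
    rng = random.Random(0)
    for _ in range(200):
        run_episode(Q, chain_step, start=0, actions=[-1, 1],
                    epsilon=0.5, rng=rng)
    print({k: round(v, 3) for k, v in sorted(Q.items())})
```

Note that the next action *a'* is chosen before the update and then actually executed on the following step, which is what makes the five-tuple (*s*, *a*, *r*, *s'*, *a'*) the natural unit of the algorithm.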

**Figure 9.2:** Algorithm for SARSA.

*Yishay Mansour*

*2000-01-07*