Expected Total Reward

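$V^{\pi}_N(s)=E^{\pi}_s\left[\sum_{t=1}^{N-1} R_t (X_t,Y_t)+R_N(X_N)\right]$, where

$X_t$ is the state at time $t$,

$Y_t$ is the action at time $t$.

The last term, $R_N(X_N)$, is the terminal reward: it depends only on the final state, since no action is taken at time $N$.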

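Assume that the rewards are bounded, i.e., there is a constant $M$ such that $|R_t(s,a)|<M$ for each $t\leq N$, and that the state space $S$ and the action space $A$ are discrete. Then $V^{\pi}_N(s)$ is well defined for every $\pi\in\Pi^{HR}$, where $\Pi^{HR}$ is the set of stochastic (randomized), history-dependent policies. Indeed, the return is a sum of $N$ terms, each bounded by $M$ in absolute value, so the expectation exists and $|V^{\pi}_N(s)|\leq NM$. Namely, every such policy has a well-defined value.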
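To make the definition concrete, the following is a minimal sketch that computes the expected total reward by backward recursion from the terminal reward. It makes several assumptions that are not part of the notes: the policy is Markov and randomized (a special case of $\Pi^{HR}$), the transition probabilities do not depend on time, and the names `expected_total_reward`, `P`, `R`, `R_final` and `policy`, as well as the two-state example data, are hypothetical and chosen only for illustration.

```python
import numpy as np


def expected_total_reward(P, R, R_final, policy):
    """Evaluate V^pi_N(s) for every start state s by backward recursion.

    P[s, a, s2]     : probability of moving from s to s2 under action a
                      (assumed time-independent in this sketch)
    R[t, s, a]      : reward at decision epoch t+1, for epochs 1..N-1
    R_final[s]      : terminal reward R_N(s)
    policy[t, s, a] : probability that the policy picks a in s at epoch t+1
    """
    u = np.asarray(R_final, dtype=float)      # u_N(s) = R_N(s)
    for t in reversed(range(R.shape[0])):     # epochs N-1, ..., 1
        q = R[t] + P @ u                      # q[s, a]: reward now plus expected future value
        u = (policy[t] * q).sum(axis=1)       # average over the randomized action
    return u


# Hypothetical example: 2 states, 2 actions, horizon N = 3.
N = 3
# Action 0 keeps the current state, action 1 switches to the other state.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
# The same reward table R[s, a] is used at both decision epochs.
R = np.tile(np.array([[0.0, 1.0], [2.0, 0.0]]), (N - 1, 1, 1))
R_final = np.array([0.0, 1.0])

# Two policies: choose uniformly at random, or always take action 1.
uniform = np.full((N - 1, 2, 2), 0.5)
always_switch = np.zeros((N - 1, 2, 2))
always_switch[:, :, 1] = 1.0

V_uniform = expected_total_reward(P, R, R_final, uniform)
V_switch = expected_total_reward(P, R, R_final, always_switch)

# A bound on the absolute rewards, and a check that |V| <= N * M.
M = max(np.abs(R).max(), np.abs(R_final).max())
print(V_uniform, V_switch, np.all(np.abs(V_uniform) <= N * M))
```

The recursion visits each time step once, so the cost is proportional to $N|S|^2|A|$. A general history-dependent policy would need the value to be indexed by the whole history rather than by the current state alone, which this sketch does not attempt.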

Yishay Mansour
1999-11-15