Expected Total Reward

$V^{\pi}_N(s)=E^{\pi}_s\left[\sum_{t=1}^{N-1} R_t(X_t,Y_t)+R_N(X_N)\right]$, where:

$X_t$ - the state at time t

$Y_t$ - the action at time t

Under the assumption that $|R_t(s,a)| < M$ for some constant M, for every $t\leq N$, $s\in S$ and $a\in A$, and that the state space S and action space A are finite, $V^{\pi}_N(s)$ is well defined for every $\pi\in\Pi^{HR}$. ($\Pi^{HR}$ is the set of stochastic, history-dependent policies.) Namely, every such policy has a well-defined value.
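For a Markov deterministic policy, the value $V^{\pi}_N(s)$ can be computed exactly by a backward recursion over the horizon: $V_N(s)=R_N(s)$ and $V_t(s)=R_t(s,\pi_t(s))+\sum_{s'}P(s'\mid s,\pi_t(s))\,V_{t+1}(s')$. The following sketch illustrates this; the MDP representation (arrays `P`, `R`, `R_final`, `policy`) is an assumed encoding, not taken from the text, and it handles only the Markov deterministic special case rather than general $\Pi^{HR}$ policies.

```python
import numpy as np

def evaluate_finite_horizon(P, R, R_final, policy):
    """Compute V^pi_N(s) = E[ sum_{t=1}^{N-1} R_t(X_t, Y_t) + R_N(X_N) ]
    by backward recursion, for a Markov deterministic policy.

    P[a][s, s']  - transition probability to s' when taking action a in s
    R[t][s, a]   - reward at decision step t (t = 0 .. N-2)
    R_final[s]   - terminal reward R_N(s)
    policy[t][s] - action taken at step t in state s
    """
    n_states = len(R_final)
    V = np.array(R_final, dtype=float)      # V_N = R_N
    for t in reversed(range(len(R))):       # t = N-1, ..., 1
        V_prev = np.empty(n_states)
        for s in range(n_states):
            a = policy[t][s]
            # one-step reward plus expected value of the next state
            V_prev[s] = R[t][s, a] + P[a][s] @ V
        V = V_prev
    return V                                # V^pi_N(s) for every state s
```

For example, with two states, an action that stays in place and an action that swaps states, a horizon of one decision step plus a terminal reward lets each value be checked by hand.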

Yishay Mansour