
# Evaluating Policy Reward

Let *π* be a policy. We would like to calculate the reward of the policy *π*, i.e. the expected total reward accumulated along a run of the MDP under *π*:

*V*^{π}(*s*) = *E*[∑_{t} *R*_{t}],

where *R*_{t} is the reward received at step *t* of a run that starts at state *s* and follows *π*.

For simplicity, we assume that there exists a state *s*_{0} in the MDP such that *s*_{0} has a reward of 0 and, for every action *a*, *Prob*(*s*_{0}|*s*_{0},*a*) = 1; that is, *s*_{0} is an absorbing state.

Also, we assume that each policy reaches the state *s*_{0} within a finite number of steps with probability 1.

Under these assumptions, each run is finite with probability 1, so the total reward of a run is well defined.
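The setup above can be sketched as a Monte Carlo estimate: since every run under *π* reaches the absorbing zero-reward state in finite time, we can simulate many runs and average their total rewards. The toy MDP below (its states, transition probabilities, and rewards) is a hypothetical example, not one from the text; a minimal sketch:

```python
import random

# Hypothetical toy MDP: states 0 and 1 are regular; state 2 plays the role
# of the absorbing state s0, with reward 0 and Prob(s0 | s0, a) = 1.
TRANSITIONS = {
    # (state, action) -> list of (next_state, probability)
    (0, 'a'): [(1, 0.9), (2, 0.1)],
    (1, 'a'): [(0, 0.4), (2, 0.6)],
    (2, 'a'): [(2, 1.0)],               # absorbing
}
REWARDS = {0: 1.0, 1: 2.0, 2: 0.0}      # s0 has reward 0

def policy(state):
    """A fixed deterministic policy pi (only one action here)."""
    return 'a'

def run_episode(start, rng):
    """Simulate one run until the absorbing state is reached;
    return the total (undiscounted) reward collected."""
    state, total = start, 0.0
    while state != 2:
        total += REWARDS[state]
        next_states, probs = zip(*TRANSITIONS[(state, policy(state))])
        state = rng.choices(next_states, weights=probs)[0]
    return total

def estimate_value(start, episodes=10_000, seed=0):
    """Monte Carlo estimate of V^pi(start): average total reward over runs."""
    rng = random.Random(seed)
    return sum(run_episode(start, rng) for _ in range(episodes)) / episodes

print(estimate_value(0))
```

Because each run terminates with probability 1, every episode's total reward is finite and the average converges to *V*^{π}(0) (here, solving *V*(0) = 1 + 0.9 *V*(1), *V*(1) = 2 + 0.4 *V*(0) gives *V*(0) = 4.375).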

Yishay Mansour

1999-12-16