
## Immediate reward and transition probability

As a result of performing an action *a* in state *s* at time *t*:
1. The agent receives an immediate reward
*R*_{t}(*s*,*a*). We define the expectation of *R*_{t} as
*r*_{t}(*s*,*a*) = *E*[*R*_{t}(*s*,*a*)].

2. The system moves to a new state *s*', determined according to the transition probability
*P*_{t}(*s*'|*s*,*a*).
We assume that *P*_{t} is well defined, that is, for every
*s* ∈ *S* and *a* ∈ *A*,
∑_{*s*' ∈ *S*} *P*_{t}(*s*'|*s*,*a*) = 1.
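The two-step dynamics above can be sketched in code. The following is a minimal illustration, not part of the original notes: a hypothetical two-state, one-action example where `P[s][a]` holds the distribution *P*_{t}(*s*'|*s*,*a*) and, for simplicity, the realized reward *R*_{t}(*s*,*a*) is taken equal to its expectation *r*_{t}(*s*,*a*).

```python
import random

# Hypothetical example: P[s][a][s'] = P_t(s'|s,a), R[s][a] = r_t(s,a).
P = {
    "s0": {"a": {"s0": 0.3, "s1": 0.7}},
    "s1": {"a": {"s0": 0.5, "s1": 0.5}},
}
R = {"s0": {"a": 1.0}, "s1": {"a": 0.0}}

def step(s, a, rng=random):
    """Perform action a in state s at time t: the agent receives an
    immediate reward, and the system moves to a new state s' drawn
    according to the transition probability P_t(.|s,a)."""
    next_states = list(P[s][a])
    weights = [P[s][a][sp] for sp in next_states]
    s_next = rng.choices(next_states, weights=weights)[0]
    return R[s][a], s_next

reward, s_next = step("s0", "a")
```

Note that `step` needs only the current state `s` and the action `a`; no history is consulted, which is exactly the Markov property discussed below.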

We will not discuss how or when the immediate reward reaches the agent. It may be accumulated over the time frame [*t*, *t*+1], or alternatively, it may be given at a single point in time between *t* and *t*+1. In either case, all that matters to the agent is that the immediate reward reaches it before time *t*+1.

A Markovian process is defined as a process in which the only information needed from the history is the current state. We define a Markovian process as the tuple (*T*, *S*, *A*,
*P*_{t}(·|*s*,*a*),
*R*_{t}(*s*,*a*)). The process defined here is Markovian, since the next state and immediate reward (and therefore the entire continuation of the process) depend only on the current state and the action chosen, and not on the history.

*Yishay Mansour*

*1999-11-15*