
Policy and Decision Rules

Figure: model diagram

A decision rule may need the entire history to determine the most profitable action, or it may need only the current state. It may also be deterministic (selecting a single action) or stochastic (selecting a probability distribution over the set of actions). We define the following classes of decision rules:

MD - a deterministic Markovian rule (having no memory):

$d_{t}:S\rightarrow A$, with $d_{t}(s)\in A_{s}$ for every $s\in S$
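As a minimal sketch, assume a hypothetical two-state toy problem with states `low` and `high` and admissible action sets $A_s$ stored in the dictionary `ACTIONS` (these names are illustrative and do not come from the text). An MD rule is then just a lookup table from states to admissible actions:

```python
from typing import Callable, Dict, List

# Hypothetical toy problem: each state s has its admissible action set A_s.
ACTIONS: Dict[str, List[str]] = {
    "low": ["wait", "recharge"],
    "high": ["wait", "search"],
}

def make_md_rule(choice: Dict[str, str]) -> Callable[[str], str]:
    """Build a deterministic Markovian rule d_t: S -> A with d_t(s) in A_s."""
    for s, a in choice.items():
        if a not in ACTIONS[s]:
            raise ValueError(f"action {a!r} is not admissible in state {s!r}")

    def d(s: str) -> str:
        # Only the current state is consulted; the rule keeps no memory.
        return choice[s]

    return d

d_t = make_md_rule({"low": "recharge", "high": "search"})
print(d_t("low"))
```

The validation step enforces the constraint $d_t(s)\in A_s$ at construction time, so an inadmissible choice is rejected before the rule is ever used.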

HD - a deterministic history dependent rule:

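$d_{t}:H_{t}\rightarrow A$, with $d_{t}(h_{t})\in A_{s_{t}}$, where $s_{t}$ is the current (last) state in the history $h_{t}\in H_{t}$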

where the histories are defined by $H_{1}=S$, $H_{t}=H_{t-1}\times A\times S$, and $H=\bigcup_{t}H_{t}$.
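The recursive definition of $H_t$ can be checked by enumeration. The sketch below assumes the same hypothetical toy states, takes `A` to be the union of the admissible sets, and stores a history as the flat tuple $(s_1, a_1, s_2, \ldots, s_t)$. The rule `d_hd` is an illustrative HD rule (not from the text) that picks different actions in the same current state depending on the past:

```python
from itertools import product
from typing import List, Tuple

S = ["low", "high"]                    # hypothetical state set
A = ["wait", "recharge", "search"]     # hypothetical A: union of all A_s

History = Tuple[str, ...]              # (s_1, a_1, s_2, a_2, ..., s_t)

def histories(t: int) -> List[History]:
    """Enumerate H_t using H_1 = S and H_t = H_{t-1} x A x S."""
    H = [(s,) for s in S]
    for _ in range(t - 1):
        H = [h + (a, s) for h, a, s in product(H, A, S)]
    return H

def d_hd(h: History) -> str:
    """Hypothetical deterministic history-dependent rule on the whole history h."""
    current = h[-1]                    # the last entry of h is the current state s_t
    visits_low = h[0::2].count("low")  # states sit at the even positions of h
    if current == "low":
        # Same current state, different action depending on the past.
        return "recharge" if visits_low >= 2 else "wait"
    return "search"

print([len(histories(t)) for t in (1, 2, 3)])
print(d_hd(("low",)), d_hd(("low", "wait", "low")))
```

The sizes grow as $|H_t| = |S|\,(|A||S|)^{t-1}$, which is why history-dependent rules are far more expensive to represent than Markovian ones.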

MR - a stochastic Markovian rule:

$d_{t}:S\rightarrow P(A)$, where $P(A)$ is the set of probability distributions on $A$.
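A minimal sketch of an MR rule, again on the hypothetical toy states: each state maps to a probability distribution supported on its admissible actions, and the action is drawn at decision time. The table `MR_RULE` and its probabilities are illustrative assumptions:

```python
import random
from typing import Dict

# Hypothetical stochastic Markovian rule: state s -> distribution d_t(s) on A_s.
MR_RULE: Dict[str, Dict[str, float]] = {
    "low": {"wait": 0.3, "recharge": 0.7},
    "high": {"wait": 0.1, "search": 0.9},
}

def check_distributions(rule: Dict[str, Dict[str, float]]) -> None:
    """Raise ValueError unless every d_t(s) is a probability distribution."""
    for s, dist in rule.items():
        if any(p < 0 for p in dist.values()) or abs(sum(dist.values()) - 1.0) > 1e-9:
            raise ValueError(f"d({s!r}) is not a probability distribution")

def sample_action(rule: Dict[str, Dict[str, float]], s: str, rng: random.Random) -> str:
    """Draw a ~ d_t(s); only the current state s is used."""
    actions = list(rule[s])
    weights = [rule[s][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

check_distributions(MR_RULE)
print(sample_action(MR_RULE, "high", random.Random(0)))
```

Passing an explicit `random.Random` instance keeps the sampling reproducible, which is convenient when simulating a rule.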

HR - a stochastic history dependent rule:

$d_{t}:H_{t}\rightarrow P(A)$
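An HR rule combines both generalizations: it reads the whole history and returns a distribution. The sketch below is a hypothetical example on the toy states, where the probability of `recharge` in state `low` grows with the number of visits to `low` recorded in the history. It also shows how a deterministic choice can be viewed as a point-mass distribution:

```python
import random
from typing import Dict, Tuple

History = Tuple[str, ...]   # (s_1, a_1, ..., s_t), an element of H_t

def d_hr(h: History) -> Dict[str, float]:
    """Hypothetical stochastic history-dependent rule: a distribution on A_{s_t}."""
    current = h[-1]
    if current == "low":
        k = h[0::2].count("low")   # visits to "low" so far, including the current one
        p = 1.0 - 0.5 ** k         # recharge more often the longer we stay low
        return {"recharge": p, "wait": 1.0 - p}
    return {"search": 0.9, "wait": 0.1}

def sample_hr(h: History, rng: random.Random) -> str:
    """Draw a ~ d_t(h)."""
    dist = d_hr(h)
    actions = list(dist)
    return rng.choices(actions, weights=[dist[a] for a in actions], k=1)[0]

def as_point_mass(action: str) -> Dict[str, float]:
    """A deterministic choice viewed as the distribution putting all mass on it."""
    return {action: 1.0}

print(d_hr(("low",)), d_hr(("low", "wait", "low")))
```

Because a deterministic rule is the special case of a point-mass distribution, and a Markovian rule is a history-dependent rule that looks only at the last state, every MD rule is also an HD, MR, and HR rule.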

We call a rule stationary if it does not depend on time, i.e., $d_{t}=d$ for all $t$, for some fixed rule $d$.
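Stationarity is a property of the whole sequence of rules $d_1, d_2, \ldots$ applied over time. The sketch below assumes deterministic Markovian rules on the hypothetical toy states and a finite horizon; `stationary`, `nonstationary`, and `is_stationary` are illustrative helpers, not definitions from the text:

```python
from typing import Callable, List

S = ["low", "high"]                  # hypothetical state set
Rule = Callable[[str], str]          # a deterministic Markovian rule d: S -> A

def d_fixed(s: str) -> str:
    return "recharge" if s == "low" else "search"

def stationary(d: Rule, horizon: int) -> List[Rule]:
    """The sequence (d_1, ..., d_N) with d_t = d for every t."""
    return [d] * horizon

def nonstationary(horizon: int) -> List[Rule]:
    """Hypothetical time-dependent rules: wait at the final step, else use d_fixed."""
    rules: List[Rule] = []
    for t in range(1, horizon + 1):
        rules.append((lambda s: "wait") if t == horizon else d_fixed)
    return rules

def is_stationary(rules: List[Rule]) -> bool:
    """Check d_t(s) == d_1(s) for every t and every s (finite S and horizon)."""
    return all(d(s) == rules[0](s) for d in rules for s in S)

print(is_stationary(stationary(d_fixed, 5)), is_stationary(nonstationary(5)))
```

The comparison is done on actions rather than on function identity, so two separately defined rules that agree on every state count as the same rule.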

SD - a deterministic, stationary rule.

SR - a stochastic, stationary rule.

Yishay Mansour