A policy specifies, at each point in time, which action the agent selects. Let $D^{k}_{t}$ be the set of all options at time $t$; we then define $P_{i}^{k}=D^{k}_{1} \times \cdots \times D^{k}_{n-1}$. We also define $\Pi^{SD}$ and $\Pi^{DT}$ to be stationary rules that are random or deterministic, respectively.
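As a rough illustration, the product above can be read as a tuple holding one rule per time step $1, \dots, n-1$, with a stationary policy reusing the same rule at every step. The sketch below assumes that reading. The names `deterministic_rule`, `random_rule`, and `stationary_policy`, and the string-valued states and actions, are hypothetical and do not come from the text.

```python
import random
from typing import Callable, Dict, List

State = str
Action = str
# Assumption: a rule maps the current state to an action.
Rule = Callable[[State], Action]


def deterministic_rule(table: Dict[State, Action]) -> Rule:
    """Rule that always picks the action listed for the state."""
    return lambda state: table[state]


def random_rule(dist: Dict[State, Dict[Action, float]],
                rng: random.Random) -> Rule:
    """Rule that samples an action from a per-state distribution."""
    def rule(state: State) -> Action:
        actions = list(dist[state])
        weights = [dist[state][a] for a in actions]
        return rng.choices(actions, weights=weights, k=1)[0]
    return rule


def stationary_policy(rule: Rule, n: int) -> List[Rule]:
    """One element of D_1 x ... x D_{n-1}: the same rule at every step."""
    return [rule] * (n - 1)


# Usage: a deterministic stationary policy with n = 4, i.e. three steps.
det_policy = stationary_policy(deterministic_rule({"s0": "a", "s1": "b"}), 4)
chosen = [rule("s0") for rule in det_policy]

# Usage: a random stationary policy; the distribution is illustrative only.
rnd_policy = stationary_policy(
    random_rule({"s0": {"a": 1.0}}, random.Random(0)), 4
)
```

A non-stationary policy would be a list whose entries differ across time steps; the stationary case is simply the special case in which all entries are the same rule.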

Yishay Mansour