next up previous
Next: Computational Complexity Up: Finite Horizon Previous: Summary

An Algorithm for Constructing an Optimal Policy

In this section we develop an optimal Markovian deterministic policy. As shown in the previous section, by this we achieve a general optimal policy. The algorithm construction is done from t=n to t=1 in a recursive manner.
The algorithm:
For $t \leftarrow N$, $\forall s_N, U_N(s_N) = r_N(s_N)$
For $t \leftarrow t-1$

\begin{displaymath}U_t(s_t) = \max_{ a \in A}\{r_t(s_t, a) + \sum_{j\in S} P_t(j\vert s_t,a)U_{t+1}(j)\}\end{displaymath}

and the optimal group of actions is:

\begin{displaymath}A_{s_t}^*(t) = argmax_{ a \in A} \{r_t(s_t,a) + \sum_{j\in S} P_t(j\vert s_t,a)U_{t+1}^*(j)\}\end{displaymath}

The generated policy, ${\pi}^*$, satisfying $\forall t, \forall s_t, {\pi}_t^*(s_t) \in A_{s_t}^*$ is an optimal policy.


Yishay Mansour