
## Markovian Policy

**Theorem 4.1**
Let *U*_{t}^{*} be a solution to the optimality equations. Then:

1. For any *t*, *U*_{t}^{*}(*h*_{t}) depends on *h*_{t} only through *s*_{t}.
2. There exists a Markovian deterministic optimal policy.

**Proof:** We use backward induction on *t* to prove (1).

*Basis*:
*U*_{N}^{*}(*h*_{N})=*r*_{N}(*s*_{N}), therefore
*U*_{N}^{*}(*h*_{N})=*U*_{N}^{*}(*s*_{N})

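*Induction Step*: We assume the induction hypothesis holds for every *t* > *n*, and prove it for *t*=*n*.

In the optimality equation for *U*_{t}^{*}(*h*_{t}), the expression being maximized is the immediate reward plus the expected value of *U*_{t+1}^{*} at the next history. By the induction hypothesis, *U*_{t+1}^{*} depends on that history only through the next state, so the expression depends only on *s*_{t} and *a*. Its maximum over *a* therefore depends solely on *s*_{t}.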

Thus,
*U*_{t}^{*}(*h*_{t})=*U*_{t}^{*}(*s*_{t})
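
The displayed equation of the induction step is not preserved in this section. A plausible reconstruction, assuming the standard form of the finite-horizon optimality equations, is sketched below. The symbols *r*_{t} (immediate reward), *p*_{t}(*j* | *s*_{t}, *a*) (transition probability), *A* (action set) and *S* (state set) are assumed notation and may differ from the notation used elsewhere in these notes.

```latex
\begin{align*}
U_t^*(h_t)
  &= \max_{a \in A} \Big\{ r_t(s_t,a)
     + \sum_{j \in S} p_t(j \mid s_t,a)\, U_{t+1}^*(h_t,a,j) \Big\} \\
  &= \max_{a \in A} \Big\{ \underbrace{r_t(s_t,a)
     + \sum_{j \in S} p_t(j \mid s_t,a)\, U_{t+1}^*(j)}_{\text{depends only on } s_t \text{ and } a} \Big\}
   = U_t^*(s_t)
\end{align*}
```

The second equality is where the induction hypothesis is applied: the value at the extended history (*h*_{t}, *a*, *j*) is replaced by the value at its last state *j*.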

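To prove (2), define a policy that, at each time *t* and in each state *s*_{t}, selects an action attaining the maximum in the optimality equation for *U*_{t}^{*}(*s*_{t}); by (1), that maximum depends only on *s*_{t}.

Since the selected action is a single choice determined solely by *s*_{t}, namely the current state, this policy is deterministic and Markovian.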
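
The construction in the proof can be illustrated with a minimal Python sketch of backward induction on a toy problem. Everything here is an assumption made for illustration: the function name `backward_induction`, the 0-based time index (decisions at *t* = 0, ..., *N*-1, terminal reward at *t* = *N*), time-independent rewards `r[s][a]` and transition probabilities `p[s][a][j]`, and the two-state example itself. None of it comes from the notes.

```python
def backward_induction(r, p, r_final, N):
    """Compute U[t][s] and a Markovian deterministic policy[t][s].

    r[s][a]    : immediate reward (assumed time-independent for simplicity)
    p[s][a][j] : probability of moving from state s to state j under action a
    r_final[s] : terminal reward, i.e. U[N][s]
    N          : horizon; decisions are made at t = 0, ..., N-1
    """
    n_states = len(r_final)
    n_actions = len(r[0])
    U = [[0.0] * n_states for _ in range(N + 1)]
    policy = [[0] * n_states for _ in range(N)]

    # Basis: the value at the final step is the terminal reward.
    U[N] = list(r_final)

    # Induction step, run backwards in time.
    for t in range(N - 1, -1, -1):
        for s in range(n_states):
            # The maximized expression depends only on the state s and action a.
            q = [
                r[s][a] + sum(p[s][a][j] * U[t + 1][j] for j in range(n_states))
                for a in range(n_actions)
            ]
            best = max(range(n_actions), key=lambda a: q[a])
            policy[t][s] = best  # the choice depends only on (t, s)
            U[t][s] = q[best]
    return U, policy


# Hypothetical two-state, two-action example.
r = [[0.0, 1.0], [2.0, 0.0]]
p = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to state 1
    [[0.0, 1.0], [1.0, 0.0]],  # state 1: action 0 stays, action 1 moves to state 0
]
r_final = [0.0, 0.0]

U, policy = backward_induction(r, p, r_final, N=2)
print(U[0])     # value of each state at time 0
print(policy)   # one action per (time, state) pair
```

The policy is stored as a table indexed by time and current state only, which is exactly what makes it Markovian: no part of the history other than the current state is consulted.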

*Yishay Mansour*

*1999-11-18*