Let $U_t^*$ be a solution to the optimality equations. Then:
- For any $t$, $U_t^*(h_t)$ depends on $h_t$ only through $s_t$.
- There exists a Markovian deterministic optimal policy.
Proof: We use backward induction to prove (1).
Induction Step: We assume that the induction hypothesis holds for every $t > n$, and prove it for $t = n$.
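As a sketch of this step, assume the optimality equations have the standard finite-horizon form. The reward $r_n$, the transition probability $p_n$, the action set $A$ and the state space $S$ are assumed notation, not taken from the text above. The second equality applies the induction hypothesis to $U_{n+1}^*$:

```latex
\begin{align*}
U_n^*(h_n)
  &= \max_{a \in A} \Big\{ r_n(s_n, a)
     + \sum_{j \in S} p_n(j \mid s_n, a)\, U_{n+1}^*(h_n, a, j) \Big\} \\
  &= \max_{a \in A} \Big\{ \underbrace{ r_n(s_n, a)
     + \sum_{j \in S} p_n(j \mid s_n, a)\, U_{n+1}^*(j) }_{\text{marked term}} \Big\}
\end{align*}
```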
Note that the marked term depends only on the current state $s$ and the action $a$. After maximizing over $a$, the entire expression therefore depends only on $s$.
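To prove (2), let $\pi^*$ be a Markovian deterministic policy that, for every $t$ and every state $s_t$, selects an action attaining the maximum in the optimality equation at $(t, s_t)$.
Since the policy's definition depends solely on $s_t$, namely the current state, $\pi^*$ is a Markovian policy.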
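The construction used for (2) can be sketched in code. This is a minimal illustration under stated assumptions, not the notes' own algorithm: the function name `backward_induction`, the time-independent `rewards` and `transitions` tables, the `terminal` reward vector, and the toy two-state example are all hypothetical.

```python
def backward_induction(rewards, transitions, terminal, horizon):
    """Solve finite-horizon optimality equations by backward induction.

    rewards[s][a]     -- immediate reward for action a in state s (assumed stationary)
    transitions[s][a] -- dict mapping next state j to its probability
    terminal[s]       -- reward collected at the final time step
    Returns (U, policy), where U[t][s] is the optimal value and
    policy[t][s] is a maximizing action; both depend only on (t, s).
    """
    n_states = len(terminal)
    U = [None] * (horizon + 1)
    policy = [None] * horizon
    U[horizon] = list(terminal)  # basis of the induction
    for t in range(horizon - 1, -1, -1):  # go backward in time
        U[t] = [0.0] * n_states
        policy[t] = [0] * n_states
        for s in range(n_states):
            best_a, best_v = None, float("-inf")
            for a in range(len(rewards[s])):
                # This value depends only on s and a.
                v = rewards[s][a] + sum(
                    p * U[t + 1][j] for j, p in transitions[s][a].items()
                )
                if v > best_v:  # ties keep the lowest-numbered action
                    best_a, best_v = a, v
            U[t][s] = best_v        # after the max: depends only on s
            policy[t][s] = best_a   # deterministic, chosen from s alone
    return U, policy


# Toy example (hypothetical): action 0 moves to state 0, action 1 to state 1.
rewards = [[1.0, 0.0], [0.0, 2.0]]
transitions = [[{0: 1.0}, {1: 1.0}], [{0: 1.0}, {1: 1.0}]]
terminal = [0.0, 0.0]

U, policy = backward_induction(rewards, transitions, terminal, horizon=3)
print(U[0])    # [4.0, 6.0]
print(policy)  # [[1, 1], [0, 1], [0, 1]]
```

The returned policy is indexed by time and current state only, so it never consults the history. That is the sense in which it is Markovian and deterministic.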