** Next:** About this document ...
** Up:** Finite Horizon
** Previous:** Computational Complexity

## The Optimality Principle

Let us define

$$u^{*}_{t}(h_{t}) = \max_{\pi \in \Pi^{HR}} u^{\pi}_{t}(h_{t}),$$

the optimal return achievable from history *h*_{t}.
For simplicity of the discussion we assume that both *S* and *A* are finite.
The optimality equations (also known as Bellman equations) are:

$$U_{t}(h_{t}) = \max_{a \in A} \Big\{ r_{t}(s_{t},a) + \sum_{s' \in S} P_{t}(s' \mid s_{t},a)\, U_{t+1}\big((h_{t},a,s')\big) \Big\},$$

where
*U*_{N}(*h*_{N})=*r*_{N}(*s*_{N}) for
*h*_{N}=(*h*_{N-1},*a*_{N-1},*s*_{N}).

Note that replacing the max operation in the equation with the action chosen by a policy π
yields the equation for computing the return of policy π.
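As a concrete illustration of the optimality equations, here is a minimal backward-induction sketch. The two-state, two-action MDP is invented for this example (it does not appear in the text), and it uses the Markovian simplification in which the value depends on the history *h*_{t} only through the current state *s*_{t}.

```python
# Backward induction for the finite-horizon optimality equations.
# The two-state, two-action MDP below is made up for illustration;
# with Markovian rewards and transitions, U_t(h_t) depends on the
# history only through the current state s_t, so we tabulate U_t(s).

N = 3                      # horizon: decisions at t = 1, ..., N-1
S = (0, 1)                 # states
A = (0, 1)                 # actions: 0 = stay, 1 = try to switch states

def r(t, s, a):            # immediate reward r_t(s, a) (illustrative)
    return (1.0 if s == 1 else 0.0) - 0.1 * (a == 1)

def r_N(s):                # terminal reward r_N(s_N)
    return 1.0 if s == 1 else 0.0

def P(sp, s, a):           # transition probability Pr[s' = sp | s, a]
    if a == 0:
        return 1.0 if sp == s else 0.0
    return 0.9 if sp != s else 0.1

U = {s: r_N(s) for s in S}         # U_N(s) = r_N(s)
for t in range(N - 1, 0, -1):      # apply one optimality equation per step
    U = {s: max(r(t, s, a) + sum(P(sp, s, a) * U[sp] for sp in S)
                for a in A)
         for s in S}
# U now holds U_1(s), the optimal return from each start state
```

Each pass of the loop applies one optimality equation, turning the table for *U*_{t+1} into the table for *U*_{t}.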

**Theorem 3.2**
Let
*U*_{t} be the solution of the optimality equations for
*t* = 1 up to
*N*-1. Then:

(1) *U*_{t}(*h*_{t}) = *u*^{*}_{t}(*h*_{t}) for every *t* and every history *h*_{t};

(2) any policy that at each step chooses an action attaining the maximum in the optimality equations is an optimal policy.

**Proof:** First, we will show by induction that
*u*^{π}_{t}(*h*_{t}) ≤ *U*_{t}(*h*_{t})
for *t* = 1, ..., *N* and for every policy
π ∈ Π^{HR}.
Then we exhibit a specific policy π^{*}
for which *u*^{π^{*}}_{t}(*h*_{t}) = *U*_{t}(*h*_{t}),
and this will conclude the proof.

*Basis*: When *t*=*N*, by definition
*u*^{π}_{N}(*h*_{N}) = *r*_{N}(*s*_{N}) = *U*_{N}(*h*_{N})
for every policy π,
and therefore
*U*_{N}(*h*_{N})=*u*^{*}_{N}(*h*_{N}).

*Induction Step*: Let us assume that
*u*^{π}_{t}(*h*_{t}) ≤ *U*_{t}(*h*_{t}) for
*t* = *n*+1, ..., *N*
and for every policy π.
Let π' be a policy in Π^{HR}
(that performs the decision rule *d*'_{n} on step *n*, choosing action *a* with probability *q*(*a*) given history *h*_{n}). For *t*=*n*:

$$u^{\pi'}_{n}(h_{n}) = \sum_{a \in A} q(a) \Big\{ r_{n}(s_{n},a) + \sum_{s' \in S} P_{n}(s' \mid s_{n},a)\, u^{\pi'}_{n+1}\big((h_{n},a,s')\big) \Big\}.$$

By the induction assumption, *u*^{π'}_{n+1}(*h*_{n+1}) ≤ *U*_{n+1}(*h*_{n+1}), and since a convex combination never exceeds the maximum,

$$u^{\pi'}_{n}(h_{n}) \le \sum_{a \in A} q(a) \Big\{ r_{n}(s_{n},a) + \sum_{s' \in S} P_{n}(s' \mid s_{n},a)\, U_{n+1}\big((h_{n},a,s')\big) \Big\} \le \max_{a \in A} \Big\{ r_{n}(s_{n},a) + \sum_{s' \in S} P_{n}(s' \mid s_{n},a)\, U_{n+1}\big((h_{n},a,s')\big) \Big\} = U_{n}(h_{n}),$$

where *q*(*a*) is the probability that π' assigns to action *a* at history *h*_{n}. Therefore
*u*^{π}_{n}(*h*_{n}) ≤ *U*_{n}(*h*_{n})
for every policy π ∈ Π^{HR},
and in particular for the optimal policy, so
*u*^{*}_{n}(*h*_{n}) ≤ *U*_{n}(*h*_{n}).
It is left to show that there exists a policy π^{*}
for which
*u*^{π^{*}}_{n}(*h*_{n}) = *U*_{n}(*h*_{n}).
Let us define
π^{*} = (*d*^{*}_{1}, ..., *d*^{*}_{N-1})
in the following manner:

$$d^{*}_{n}(h_{n}) \in \arg\max_{a \in A} \Big\{ r_{n}(s_{n},a) + \sum_{s' \in S} P_{n}(s' \mid s_{n},a)\, U_{n+1}\big((h_{n},a,s')\big) \Big\}.$$
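The greedy decision rule can be sketched as an argmax over actions. In the sketch below, the reward *r*, transitions *P*, and value table `U_next` are made-up placeholders standing in for the text's *r*_{n}, *P*_{n} and *U*_{n+1}, written in the Markovian, state-indexed form.

```python
# Greedy decision rule d*_n extracted from U_{n+1} (a sketch; r, P and
# U_next are made-up placeholders for the text's r_n, P_n and U_{n+1}).

S = (0, 1)
A = (0, 1)                 # 0 = stay, 1 = try to switch states

def r(n, s, a):            # immediate reward r_n(s, a) (illustrative)
    return (1.0 if s == 1 else 0.0) - 0.1 * (a == 1)

def P(sp, s, a):           # Pr[s' = sp | s, a] (illustrative)
    if a == 0:
        return 1.0 if sp == s else 0.0
    return 0.9 if sp != s else 0.1

def greedy_rule(n, U_next):
    """d*_n(s): an action maximizing r_n(s,a) + sum_s' P(s'|s,a) U_{n+1}(s')."""
    return {s: max(A, key=lambda a: r(n, s, a)
                   + sum(P(sp, s, a) * U_next[sp] for sp in S))
            for s in S}

# With U_3 = r_3 (worth 1 in state 1), the greedy rule at step 2
# switches out of state 0 and stays in state 1.
d_star = greedy_rule(2, {0: 0.0, 1: 1.0})
```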

We will now show that
*u*^{π^{*}}_{n}(*h*_{n}) = *U*_{n}(*h*_{n})
using reversed induction on *n*.

*Basis*: for *n*=*N* this is obviously true.

*Induction Step*:

$$u^{\pi^{*}}_{n}(h_{n}) = r_{n}(s_{n}, d^{*}_{n}(h_{n})) + \sum_{s' \in S} P_{n}(s' \mid s_{n}, d^{*}_{n}(h_{n}))\, u^{\pi^{*}}_{n+1}\big((h_{n}, d^{*}_{n}(h_{n}), s')\big)$$
$$= r_{n}(s_{n}, d^{*}_{n}(h_{n})) + \sum_{s' \in S} P_{n}(s' \mid s_{n}, d^{*}_{n}(h_{n}))\, U_{n+1}\big((h_{n}, d^{*}_{n}(h_{n}), s')\big)$$
$$= \max_{a \in A} \Big\{ r_{n}(s_{n},a) + \sum_{s' \in S} P_{n}(s' \mid s_{n},a)\, U_{n+1}\big((h_{n},a,s')\big) \Big\} = U_{n}(h_{n}).$$

The second equality is by the induction hypothesis, that
*u*^{π^{*}}_{n+1}(*h*_{n+1}) = *U*_{n+1}(*h*_{n+1})
for every history *h*_{n+1}, and the third is by the definition of *d*^{*}_{n}.
Therefore
*u*^{π^{*}}_{n}(*h*_{n}) = *U*_{n}(*h*_{n}),
and since
*u*^{π^{*}}_{n}(*h*_{n}) ≤ *u*^{*}_{n}(*h*_{n}) ≤ *U*_{n}(*h*_{n}),
it is clear that
*U*_{n}(*h*_{n})=*u*^{*}_{n}(*h*_{n}) as was required.
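Theorem 3.2 can be checked numerically on a small example: for a made-up two-state MDP (not one from the text), the best return over every deterministic Markov policy coincides with the value computed from the optimality equations. This is only a sanity check under the Markovian simplification, not part of the proof.

```python
# Numerical sanity check of Theorem 3.2 on a made-up two-state MDP:
# the best return over ALL deterministic Markov policies equals U_1
# computed by backward induction from the optimality equations.
import itertools

N = 3                      # decisions at t = 1, 2; terminal reward at t = 3
S = (0, 1)
A = (0, 1)                 # 0 = stay, 1 = try to switch states

def r(t, s, a):            # immediate reward r_t(s, a) (illustrative)
    return (1.0 if s == 1 else 0.0) - 0.1 * (a == 1)

def r_N(s):                # terminal reward r_N(s)
    return 1.0 if s == 1 else 0.0

def P(sp, s, a):           # Pr[s' = sp | s, a] (illustrative)
    if a == 0:
        return 1.0 if sp == s else 0.0
    return 0.9 if sp != s else 0.1

def value_of(policy):      # u^pi_1(s) for a deterministic Markov policy
    u = {s: r_N(s) for s in S}
    for t in range(N - 1, 0, -1):
        u = {s: r(t, s, policy[(t, s)])
                + sum(P(sp, s, policy[(t, s)]) * u[sp] for sp in S)
             for s in S}
    return u

def backward_induction():  # U_1(s) via the optimality equations
    U = {s: r_N(s) for s in S}
    for t in range(N - 1, 0, -1):
        U = {s: max(r(t, s, a) + sum(P(sp, s, a) * U[sp] for sp in S)
                    for a in A)
             for s in S}
    return U

# Enumerate all |A|^(|S|*(N-1)) = 16 deterministic Markov policies.
keys = [(t, s) for t in range(1, N) for s in S]
best = {s: max(value_of(dict(zip(keys, acts)))[s]
               for acts in itertools.product(A, repeat=len(keys)))
        for s in S}
U1 = backward_induction()
```

Per the theorem, `best` and `U1` agree on every start state.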

*Yishay Mansour*

*1999-11-15*