Large State Space

be the minimal distance between and

These bounds might seem disappointing, since as $\gamma \to 1$ our bound diverges. On the other hand if we enrich our architecture (enlarge the family of

If we approximate

If we have , then the greedy policy is,

We have to approximate , so we have additional errors due to approximation too.

If we have only a few states

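If $\|V - V^*\|_\infty \le \epsilon$ and $\pi$ is a greedy policy based on $V$, then

$$\|V^\pi - V^*\|_\infty \le \frac{2\gamma\epsilon}{1-\gamma}.$$

Furthermore, there exists $\epsilon_0 > 0$ s.t. for every $\epsilon < \epsilon_0$ the policy $\pi$ is an optimal policy.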
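As a quick sanity check on magnitudes, take the bound to be the standard greedy-policy bound $\frac{2\gamma\epsilon}{1-\gamma}$ (assumed here) and plug in an illustrative approximation error of $\epsilon = 0.1$ (a value chosen only for this example):

$$\gamma = 0.9:\quad \frac{2 \cdot 0.9 \cdot 0.1}{1 - 0.9} = 1.8, \qquad \gamma = 0.99:\quad \frac{2 \cdot 0.99 \cdot 0.1}{1 - 0.99} = 19.8.$$

The guarantee degrades by roughly a factor of $\frac{1}{1-\gamma}$ as the discount factor approaches 1.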

**Proof:**

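Consider the operators,

$$L_\pi U = R_\pi + \gamma P_\pi U$$

and

$$L U = \max_{\pi'} \{ R_{\pi'} + \gamma P_{\pi'} U \},$$

where $R_\pi$ and $P_\pi$ denote the reward vector and transition matrix induced by the policy $\pi$. Both operators are $\gamma$-contractions in the max norm, with fixed points $V^\pi$ and $V^*$ respectively.

Then, since $\pi$ is greedy with respect to $V$,

$$L_\pi V = L V.$$

This implies that,

$$\|V^\pi - V^*\|_\infty \le \|L_\pi V^\pi - L_\pi V\|_\infty + \|L V - L V^*\|_\infty \le \gamma \|V^\pi - V\|_\infty + \gamma \|V - V^*\|_\infty \le \gamma \|V^\pi - V^*\|_\infty + 2\gamma \|V - V^*\|_\infty.$$

Rearranging and using $\|V - V^*\|_\infty \le \epsilon$ gives $\|V^\pi - V^*\|_\infty \le \frac{2\gamma\epsilon}{1-\gamma}$.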

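For the second part, since we have a finite number of policies, there exists $\delta > 0$ s.t.

$$\delta = \min_{\pi' : V^{\pi'} \ne V^*} \|V^{\pi'} - V^*\|_\infty.$$

For any $\epsilon$ such that,

$$\frac{2\gamma\epsilon}{1-\gamma} < \delta$$

we have that

$$V^\pi = V^*.$$

Therefore we can choose

$$\epsilon_0 = \frac{(1-\gamma)\delta}{2\gamma}.$$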
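The statement can be checked numerically on a small example. The sketch below is only an illustration, not part of the notes: it assumes the loss bound has the standard form $\frac{2\gamma\epsilon}{1-\gamma}$, builds a random tabular MDP, perturbs $V^*$ by at most $\epsilon$ in each state, and compares the loss of the resulting greedy policy with the bound. All function names (`random_mdp`, `optimal_value`, `policy_value`, `greedy_policy`, `check_bound`) and all parameter values are hypothetical choices made for this example.

```python
import random

def backup(P, R, V, gamma, s, a):
    """One-step lookahead value of action a in state s under value estimate V."""
    return R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))

def random_mdp(n_states, n_actions, rng):
    """Random MDP: P[s][a] is a distribution over next states, R[s][a] is in [0, 1]."""
    P, R = [], []
    for _ in range(n_states):
        rows = []
        for _ in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            rows.append([x / total for x in w])
        P.append(rows)
        R.append([rng.random() for _ in range(n_actions)])
    return P, R

def optimal_value(P, R, gamma, iters=500):
    """Approximate V* by value iteration (repeated application of L)."""
    V = [0.0] * len(R)
    for _ in range(iters):
        V = [max(backup(P, R, V, gamma, s, a) for a in range(len(R[s])))
             for s in range(len(R))]
    return V

def policy_value(P, R, policy, gamma, iters=500):
    """Approximate V^pi by repeated application of L_pi."""
    V = [0.0] * len(R)
    for _ in range(iters):
        V = [backup(P, R, V, gamma, s, policy[s]) for s in range(len(R))]
    return V

def greedy_policy(P, R, V, gamma):
    """Policy that is greedy with respect to V (one action index per state)."""
    return [max(range(len(R[s])), key=lambda a: backup(P, R, V, gamma, s, a))
            for s in range(len(R))]

def check_bound(seed=0, n_states=5, n_actions=3, gamma=0.9, eps=0.05):
    """Return (loss of the greedy policy, assumed bound 2*gamma*eps/(1-gamma))."""
    rng = random.Random(seed)
    P, R = random_mdp(n_states, n_actions, rng)
    V_star = optimal_value(P, R, gamma)
    # Perturb V* by at most eps per state, so that ||V - V*||_inf <= eps.
    V = [v + rng.uniform(-eps, eps) for v in V_star]
    pi = greedy_policy(P, R, V, gamma)
    V_pi = policy_value(P, R, pi, gamma)
    loss = max(abs(a - b) for a, b in zip(V_pi, V_star))
    return loss, 2 * gamma * eps / (1 - gamma)

if __name__ == "__main__":
    for seed in range(3):
        loss, bound = check_bound(seed=seed)
        print(f"seed={seed} loss={loss:.6f} bound={bound:.6f}")
```

On random instances like these the observed loss is typically far below the bound, which is a worst-case guarantee. Shrinking `eps` drives the loss to zero, in line with the second part of the statement.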

- Example showing that the bound is tight
- Approximate Policy Iteration
- Approximate Value Iteration
- Example