** Next:** About this document ...
** Up:** Linear Programming
** Previous:** A Linear Programming Example

##

Use of
Linear Programming to Find the Optimal Policy

The general
translation of the problem
to a linear
program is as follows:

*Minimize: **x**
*

Subject To:

It was shown that the Optimality Equations of the discounted
infinite horizon problem are:

Using the above translation we would get the following linear
program:

*Minimize: *
*
*

Subject To:
is any set of constants with the following
characteristics:
These constants can be viewed as the probability
distribution of the agent's initial state in the MDP.

Accordingly, the dual linear program would be:

*Maximize: *
*
*

Subject To:
Intuitively,
can be thought of as the
probability of being in state *s* and performing action *a*, while taking into account
the discount factor - .
In the following theorem
is defined more formally.

Theorem is assuring that every solution to the dual linear
program can be translated to a policy with a return value equal to
the maximized element in the program. Thus, the best solution of
the program can be translated to a policy with the maximal
possible return value. This policy will, obviously, be the best
policy.

** Next:** About this document ...
** Up:** Linear Programming
** Previous:** A Linear Programming Example
*Yishay Mansour*

*1999-12-18*