** Next:** The Optimality Principal
** Up:** Return Estimation of a
** Previous:** Dynamic Programming Algorithm for

###

Computational Complexity

Let *K* be the number of states and *L* the number of actions. Then
|*H*_{t}|=*K*^{t+1} *L*^{t} and the computation time would be:

The above value is for a general history dependent policy. If the policy is Markovian, then |*H*_{t}|=*K* and we get a computation time of *K*^{2}(*N*-1)

*Yishay Mansour*

*1999-11-15*