Computational Complexity

Next: The Optimality Principal Up: Return Estimation of a Previous: Dynamic Programming Algorithm for

Computational Complexity

Let K be the number of states and L the number of actions. Then |H_t|=K^t+1 L^t and the computation time would be: $\sum_{t=0}^{N-1} K\vert H_t\vert \leq K^2 \sum_{t=0}^{N-1} (KL)^t$

The above value is for a general history dependent policy. If the policy is Markovian, then |H_t|=K and we get a computation time of K²(N-1)

Yishay Mansour
1999-11-15