Learning - Model Based
Sample size (optimal policy):
Naive: O(|S|^2 |A|) samples.
(Approximates each transition δ(s,a,s’) well.)
Better: O(|S| |A| log (|S| |A|) ) samples.
(Sufficient to approximate optimal policy.)
[KS, NIPS’98]
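
The model-based approach above can be sketched as follows: draw samples for every state-action pair, build an empirical transition model from the counts, then plan on that estimated model. This is only an illustrative sketch, not the algorithm analyzed in [KS, NIPS’98]. The toy MDP (TRUE_D, REWARD), the helper names (sample_next, estimate_model, value_iteration), the discount factor, and the per-pair sample count m are all assumptions made for the example; m is picked by hand and is not derived from either bound.

```python
import random

# Hypothetical 2-state, 2-action MDP, used only for illustration.
# TRUE_D[s][a][t] is the probability of moving from s to t under action a.
TRUE_D = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.1, 0.9]],
]
# REWARD[s][a]: reward 1 for being in state 1, 0 otherwise (assumed).
REWARD = [[0.0, 0.0], [1.0, 1.0]]


def sample_next(s, a, rng):
    """Generative model: draw one next state for the pair (s, a)."""
    return rng.choices(range(len(TRUE_D)), weights=TRUE_D[s][a])[0]


def estimate_model(n_states, n_actions, m, rng):
    """Empirical transition model from m samples per (s, a) pair."""
    d_hat = [[[0.0] * n_states for _ in range(n_actions)]
             for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(m):
                d_hat[s][a][sample_next(s, a, rng)] += 1.0 / m
    return d_hat


def value_iteration(d, reward, gamma=0.9, n_iters=200):
    """Plan on a (possibly estimated) model; return values and greedy policy."""
    n_states, n_actions = len(d), len(d[0])

    def q(s, a, v):
        # One-step lookahead value of taking action a in state s.
        return reward[s][a] + gamma * sum(p * v[t] for t, p in enumerate(d[s][a]))

    v = [0.0] * n_states
    for _ in range(n_iters):
        v = [max(q(s, a, v) for a in range(n_actions)) for s in range(n_states)]
    policy = [max(range(n_actions), key=lambda a: q(s, a, v))
              for s in range(n_states)]
    return v, policy


rng = random.Random(0)
d_hat = estimate_model(n_states=2, n_actions=2, m=500, rng=rng)
_, policy_hat = value_iteration(d_hat, REWARD)
_, policy_true = value_iteration(TRUE_D, REWARD)
print("policy from estimated model:", policy_hat)
print("policy from true model:     ", policy_true)
```

Total sample usage here is |S| |A| m calls to the generative model. In this toy case the action gaps are large, so a rough estimate of δ is already enough to pick the same greedy policy as the true model, which is the intuition behind needing fewer samples for a good policy than for an accurate model.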