Learning - Model Based
Estimate the model from the observation.
(Both transition probability and rewards.)
Use the estimated model as the true model,
and find optimal policy (using dynamic programming).
If we have a “good” estimated model, we should
have a “good” estimation.