Large-Scale MDPs - Restricted Value Function
Applications: most of the practical work (TD-Gammon, etc.)
Rough idea: reduce the problem to supervised learning.
(Value) Function Approximation:
Use a limited class of functions to estimate the value function.
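A minimal sketch of the idea, assuming a linear function class: the value of a state is approximated as a weighted sum of features, and the weights are fit to (state, target value) pairs exactly as in a regression problem. The feature map, the step size, and the sample targets are illustrative assumptions, not part of the original notes.

```python
def features(state):
    """Hypothetical feature map: a bias term plus the raw state index."""
    return [1.0, float(state)]


def predict(weights, state):
    """Approximate value of a state: dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features(state)))


def fit_linear_value(samples, lr=0.02, epochs=2000):
    """Fit the weights by gradient steps on the squared prediction error.

    samples: list of (state, target_value) pairs, e.g. observed returns.
    lr and epochs are assumed settings chosen for this small example.
    """
    weights = [0.0, 0.0]
    for _ in range(epochs):
        for state, target in samples:
            error = target - predict(weights, state)
            phi = features(state)
            # Move each weight in the direction that reduces the error.
            weights = [w + lr * error * f for w, f in zip(weights, phi)]
    return weights


# Made-up training targets that happen to be exactly linear in the state.
samples = [(s, 2.0 + 0.5 * s) for s in range(5)]
weights = fit_linear_value(samples)
```

The fitted weights can then be used to predict values for states that never appeared in the samples, which is the point of restricting the function class: two numbers stand in for a table with one entry per state.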
Given a good approximation of the value function, we can
estimate the optimal policy.
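One common way to go from an approximate value function to a policy is to act greedily with respect to it: in each state, pick the action with the best one-step lookahead. The sketch below assumes a known, deterministic toy model (a five-state chain with a reward for reaching the last state); the model, the discount factor, and the approximate value function are all hypothetical choices made for illustration.

```python
GAMMA = 0.9          # assumed discount factor
ACTIONS = (-1, +1)   # move left or right along the chain
N_STATES = 5


def step(state, action):
    """Hypothetical deterministic model: returns (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    entered_goal = next_state == N_STATES - 1 and state != N_STATES - 1
    reward = 1.0 if entered_goal else 0.0
    return next_state, reward


def greedy_action(value_fn, state):
    """Pick the action maximizing reward + GAMMA * V(next_state)."""
    def backup(action):
        next_state, reward = step(state, action)
        return reward + GAMMA * value_fn(next_state)
    return max(ACTIONS, key=backup)


def approx_value(state):
    """A made-up approximate value function, increasing toward the goal."""
    return 0.2 * state


policy = [greedy_action(approx_value, s) for s in range(N_STATES)]
```

Here the greedy policy moves right in every state, even though the approximate values are far from exact: the policy only depends on how the values rank the successor states.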