MDP model - action selection
Policy - mapping from states to actions
Fully observable - the agent can “see” the entire current state.
AIM: Maximize the expected return.
Optimal policy: maximizes the expected return from every start state.
THEOREM: There exists a deterministic optimal policy.
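
To make these definitions concrete, here is a minimal Python sketch. It assumes a finite MDP with a discounted return, which these notes do not state explicitly. The two-state model `P`, its rewards, the discount `GAMMA`, and the helper names are all hypothetical and chosen only for illustration. Value iteration is used here as one standard way to obtain the optimal values; the policy that is read off is a plain state-to-action table, so it is deterministic.

```python
# Hypothetical 2-state, 2-action MDP; every number here is made up for illustration.
GAMMA = 0.9  # assumed discount factor (not given in the notes)

# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}


def q_value(V, s, a):
    """Expected return of taking action a in state s, then following values V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def value_iteration(tol=1e-10):
    """Repeat the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        new_V = {s: max(q_value(V, s, a) for a in P[s]) for s in P}
        if max(abs(new_V[s] - V[s]) for s in P) < tol:
            return new_V
        V = new_V


def greedy_policy(V):
    """Deterministic policy: one action per state, greedy with respect to V."""
    return {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}


V_star = value_iteration()
policy = greedy_policy(V_star)  # a mapping from states to actions
print(policy)
```

Because the same table `policy` is greedy in every state at once, it does not depend on where the agent starts, which is the sense of "optimal from every start state" used above.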