Simple example: N-armed bandit
Single state.
[Diagram: a single state s with actions a1, a2, a3]
Goal: Maximize sum of immediate rewards.
Difficulty: unknown model.
Given the model: take the greedy action (the arm with the highest expected reward).
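A minimal sketch of the greedy choice when the model is known. The arm names follow the diagram; the expected-reward values, the Bernoulli reward assumption, and the helper names `greedy_action` and `pull` are hypothetical, chosen only for illustration.

```python
import random

# Hypothetical model: expected reward of each arm (values invented for illustration).
expected_reward = {"a1": 0.2, "a2": 0.5, "a3": 0.8}

def greedy_action(model):
    """Return the arm with the highest expected reward under the given model."""
    return max(model, key=model.get)

def pull(arm, model, rng):
    """Assumed Bernoulli reward: 1 with probability model[arm], otherwise 0."""
    return 1 if rng.random() < model[arm] else 0

rng = random.Random(0)
best = greedy_action(expected_reward)

# With a single state, maximizing the sum of immediate rewards
# reduces to pulling the greedy arm on every step.
total = sum(pull(best, expected_reward, rng) for _ in range(1000))
print(best, total)
```

When the model is unknown, `expected_reward` is not available and would have to be estimated from observed rewards, which is the difficulty noted above.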