
   
The Parallel Sampling Model

The Parallel Sampling Model is a model of an ideal exploration policy, in the sense that every state-action pair is sampled with the same frequency. We define a subroutine PS(M) that simulates the Parallel Sampling Model as follows: a single call to PS(M) returns, for every state-action pair (s,a), a random next state s' distributed according to the transition distribution $P^{a}_{s}(\cdot)$. Thus a single call to PS(M) simulates $\vert S\vert \times \vert A\vert$ transitions of the given MDP and returns $\vert S\vert \times \vert A\vert$ next states.
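
To make the subroutine concrete, here is a minimal sketch of one call to PS(M), assuming (purely for illustration, this representation is not from the notes) that the MDP's transition model is given as a NumPy array P of shape (|S|, |A|, |S|), where P[s, a] holds the distribution $P^{a}_{s}(\cdot)$ over next states.

    import numpy as np

    def parallel_sample(P, rng=None):
        """One call to PS(M): return an |S| x |A| array of sampled next
        states, one for every state-action pair (s, a)."""
        rng = np.random.default_rng() if rng is None else rng
        n_states, n_actions, _ = P.shape
        next_states = np.empty((n_states, n_actions), dtype=int)
        for s in range(n_states):
            for a in range(n_actions):
                # Draw s' according to the transition distribution P^a_s(.)
                next_states[s, a] = rng.choice(n_states, p=P[s, a])
        return next_states

A learning algorithm built on this model consumes one such |S| x |A| table of next states per call, rather than a single transition per step of a trajectory.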
Of course, this model is often only an ideal and not a procedure we can actually run; the advantage of assuming such a perfect sampling model is that it lets us separate the analysis of the learning algorithms from the quality of the exploration policy we sample with. Using the analysis for the ideal exploration policy, we can also bound the learning rate of any given exploration policy that visits every state-action pair infinitely often: we approximate the procedure PS(M) by taking enough samples under the exploration policy so that, with high probability, every state-action pair is visited (see the sketch below).
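
The following sketch (using the same illustrative array representation as above) shows one way such an approximation of PS(M) could look: run an exploration policy, here uniformly random purely for illustration, until every state-action pair has been observed at least once, and report one observed next state per pair. The names and the step bound are assumptions, not part of the notes.

    import numpy as np

    def approximate_parallel_sample(P, start_state=0, max_steps=1_000_000, rng=None):
        """Approximate one call to PS(M) by following a random exploration
        policy until every (s, a) pair has at least one sampled next state."""
        rng = np.random.default_rng() if rng is None else rng
        n_states, n_actions, _ = P.shape
        samples = -np.ones((n_states, n_actions), dtype=int)  # -1 marks "not yet sampled"
        s = start_state
        for _ in range(max_steps):
            a = rng.integers(n_actions)                # exploration action
            s_next = rng.choice(n_states, p=P[s, a])   # observed transition
            if samples[s, a] < 0:
                samples[s, a] = s_next                 # record first visit to (s, a)
            if (samples >= 0).all():                   # every pair covered
                return samples
            s = s_next
        raise RuntimeError("exploration did not cover all state-action pairs")

How many steps are needed before every pair is covered depends on the exploration policy and the MDP; this is exactly the quantity that the analysis separates from the learning algorithm itself.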
