## Policy Sampling

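The conclusion from the above equivalence is that if we can compute the ratio $D_2(x)/D_1(x)$ of the probabilities that the two distributions assign to a sample $x$, then we are able to "transform" samples from distribution $D_1$ into samples from distribution $D_2$.
We produce samples of policy $\pi_1$. Each sample $T_j$ is a run on the model using policy $\pi_1$, i.e. $T_j = s_1, a_1, r_1, s_2, a_2, r_2, \ldots$
The probability of generating $T_j$ is a product of two factors: one depends only on the policy (the probability of the chosen actions) and the other depends only on the model (the probability of the observed rewards and next states).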

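$$\mathrm{Prob}[T_j] = \prod_{i} \pi_1(a_i \mid s_i)\,\mathrm{Prob}[r_i, s_{i+1} \mid s_i, a_i] = \Big(\prod_{i} \pi_1(a_i \mid s_i)\Big)\Big(\prod_{i} \mathrm{Prob}[r_i, s_{i+1} \mid s_i, a_i]\Big)$$

Here $\pi_1(a_i \mid s_i)$ is the probability that the policy chooses action $a_i$ in state $s_i$, and the start state $s_1$ is taken as given.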
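The factorization can be checked on a toy case. The sketch below is only an illustration: the two-state model `P`, the policy `pi`, and the run are hypothetical values that do not come from the lecture, and the start state is assumed to be given. A run is stored as a list of steps `(s_i, a_i, r_i, s_next)`.

```python
from math import prod

# Hypothetical model: P[(s, a)] maps (reward, next_state) -> probability.
P = {
    (0, 0): {(0.0, 0): 0.9, (1.0, 1): 0.1},
    (0, 1): {(0.0, 0): 0.2, (1.0, 1): 0.8},
    (1, 0): {(0.0, 0): 0.5, (2.0, 1): 0.5},
    (1, 1): {(2.0, 1): 1.0},
}

# Hypothetical stochastic policy: pi[s][a] = probability of action a in state s.
pi = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.25, 1: 0.75}}


def policy_term(run, policy):
    """Policy-dependent factor: product of pi(a_i | s_i) over the run."""
    return prod(policy[s][a] for s, a, _r, _s_next in run)


def model_term(run, model):
    """Model-dependent factor: product of Prob[r_i, s_{i+1} | s_i, a_i]."""
    return prod(model[(s, a)].get((r, s_next), 0.0) for s, a, r, s_next in run)


def run_probability(run, policy, model):
    """Probability of the whole run, given its start state."""
    return policy_term(run, policy) * model_term(run, model)


# One run T_j written as steps (s_i, a_i, r_i, s_{i+1}).
run = [(0, 1, 1.0, 1), (1, 1, 2.0, 1), (1, 0, 0.0, 0)]
print(policy_term(run, pi), model_term(run, P), run_probability(run, pi, P))
```

Keeping the two factors in separate functions mirrors the formula: only `policy_term` changes when the policy is replaced.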

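We calculate the ratio of the probabilities of obtaining the same run $T_j$ under two different policies, $\pi_1$ and $\pi_2$:

$$\frac{\mathrm{Prob}[T_j \mid \pi_2]}{\mathrm{Prob}[T_j \mid \pi_1]} = \frac{\prod_{i} \pi_2(a_i \mid s_i)\,\prod_{i} \mathrm{Prob}[r_i, s_{i+1} \mid s_i, a_i]}{\prod_{i} \pi_1(a_i \mid s_i)\,\prod_{i} \mathrm{Prob}[r_i, s_{i+1} \mid s_i, a_i]} = \prod_{i} \frac{\pi_2(a_i \mid s_i)}{\pi_1(a_i \mid s_i)}$$

The important fact is that the ratio does not depend on the model, but only on the policies: the model-dependent factor is the same in the numerator and the denominator, so it cancels. Therefore we can compute the ratio without the model.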
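As a rough illustration of why no model is needed, the sketch below computes the ratio from the state-action pairs of a run alone. The policies `pi_1` and `pi_2`, the run, and the two "model factor" values are hypothetical numbers chosen for the example. `pi_1` is assumed to give positive probability to every action that appears in the run.

```python
from math import prod, isclose


def action_probability(run, policy):
    """Policy-dependent factor of a run: product of policy(a_i | s_i)."""
    return prod(policy[s][a] for s, a in run)


def importance_ratio(run, target, behavior):
    """Product over steps of target(a_i | s_i) / behavior(a_i | s_i)."""
    return prod(target[s][a] / behavior[s][a] for s, a in run)


# Hypothetical policies: policy[s][a] = probability of action a in state s.
pi_1 = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}  # policy that generated the run
pi_2 = {0: {0: 0.1, 1: 0.9}, 1: {0: 0.3, 1: 0.7}}  # policy we want to evaluate

# Only the (state, action) pairs of the run are needed.
run = [(0, 1), (1, 1), (1, 0)]
ratio = importance_ratio(run, pi_2, pi_1)

# Whatever the model-dependent factor of this run is, it cancels in the full ratio.
for model_factor in (0.4, 0.013):  # arbitrary illustrative values
    full = (action_probability(run, pi_2) * model_factor) / (
        action_probability(run, pi_1) * model_factor
    )
    print(isclose(full, ratio))
```

Rewards and next states never enter `importance_ratio`, which is the point of the cancellation.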
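EXAMPLE 2

Input:
• policy $\pi_1$.
• policy $\pi_2$ is deterministic.
Computation:
• $\pi_2(a_i \mid s_i)$ is 1 if $a_i = \pi_2(s_i)$ and 0 otherwise, because $\pi_2$ is deterministic.
• Therefore the ratio is a product of the terms $1/\pi_1(a_i \mid s_i)$ on runs consistent with $\pi_2$, and 0 on every other run.
• If $\pi_1$ is the random policy, then every term is the same constant, so we simply have a uniform distribution on the runs of a given length that are consistent with $\pi_2$.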
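The example can be sketched in a few lines, assuming two actions, a uniformly random $\pi_1$, and a deterministic $\pi_2$ that always picks action 1. All of these values and names are hypothetical and only serve the illustration.

```python
def deterministic_ratio(run, target_action, behavior):
    """Ratio Prob[run | pi_2] / Prob[run | pi_1] for a deterministic pi_2.

    target_action[s] is the single action pi_2 takes in state s;
    behavior[s][a] is pi_1(a | s).
    """
    ratio = 1.0
    for s, a in run:
        if a != target_action[s]:
            return 0.0  # pi_2 would never produce this run
        ratio /= behavior[s][a]  # each consistent step contributes 1 / pi_1(a | s)
    return ratio


n_actions = 2
# pi_1: the random policy, uniform over the actions in every state.
uniform = {s: {a: 1 / n_actions for a in range(n_actions)} for s in range(2)}
# pi_2: a deterministic policy that always chooses action 1.
pi_2 = {0: 1, 1: 1}

consistent = [(0, 1), (1, 1), (1, 1)]    # every action agrees with pi_2
inconsistent = [(0, 1), (1, 0), (1, 1)]  # second action disagrees with pi_2

print(deterministic_ratio(consistent, pi_2, uniform))
print(deterministic_ratio(inconsistent, pi_2, uniform))
```

With a uniform $\pi_1$, every consistent run of the same length gets the same weight (the number of actions raised to the run length), which is the uniformity noted in the last bullet.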

Yishay Mansour
2000-01-07