
## Policy Sampling

The conclusion from the above equivalence is that if we can compute the ratio
*D*_{2}(*x*)/*D*_{1}(*x*), then we are able to "transform"
samples from distribution *D*_{1} into samples from distribution *D*_{2}.
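As a minimal sketch of this "transformation", assume two discrete distributions given as dictionaries of probabilities. The names `importance_estimate`, `d1`, `d2` and the numbers are illustrative assumptions, not part of the notes. Each sample drawn from *D*_{1} is reweighted by *D*_{2}(*x*)/*D*_{1}(*x*) to estimate an expectation under *D*_{2}.

```python
import random

def importance_estimate(samples, f, d1, d2):
    """Estimate E_{x ~ D2}[f(x)] using samples that were drawn from D1."""
    total = 0.0
    for x in samples:
        # Reweight each D1-sample by the probability ratio D2(x) / D1(x).
        total += f(x) * d2[x] / d1[x]
    return total / len(samples)

# Hypothetical distributions over the two outcomes {0, 1}.
d1 = {0: 0.5, 1: 0.5}   # distribution we can sample from
d2 = {0: 0.9, 1: 0.1}   # distribution we care about

rng = random.Random(0)
samples = [rng.choice([0, 1]) for _ in range(20000)]  # drawn from D1

# The true value of E_{x ~ D2}[x] is 0.1; the estimate should be close to it.
estimate = importance_estimate(samples, lambda x: float(x), d1, d2)
```

The estimate is only defined when *D*_{1} assigns positive probability to every outcome that *D*_{2} can produce.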

We produce samples of policy *π*_{1}.
Each sample *T*_{j} is a run on the model using policy *π*_{1},
i.e.
*T*_{j} = *s*_{1},*a*_{1},*r*_{1},*s*_{2},*a*_{2},*r*_{2},...
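A run of this form could be generated roughly as follows. This is only a sketch: the two-state model `toy_step`, the policy `random_policy`, and the fixed horizon are hypothetical stand-ins, chosen so that the example is self-contained.

```python
import random

def sample_run(policy, step, start_state, horizon, rng):
    """Generate one run T_j as a list of (state, action, reward) triples."""
    run = []
    state = start_state
    for _ in range(horizon):
        action = policy(state, rng)                     # policy-dependent choice
        reward, next_state = step(state, action, rng)   # model-dependent outcome
        run.append((state, action, reward))
        state = next_state
    return run

def toy_step(state, action, rng):
    # Hypothetical model: reward 1 for "right", state flips with probability 1/2.
    reward = 1.0 if action == "right" else 0.0
    next_state = 1 - state if rng.random() < 0.5 else state
    return reward, next_state

def random_policy(state, rng):
    # Hypothetical sampling policy: choose uniformly among the two actions.
    return rng.choice(["left", "right"])

rng = random.Random(0)
run = sample_run(random_policy, toy_step, start_state=0, horizon=5, rng=rng)
```

Keeping the policy and the model as separate functions mirrors the fact that each contributes its own factor to the probability of a run.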

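The probability of generating *T*_{j} is a product of two independent
probabilities:
a policy-dependent probability on the actions and a model-dependent probability
on the rewards and next states. For a run generated with policy *π*:

$$\mathrm{Prob}[T_j] = \prod_i \pi(a_i \mid s_i)\,\Pr[r_i, s_{i+1} \mid s_i, a_i] = \Big(\prod_i \pi(a_i \mid s_i)\Big)\Big(\prod_i \Pr[r_i, s_{i+1} \mid s_i, a_i]\Big)$$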

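We calculate the ratio of the probabilities of obtaining the same run *T*_{j} under two different policies, *π*_{1} and *π*_{2}. Writing $\mathrm{Prob}_{\pi}[T_j]$ for the probability of *T*_{j} under policy *π*, the model-dependent factors are identical in both and cancel:

$$\frac{\mathrm{Prob}_{\pi_2}[T_j]}{\mathrm{Prob}_{\pi_1}[T_j]} = \prod_i \frac{\pi_2(a_i \mid s_i)}{\pi_1(a_i \mid s_i)}$$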

The important fact is that the ratio does __not__ depend on the model, but only on
the policies. Therefore we can compute it without the model.
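To illustrate that only the policies are needed, here is a small sketch that computes the ratio for one run. It assumes each policy is given as a function returning the probability of an action in a state; the names `trajectory_ratio`, `uniform_policy` and `biased_policy` are hypothetical. Rewards and transitions are never consulted.

```python
def trajectory_ratio(run, pi_1, pi_2):
    """Ratio of the probability of `run` under pi_2 to that under pi_1.

    `run` is a list of (state, action, reward) triples; pi(state, action)
    returns the probability that the policy picks `action` in `state`.
    """
    ratio = 1.0
    for state, action, _reward in run:
        p1 = pi_1(state, action)
        if p1 <= 0.0:
            raise ValueError("run has zero probability under the sampling policy")
        # Model terms cancel, so only the action probabilities appear.
        ratio *= pi_2(state, action) / p1
    return ratio

def uniform_policy(state, action):
    # Hypothetical sampling policy over two actions "a" and "b".
    return 0.5

def biased_policy(state, action):
    # Hypothetical second policy that prefers action "a" in every state.
    return 0.8 if action == "a" else 0.2

run = [("s1", "a", 1.0), ("s2", "b", 0.0)]
ratio = trajectory_ratio(run, uniform_policy, biased_policy)  # (0.8/0.5) * (0.2/0.5)
```

For long runs the product can become very large or very small, so in practice one might accumulate the sum of log-ratios instead.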

__EXAMPLE 2__

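Input:
- policy *π*_{1}, used to generate the samples.
- policy *π*_{2}, which is deterministic.

Computation:

- *π*_{2}(*a*_{i} | *s*_{i}) is either 0 or 1, because *π*_{2} is deterministic.
- Therefore the ratio is a product of terms 1/*π*_{1}(*a*_{i} | *s*_{i}) when every action in the run agrees with *π*_{2}, and 0 otherwise.
- If *π*_{1} is the random policy, then we simply have a uniform distribution on the
runs consistent with *π*_{2}.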
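The special case in this example can be sketched as follows, assuming the sampling policy chooses uniformly among `num_actions` actions and the deterministic policy is given as a hypothetical dictionary `target_action` from states to actions. Runs that disagree with the deterministic policy get weight 0; consistent runs of the same length all get the same weight.

```python
def deterministic_ratio(run, target_action, num_actions):
    """Ratio for a run sampled by a uniform random policy, relative to a
    deterministic policy described by `target_action[state]`."""
    weight = 1.0
    for state, action, _reward in run:
        if action != target_action[state]:
            return 0.0           # the deterministic policy never takes this action
        weight *= num_actions    # 1 / (1 / num_actions) for each agreeing step
    return weight

# Hypothetical deterministic policy over two states.
target_action = {"s1": "a", "s2": "b"}

consistent_run = [("s1", "a", 1.0), ("s2", "b", 0.0)]
inconsistent_run = [("s1", "a", 1.0), ("s2", "a", 0.0)]

w_consistent = deterministic_ratio(consistent_run, target_action, num_actions=2)
w_inconsistent = deterministic_ratio(inconsistent_run, target_action, num_actions=2)
```

With two actions and a run of length two, the consistent run receives weight 2 × 2 = 4 and the inconsistent run receives weight 0.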

*Yishay Mansour*

*2000-01-07*