Consider a multiplicative model, where the return of the reward sequence
r_1, r_2, ..., r_T is the product r_1 * r_2 * ... * r_T.
Develop an algorithm that evaluates a given deterministic Markovian
policy for such a return function.
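One possible approach is a backward recursion, analogous to finite-horizon policy evaluation for additive returns: since the reward factors out of the expectation at each step, V_t(s) = r(s, pi_t(s)) * E[V_{t+1}(s')]. The sketch below assumes a finite-horizon MDP with deterministic rewards r(s, a) and a transition array P; all names and the array layout are illustrative choices, not part of the problem statement.

```python
import numpy as np

def evaluate_multiplicative(P, r, policy, T):
    """Evaluate a deterministic Markovian policy under the multiplicative
    return G = r_1 * r_2 * ... * r_T (hypothetical interface).

    P:      array of shape (S, A, S), transition probabilities P[s, a, s']
    r:      array of shape (S, A), deterministic one-step rewards
    policy: policy[t][s] = action taken at time t in state s
    Returns V of shape (T+1, S) with V[t][s] = E[ prod_{k>t} r_k | s_t = s ].
    """
    S = P.shape[0]
    V = np.ones((T + 1, S))          # empty product at the horizon is 1
    for t in range(T - 1, -1, -1):
        for s in range(S):
            a = policy[t][s]
            # the deterministic reward factors out of the expectation
            V[t, s] = r[s, a] * (P[s, a] @ V[t + 1])
    return V
```

The recursion runs in O(T * S^2) time, the same cost as standard finite-horizon policy evaluation.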
Suppose we have two actions. Action D always has reward 0,
while action R has reward +1 with probability 1/2 and -1 with
probability 1/2. Our aim is to maximize the probability that the sum of the
rewards is at least 1. What is the optimal policy?
(Hint: Show that there exists an optimal policy that first performs
a sequence of R operations and only later performs a sequence (possibly empty)
of D operations.)
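To build intuition, the optimal value can be computed by a small dynamic program. The sketch below assumes a finite horizon of n remaining rounds (the problem does not fix the horizon, so this is an assumption); the state is the current reward sum, and each step chooses the better of D (sum unchanged) and R (fair +-1 step).

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def best_prob(n, s):
    """Maximum probability that the final reward sum is >= 1, given
    n rounds remaining and current sum s (assumed finite horizon)."""
    if n == 0:
        return 1.0 if s >= 1 else 0.0
    play_d = best_prob(n - 1, s)                       # D: sum stays put
    play_r = 0.5 * best_prob(n - 1, s + 1) \
           + 0.5 * best_prob(n - 1, s - 1)             # R: fair +-1 step
    return max(play_d, play_r)
```

For example, with one round the best achievable probability is 1/2, while with three rounds it rises to 5/8: play R, stop as soon as the sum reaches 1, and keep playing R otherwise, which matches the structure suggested by the hint.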
Consider the following digit game. We have N+1 slots, marked 0 to N. At each round (of N+1 rounds) a random digit is chosen. (I.e., we choose a random number from {0, 1, ..., 9}.) We can put the digit in one of the unoccupied slots, and that slot becomes occupied. Our aim is to maximize the expected value of the sequence of digits, when viewed as a number. (I.e., if we have d_i in slot i then the final value is the sum of d_i * 10^i over i = 0, ..., N.)
Example: Suppose N=2. At the beginning we have three empty slots. Let's denote an empty slot by _, so we have _ _ _. Suppose the first digit is 5, and we decide to put it in the second slot; then we have _ 5 _. Suppose the second digit is 4 and we decide to put it in the first slot; we have 4 5 _. Finally, the last digit is 4 again, and our final number is 454.
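The optimal expected value can be found by a dynamic program over the set of still-free slots: for each drawn digit, place it in the free slot that maximizes the digit's immediate contribution plus the value of the remaining subgame. The sketch below assumes digits are drawn uniformly from {0, ..., 9} and that slot i has place value 10^i; exact arithmetic with Fraction avoids rounding issues.

```python
from functools import lru_cache
from fractions import Fraction

DIGITS = range(10)  # assumption: digits uniform on {0, ..., 9}

def digit_game_value(N):
    """Expected final number under optimal play with N+1 slots,
    where slot i contributes d_i * 10^i."""
    @lru_cache(maxsize=None)
    def V(free):  # free: frozenset of place values still available
        if not free:
            return Fraction(0)
        total = Fraction(0)
        for d in DIGITS:
            # place digit d in the free slot maximizing the total value
            total += max(d * w + V(free - {w}) for w in free)
        return total / len(DIGITS)
    return V(frozenset(10 ** i for i in range(N + 1)))
```

With a single slot (N=0) the value is simply the mean digit, 4.5; with two slots (N=1) the optimal rule is to put digits 5-9 in the tens slot and 0-4 in the units slot, giving expected value 60.75. The state space has 2^(N+1) subsets, so this is practical only for small N.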
The homework is due in two weeks.