Next: Optimality Proof Up: Example: The Recruiting Problem Previous: The Goal

### The Solution

We will first construct a corresponsing MDP for the problem, and then develop the optimal policy for it.
Let the possible actions, where Continue stands for 'continue' and QuitAndHire for 'quit and recruit'. Let be the group of states of the MDP, standing for:
• MaxSoFar - the last interviewed candidate was the best so far
• Other - the last interviewed candidate was NOT the best so far
• 1 - the last interviewed candidate was THE best candidate in the entire group and was hired
• 0 - the last interviewed candidate was NOT the best candidate in the entire group and was hired

Figure shows the resultant MDP, using the following transition probabilities:
where N is the finite time horizon (the size of the candidates group) and t is the current time (the number of candidates interviewed so far).
Writing the optimality equations for this problem we get:

Assigning the terms for qt and rt we get:

 (4.1)

 (4.2)

An optimal decision rule has the following properties:
• In s=Other, always choose as=Continue (otherwise the return is promised to be zero)
• In s=MaxSoFar, choose as=Continue if and as=QuitAndHire otherwise (in order to maximize Ut*(MaxSoFar))
We now show that the optimal policy is of the form:
• Interview candidates, performing
• Quit and hire the first candidate after time , which is the best so far

Next: Optimality Proof Up: Example: The Recruiting Problem Previous: The Goal
Yishay Mansour
1999-11-18