Next: Optimality Proof
Up: Example: The Recruiting Problem
Previous: The Goal
The Solution
We will first construct a corresponsing MDP for the problem, and then develop the
optimal policy for it.
Let
the possible actions, where Continue stands for 'continue' and
QuitAndHire for 'quit and recruit'.
Let
be the group of states of the MDP, standing for:
 MaxSoFar  the last interviewed candidate was the best so far
 Other  the last interviewed candidate was NOT the best so far
 1  the last interviewed candidate was THE best candidate in the entire group and was hired
 0  the last interviewed candidate was NOT the best candidate in the entire group and was hired
Figure:
The recruiting problem MDP

Figure shows the resultant MDP, using the following transition probabilities:
where N is the finite time horizon (the size of the candidates group) and t is the current time
(the number of candidates interviewed so far).
Writing the optimality equations for this problem we get:
Assigning the terms for q_{t} and r_{t} we get:

(4.1) 

(4.2) 
An optimal decision rule has the following properties:
 In s=Other, always choose
a_{s}=Continue (otherwise the return is promised to be zero)
 In
s=MaxSoFar, choose
a_{s}=Continue if
and
a_{s}=QuitAndHire otherwise
(in order to maximize
U_{t}^{*}(MaxSoFar))
We now show that the optimal policy is of the form:
 Interview
candidates, performing
 Quit and hire the first candidate after time ,
which is the best so far
Next: Optimality Proof
Up: Example: The Recruiting Problem
Previous: The Goal
Yishay Mansour
19991118