
Existence of a unique solution

We define a linear transformation $L_d$: $L_d\vec{v}=\vec{r}_d+\lambda P_d\vec{v}$.
Since $\vec{v}_{\lambda}^{\pi}=L_d\vec{v}_\lambda^{\pi}$, $\vec{v}_\lambda^\pi$ is a fixed point of $L_d$.
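
The fixed-point property can be checked numerically. The sketch below is only an illustration: the two-state matrix `P`, the reward vector `r`, and the discount `lam` are hypothetical values, not taken from the text. It applies $L_d$ repeatedly, starting from the zero vector, and measures how far the result is from satisfying $\vec{v}=L_d\vec{v}$. Convergence of this iteration relies on $\lambda<1$.

```python
def apply_L(v, r, P, lam):
    """Return L_d v = r_d + lam * P_d v (P is a list of rows)."""
    n = len(v)
    return [r[s] + lam * sum(P[s][t] * v[t] for t in range(n)) for s in range(n)]

# Hypothetical two-state example (assumed values, not from the text).
P = [[0.9, 0.1],
     [0.2, 0.8]]      # each row sums to 1
r = [1.0, 0.0]        # expected immediate reward in each state
lam = 0.5             # discount factor, 0 <= lam < 1

v = [0.0, 0.0]
for _ in range(200):  # repeated application of L_d
    v = apply_L(v, r, P, lam)

# Distance between v and L_d v; close to zero at a fixed point.
residual = max(abs(a - b) for a, b in zip(v, apply_L(v, r, P, lam)))
print(v, residual)
```

Starting from a different initial vector should lead to the same limit, which is consistent with the uniqueness claim of the theorem that follows.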

Theorem 5.2   For $0 \leq\lambda<1$ and $\pi$ a stationary Markovian policy,
$\vec{v}_\lambda^\pi$ is the unique solution of the system of equations

\begin{eqnarray*}\vec{v}=\vec{r}_d+\lambda P_d\vec{v}\end{eqnarray*}

and is equal to

\begin{eqnarray*}\vec{v}_{\lambda}^{\pi}=(I-\lambda P_d)^{-1}\vec{r}_d\end{eqnarray*}
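
As a concrete illustration of the closed form, the following sketch solves $(I-\lambda P_d)\vec{v}=\vec{r}_d$ for a small example and checks that the result satisfies $\vec{v}=\vec{r}_d+\lambda P_d\vec{v}$. The values of `P`, `r`, and `lam` are hypothetical, and the helper `solve` is a plain Gaussian elimination written for this example only. A numerical library would normally be used instead.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]   # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):               # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

# Hypothetical two-state example (assumed values, not from the text).
P = [[0.9, 0.1],
     [0.2, 0.8]]
r = [1.0, 0.0]
lam = 0.5
n = len(r)

# Build I - lam * P and solve (I - lam * P) v = r.
A = [[(1.0 if s == t else 0.0) - lam * P[s][t] for t in range(n)] for s in range(n)]
v = solve(A, r)

# Check the original form of the equations: v = r + lam * P v.
rhs = [r[s] + lam * sum(P[s][t] * v[t] for t in range(n)) for s in range(n)]
max_error = max(abs(a - b) for a, b in zip(v, rhs))
print(v, max_error)
```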

Proof: We can write the equation set as

\begin{eqnarray*}(I-\lambda P_d)\vec{v} = \vec{r}_d\end{eqnarray*}

Since $P_d$ is a stochastic matrix, its entries are nonnegative and every row sums to 1, so $\Vert P_d\Vert=1$ in the maximum-row-sum norm. As $0\leq\lambda
< 1$, $\Vert\lambda P_d\Vert=\lambda < 1$.
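
The norm argument can be illustrated with the maximum-row-sum norm. This is a minimal sketch on a hypothetical matrix `P` and discount `lam` (both assumed for illustration). The function `inf_norm` is a helper defined here, not something from the text.

```python
def inf_norm(A):
    """Maximum absolute row sum of a matrix given as a list of rows."""
    return max(sum(abs(x) for x in row) for row in A)

# Hypothetical stochastic matrix and discount factor (assumed values).
P = [[0.9, 0.1],
     [0.2, 0.8]]
lam = 0.5

norm_P = inf_norm(P)                                      # row sums are all 1
norm_lam_P = inf_norm([[lam * x for x in row] for row in P])
print(norm_P, norm_lam_P)
```

Scaling every entry by `lam` scales each row sum by `lam`, which is why the norm of $\lambda P_d$ equals $\lambda$ in this norm.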

According to Theorem [*], $(I-\lambda P_d)^{-1}$ exists. Thus, a solution $\vec{v}=(I-\lambda P_d)^{-1}\vec{r}_d$ exists, and since $I-\lambda P_d$ is invertible, this solution is unique.

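By the same theorem,

\begin{eqnarray*}\vec{v}=(I-\lambda P_d)^{-1}\vec{r}_d=\sum_{i=0}^{\infty}(\lambda P_d)^i\vec{r}_d\end{eqnarray*}

The $i$-th term of this sum, $\lambda^i P_d^i\vec{r}_d$, is the expected reward at step $i$ under $\pi$, discounted by $\lambda^i$. The sum is therefore the discounted return value of policy $\pi$, that is, $\vec{v}=\vec{v}_\lambda^\pi$. $\Box$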
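
The series form used in the proof can also be checked numerically. The sketch below accumulates partial sums of $\sum_i(\lambda P_d)^i\vec{r}_d$ and tests whether the result satisfies $\vec{v}=\vec{r}_d+\lambda P_d\vec{v}$. As before, `P`, `r`, and `lam` are hypothetical values chosen for illustration, and the number of terms (60) is an arbitrary cutoff.

```python
def mat_vec(A, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def series_partial_sum(P, r, lam, terms):
    """Sum of (lam * P)^i r for i = 0, ..., terms - 1."""
    total = [0.0] * len(r)
    term = r[:]                                  # (lam * P)^0 r = r
    for _ in range(terms):
        total = [a + b for a, b in zip(total, term)]
        term = [lam * y for y in mat_vec(P, term)]  # next power applied to r
    return total

# Hypothetical two-state example (assumed values, not from the text).
P = [[0.9, 0.1],
     [0.2, 0.8]]
r = [1.0, 0.0]
lam = 0.5

approx = series_partial_sum(P, r, lam, 60)

# Residual of the fixed-point equation v = r + lam * P v.
rhs = [ri + lam * y for ri, y in zip(r, mat_vec(P, approx))]
residual = max(abs(a - b) for a, b in zip(approx, rhs))
print(approx, residual)
```

Because $\Vert\lambda P_d\Vert<1$, the terms shrink geometrically, so truncating the series after enough terms leaves only a small remainder.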

Figure: Example Diagram

Yishay Mansour