\documentstyle[12pt]{book}
%\setlength{\oddsidemargin}{-.1 in}
%\setlength{\evensidemargin}{-.3 in}
\setlength{\evensidemargin}{.0 in}
\addtolength{\topmargin}{-1in}
\setlength{\textwidth}{6.5in}
\setlength{\textheight}{8.5in}

%
% The following macro is used to generate the header.
%
%

\newcommand{\handout}[4]{
   \pagestyle{headings}
   \thispagestyle{plain}
   \newpage
%   \setcounter{chapter}{#1}
   \setcounter{page}{#2}
%  \set\thechapter{#3}
   \noindent
   \begin{center}
   \framebox{
      \vbox{
    \hbox to 6.28in { {\bf Reinforcement Learning
                        \hfill Fall Semester, 1999/2000} }
       \vspace{4mm}
       \hbox to 6.28in { {\Large \hfill #1: #3  \hfill} }
       \vspace{2mm}
       \hbox to 6.28in { {\it Lecturer: #4 \hfill} }
      }
   }
   \end{center}
   \markboth{Handout #1: #3}{Handout #1: #3}
   \vspace*{4mm}
}

%
% Use these macros for organizing sections of your notes.
% Each command takes two arguments: (1) the title of the section and
% (2) a keyword for that section to appear in the index.  (See examples.)
% Please don't use \section, \subsection, and \subsubsection directly!
%

\newcommand{\topic}[2]{\section{#1} \index{#2} \markright{#1}}
\newcommand{\subtopic}[2]{\subsection{#1} \index{#2}}
\newcommand{\subsubtopic}[2]{\subsubsection{#1} \index{#2}}
 
%
% Convention for citations is first author's last name followed by other
% authors' last initials, followed by the year.  For example, to cite the
% seventh entry in the course bibliography, you would type: \cite{BurnsL80}
% (To avoid bibliography problems, for now we redefine the \cite command.)
%
     
\renewcommand{\cite}[1]{[#1]}

%
% These are just to make things a little easier:
%
\newcommand{\bi}{\begin{itemize}}
\newcommand{\ei}{\end{itemize}}
\newcommand{\be}{\begin{enumerate}}
\newcommand{\ee}{\end{enumerate}}
\newcommand{\blank}{\vspace{1ex}}   % generates a blank line in the output

%
% Use these for theorems, lemmas, proofs, etc.
%
\newtheorem{theorem}{Theorem}[chapter]
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{claim}[theorem]{Claim}
\newtheorem{corollary}[theorem]{Corollary}
\newenvironment{proof}{{\em Proof:}}{\hfill\rule{2mm}{2mm}}

%
% Use the following for definitions.
% \bigdef is for definitions to be set off by themselves; \smalldef is for
% definitions given in the middle of a paragraph.
%
\newenvironment{dfn}{{\vspace*{1ex} \noindent \bf Definition }}{\vspace*{1ex}}
\newcommand{\bigdef}[2]{\index{#1}\begin{dfn} {\rm #2} \end{dfn}}
\newcommand{\smalldef}[1]{\index{#1} {\em #1}}


\begin{document}
%\handout{**HANDOUT-NUMBER**}{**1ST-PAGE**}{**DATE**}{**LECTURER**}
\handout{Homework 3}{1}{Dec. 16}{Yishay Mansour}

\section*{Homework number 3.}

\noindent{\Large \bf Programming Assignment}

In this assignment you need to write a program that will learn to play the
card game 21.\\

\noindent{\bf Game description:}
The game is played with a deck of 52 cards.
There are two players, the {\em house} and the
{\em gambler}.
The winner is the player with the highest number of points
that does not exceed 21.

When counting points, each number card ($2$ to $10$)
is worth its number, and each face card ($J$, $Q$ or $K$)
is worth 10 points. An ace ($A$) is worth $11$ points (a simplification of the
rule of the real game, where $A$ can be worth either $1$ or $11$).
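
\noindent For example, under these values the hand $K$, $7$ is worth
$10 + 7 = 17$ points and the hand $A$, $9$ is worth $11 + 9 = 20$ points.
Taken literally, the simplified ace rule also means that a hand of two aces
is worth $11 + 11 = 22$ points, which exceeds 21.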

At the beginning each player gets two cards: one face up
(which you see) and the other face down (which you don't see).
%
The gambler looks at all of its cards (and the house's open card)
and needs to decide whether
to ask for another card ({\tt hit}) or to stop ({\tt stand}).

We fix the strategy of the house as follows.
If the sum of its cards is 15 or less, the house performs {\tt hit},
and if the sum is 16 or more it performs {\tt stand}.

We model the game as an MDP whose states are labeled by the sum
of the gambler's cards.\\

\noindent{\bf Task 1:}
Implement TD(0) and use it to compute the probability that the
gambler wins, given that its policy is:
if the sum is 18 or more then {\tt stand}, else {\tt hit}.\\
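
\noindent As a reminder, and only as a sketch under assumed notation that
this handout does not fix, one common form of the TD(0) update after
observing a transition from state $s$ to state $s'$ with reward $r$ is
$$ V(s) \leftarrow V(s) + \alpha \left[ r + \gamma V(s') - V(s) \right] , $$
where the step size $\alpha$, the discount factor $\gamma$ and the reward $r$
are symbols introduced here for illustration. If one assumes $\gamma = 1$,
a reward of $1$ on the transition in which the gambler wins and $0$ on all
other transitions, and a value of $0$ for terminal states, then $V(s)$
estimates the probability of winning from the sum $s$ under the fixed policy.\\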

\noindent{\bf Task 2:}
Implement either Q-Learning or Sarsa, and compute an
optimal policy against the house policy described above.
(As output, give the optimal action in each state and the
probability of winning from that state.)\\
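
\noindent For reference, the two updates are commonly written as follows.
The notation is an assumption made here, not something this handout defines:
$Q(s,a)$ is the estimated value of taking action $a$ (either {\tt hit} or
{\tt stand}) in state $s$, $\alpha$ is a step size, $\gamma$ is a discount
factor, and $r$ is the reward observed on the transition from $s$ to $s'$.
\begin{eqnarray*}
Q(s,a) & \leftarrow & Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]
   \qquad \mbox{(Q-Learning)} \\
Q(s,a) & \leftarrow & Q(s,a) + \alpha \left[ r + \gamma \, Q(s',a') - Q(s,a) \right]
   \qquad \mbox{(Sarsa)}
\end{eqnarray*}
In the Sarsa update, $a'$ is the action that the learner actually takes
in $s'$, whereas Q-Learning uses the maximizing action regardless of which
action is taken next.\\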

\noindent{\bf Task 3:}
Try to improve the results by modifying the structure of the MDP.
(The strategy of the house remains unchanged.)\\


\noindent{\bf The homework is due in two weeks.}

\end{document}