State encoding in TD-Gammon

TD-Gammon uses a neural network with 198 inputs. For each of the 24 board positions and for each of the two colors there are four inputs, which accounts for $24 \times 2 \times 4 = 192$ of them:
1. equals true if there is at least one piece present.
2. equals true if there are at least two pieces present.
3. equals true if there are at least three pieces present.
4. has a value of $\frac{n - 3}{2}$ if there are at least four pieces present, where n is the number of pieces of that color on the position, and is zero otherwise.
If no piece is present then all four inputs are false.
Two additional inputs encode the number of pieces that were "taken" (hit and waiting on the bar) for each color. Each one has a value of $\frac{n}{2}$, where n is the number of taken pieces. Two other inputs encode the number of pieces removed from the board for each color. Each one has a value of $\frac{n}{15}$, where n is the number of pieces removed. The last two boolean inputs encode, for each player, whether it is currently that player's turn.
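
The encoding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (point_features, encode_state), the argument layout, and the order in which the 198 inputs are concatenated are hypothetical choices made here, not taken from TD-Gammon itself, and booleans are represented as 0.0 and 1.0.

```python
from typing import List, Sequence

NUM_POINTS = 24  # board positions in backgammon


def point_features(n: int) -> List[float]:
    """Four inputs for one color on one position holding n pieces."""
    return [
        1.0 if n >= 1 else 0.0,            # at least one piece
        1.0 if n >= 2 else 0.0,            # at least two pieces
        1.0 if n >= 3 else 0.0,            # at least three pieces
        (n - 3) / 2.0 if n >= 4 else 0.0,  # (n - 3) / 2 for four or more
    ]


def encode_state(
    white_points: Sequence[int],
    black_points: Sequence[int],
    white_taken: int,
    black_taken: int,
    white_removed: int,
    black_removed: int,
    white_to_move: bool,
) -> List[float]:
    """Build the 198-element input vector (ordering is an assumption)."""
    if len(white_points) != NUM_POINTS or len(black_points) != NUM_POINTS:
        raise ValueError("expected piece counts for exactly 24 positions")

    features: List[float] = []
    # 24 positions x 2 colors x 4 inputs = 192 inputs
    for w, b in zip(white_points, black_points):
        features.extend(point_features(w))
        features.extend(point_features(b))

    # Taken pieces, scaled by 1/2
    features.append(white_taken / 2.0)
    features.append(black_taken / 2.0)
    # Removed pieces, scaled by 1/15
    features.append(white_removed / 15.0)
    features.append(black_removed / 15.0)
    # Whose turn it is, one input per player
    features.append(1.0 if white_to_move else 0.0)
    features.append(0.0 if white_to_move else 1.0)
    return features
```

Calling encode_state with two lists of 24 piece counts, the taken and removed counts for each color, and a flag for the player to move returns a list of length 198, matching the input count stated above.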

Yishay Mansour
2000-01-17