Workshop in Reinforcement Learning

(0368-3500-07)


 

Lecturer: Prof. Yishay Mansour

Sunday 16:00-18:00

Tel: 8829

E-mail: mansour@post.tau.ac.il

T.A.: Elad Verbin

E-mail: eladv@post.tau.ac.il

Workshop project:

  1. The project will be done in groups of 2-3 students.
  2. Each group will implement a learning algorithm for a board game.
  3. The necessary background material will be covered during the lectures.
  4. Requirements document

 

                                                 

Suggested Projects

More Challenging Projects

 

 

Workshop Outline

Week 1: Min Max Trees

Week 2: Introduction to Reinforcement Learning: Model and Planning

Week 3: Reinforcement Learning: Learning (small state space)

Week 4: Reinforcement Learning: Learning (large state space)

Week 5: Simple Graphics (GUI)

 

Teams and Games

  1. Nira Amit and Assaf Shtilman
  2. Aner Mazursky and Amit Ben-David
  3. Chen Frenkel and Ron Frenkel
  4. Ori Lahav and Ariel Lvantel
  5. Shelly Machleb and Benny Davidovich
  6. Michal Samuel and Yuval Kalev

(email Elad about any spelling mistakes)

Sample Code

 

Basic Tic-Tac-Toe implemented in C++.

Basic Tic-Tac-Toe implemented in Java.

 

 

Bibliography [for background]

  1. Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.
  2. Bertsekas, D. P. and Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Athena Scientific, Belmont, MA.
  3. Gardner (1981). Samuel's checkers player. In Barr, A. and Feigenbaum, E. A., editors, The Handbook of Artificial Intelligence, I, pages 84--108. William Kaufmann, Los Altos, CA.
  4. Samuel, A. L. (1967). Some studies in machine learning using the game of checkers. II---Recent progress. IBM Journal of Research and Development, pages 601--617.
  5. Tesauro, G. J. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215--219.
  6. Tesauro, G. J. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38:58--68.
  7. Tsitsiklis, J. N. and Van Roy, B. (1996). Feature-based methods for large scale dynamic programming. Machine Learning, 22:59--94.

Previous Workshops: 1 2 3