Seminar on Experts and Bandits, Fall 17/18

Haim Kaplan

We will go over some fundamental papers on the problem of online learning with expert advice, both in the full information model and in the partial information model (the partial information version is also known as the multi-armed bandit problem). For basic information you can check the Wikipedia page of the multi-armed bandit problem. The following books may also be useful for some general information and additional material on some specific topics.

·       Shay Shalev-Schwartz, Online learning and online convex optimization, 2012

·       Elad Hazan, Introduction to convex optimization, 2016

·       Nicolò Cesa-Bianchi and Sébastien Bubeck, Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems, 2012

·       Nicolò Cesa-Bianchi and Gabor Lugosi, Prediction, Learning, and Games, 2006

The content of the seminar may have some overlap with the course 0368-4472-01, Advanced Topics in Machine Learning-Algorithmic Game Theory, by Prof. Yishay Mansour.

Schedule

1)    (Oct 25) An introductory lecture

2)    (Nov 1) Y. Freund and R. E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, JCSS 55, 119 - 139 1997. Yogev Bar-On, slides

3)     

a)    (Nov 8) P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, The nonstochastic multiarmed bandit problem, SICOMP 32(1), 48 - 77, 2002, Sections 3+4 (Exp3). Barak Itkin, slides

b)    (Nov 15) P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, The nonstochastic multiarmed bandit problem, SICOMP 32(1), 48 - 77, 2002, Sections 5 (lower bound), Section 6 if there is time. Shahaf Nacson, slides

c)     (Nov 22) P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, The nonstochastic multiarmed bandit problem, SICOMP 32(1), 48 - 77, 2002, Sections 7+ (Exp4). Ilia Shevrin, slides

4)    (Nov 29) P. Auer, N. Cesa-Bianchi, P. Fischer, Finite-time analysis of the multiarmed bandit problem, ML, 47, 235 - 256, 2002. Guy Inbar, slides

5)    (Dec 6)  S. Agrawal and N. Goyal, Analysis of Thompson sampling for the multi-armed bandit problem, JMLR 23 39.1 - 39.26, 2012. Yoav Chai, slides

6)    (Dec 13) A. Kalai, S. Vempala, Efficient algorithms for online decision problems, JCSS 71(3), 291 - 307, 2005. Ran Hochshtatter, slides

7)    (Dec 20) S. Shalev-Shwartz, Y. Singer, A primal-dual perspective of online learning algorithms, ML 69 115–142, 2007. Moran Nechushtan, slides

8)    (Dec 27) B. Awerbuch, R. Kleinberg, Online linear optimization and adaptive routing, JCSS 74 97 - 114, 2008. Yaniv Rubinpur, slides

9)    (Jan 3) M. Zinkevich, Online Convex Programming and Generalized Infinitesimal Gradient Ascent, ICML 2003. Tzlil Gonen, notes

10)  (Jan 10) A. D. Flaxman, A. T. Kalai, and H. B. McMahan, Online convex optimization in the bandit setting: gradient descent without a gradient, SODA 2005. Aviv Rosenberg, slides

11)  (Jan 17) E. Hazan and S. Kale, Extracting certainty from uncertainty: regret bounded by variation in costs, ML 80: 165–188, 2010. Dror Dayan