We will go over some fundamental papers on the problem of online learning with expert advice, both in the full information model and in the partial information model (the partial information version is also known as the multi-armed bandit problem). For basic information you can check the Wikipedia page of the multi-armed bandit problem. The following books may also be useful for some general information and additional material on some specific topics.
· Shay Shalev-Schwartz, Online learning and online convex optimization, 2012
· Elad Hazan, Introduction to convex optimization, 2016
· Nicolò Cesa-Bianchi and Sébastien Bubeck, Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems, 2012
· Nicolò Cesa-Bianchi and Gabor Lugosi, Prediction, Learning, and Games, 2006
The content of the seminar may have some overlap with the course 0368-4472-01, Advanced Topics in Machine Learning-Algorithmic Game Theory, by Prof. Yishay Mansour.
1) (Oct 25) An introductory lecture
2) (Nov 1) Y. Freund and R. E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, JCSS 55, 119 - 139 1997. Yogev Bar-On, slides
b) (Nov 15) P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, The nonstochastic multiarmed bandit problem, SICOMP 32(1), 48 - 77, 2002, Sections 5 (lower bound), Section 6 if there is time. Shahaf Nacson, slides
4) (Nov 29) P. Auer, N. Cesa-Bianchi, P. Fischer, Finite-time analysis of the multiarmed bandit problem, ML, 47, 235 - 256, 2002. Guy Inbar, slides
5) (Dec 6) S. Agrawal and N. Goyal, Analysis of Thompson sampling for the multi-armed bandit problem, JMLR 23 39.1 - 39.26, 2012. Yoav Chai, slides
7) (Dec 20) S. Shalev-Shwartz, Y. Singer, A primal-dual perspective of online learning algorithms, ML 69 115–142, 2007. Moran Nechushtan, slides
9) (Jan 3) M. Zinkevich, Online Convex Programming and Generalized Infinitesimal Gradient Ascent, ICML 2003. Tzlil Gonen, notes
10) (Jan 10) A. D. Flaxman, A. T. Kalai, and H. B. McMahan, Online convex optimization in the bandit setting: gradient descent without a gradient, SODA 2005. Aviv Rosenberg, slides
11) (Jan 17) E. Hazan and S. Kale, Extracting certainty from uncertainty: regret bounded by variation in costs, ML 80: 165–188, 2010. Dror Dayan