**We will go over some fundamental papers on the problem of online
learning with expert advice, both in the full information model and in the partial
information model (the partial information version is also known as the
multi-armed bandit problem). For basic information you can check the Wikipedia page of the
multi-armed bandit problem. The following books may also be useful for some
general information and additional material on some specific topics.**

**· ****Shay Shalev-Schwartz, Online learning and
online convex optimization, 2012**

**· ****Elad**** Hazan, Introduction
to convex optimization, 2016**

**· ****Nicolò**** Cesa-Bianchi and Sébastien Bubeck, Regret
Analysis of Stochastic and Nonstochastic Multi-Armed
Bandit Problems, 2012**

**· ****Nicolò**** Cesa-Bianchi and Gabor Lugosi, Prediction,
Learning, and Games, 2006**

**The content of the seminar may have some overlap with the course
0368-4472-01, Advanced Topics in Machine Learning-Algorithmic Game Theory, by
Prof. Yishay Mansour. **

**Schedule **

**1)
****(Oct
25) ****An introductory lecture **

**2)
****(Nov 1)
****Y. Freund and R. E. Schapire, A
decision-theoretic generalization of on-line learning and an application to
boosting, JCSS 55, 119 - 139 1997. ****Yogev**** Bar-On, slides**

**3)
**

**a)
****(Nov 8)
****P. Auer, N. Cesa-Bianchi, Y.
Freund, and R. E. Schapire, The nonstochastic multiarmed bandit
problem, SICOMP 32(1), 48 - 77, 2002, Sections 3+4 (Exp3). ****Barak Itkin, slides**

**b)
****(Nov
15) ****P. Auer, N. Cesa-Bianchi,
Y. Freund, and R. E. Schapire, The nonstochastic multiarmed bandit
problem, SICOMP 32(1), 48 - 77, 2002, Sections 5 (lower bound), Section 6
if there is time. ****Shahaf**** Nacson, slides**

**c)
****(Nov
22) ****P. Auer, N. Cesa-Bianchi,
Y. Freund, and R. E. Schapire, The nonstochastic multiarmed bandit
problem, SICOMP 32(1), 48 - 77, 2002, Sections 7+ (Exp4). ****Ilia Shevrin, slides**

**4)
****(Nov
29) ****P. Auer, N. Cesa-Bianchi,
P. Fischer, Finite-time
analysis of the multiarmed bandit problem, ML,
47, 235 - 256, 2002. ****Guy Inbar, slides**

**5)
****(Dec 6)
****S.
Agrawal and N. Goyal, Analysis
of Thompson sampling for the multi-armed bandit problem, JMLR 23 39.1 -
39.26, 2012. ****Yoav**** Chai****, slides**

**6)
****(Dec
13) ****A. Kalai, S. Vempala, Efficient
algorithms for online decision problems, JCSS 71(3), 291 - 307, 2005. ****Ran Hochshtatter****, slides**

**7)
****(Dec
20) ****S. Shalev-Shwartz, Y.
Singer, A
primal-dual perspective of online learning algorithms, ML 69 115–142, 2007.
****Moran Nechushtan, slides**

**8)
****(Dec
27)**** B. Awerbuch, R.
Kleinberg, Online
linear optimization and adaptive routing, JCSS 74 97 - 114, 2008. ****Yaniv**** Rubinpur,
slides **

**9)
****(Jan 3)
****M. Zinkevich, Online
Convex Programming and Generalized Infinitesimal Gradient Ascent, ICML
2003. ****Tzlil**** Gonen, notes**

**10)
**** ****(Jan
10) ****A. D. Flaxman, A. T. Kalai,
and H. B. McMahan, Online
convex optimization in the bandit setting: gradient descent without a gradient,
SODA 2005. ****Aviv Rosenberg, slides**

**11)
**** ****(Jan
17) ****E. Hazan and S. Kale,
Extracting
certainty from uncertainty: regret bounded by variation in costs, ML 80:
165–188, 2010. ****Dror**** Dayan**