Introduction to Online Optimization/Learning (Fall 2022)

Schedule is subject to small adjustments.
Lecture notes will be posted before each lecture (and might be updated slightly soon after the lecture).

Date	Topics	Lecture Notes and Recommended Reading
08/25	Introduction (what is online learning and why it is useful) Connection to statistical learning, online-to-batch conversion Expert problem, classical Hedge algorithm, and lower bound	Lecture notes 1 Chapter 1 and 9 of Hazan's book Chapters 1 and 2 of Bubeck's lecture notes Chapter 3.7 of Cesa-Bianchi and Lugosi's book
09/01	Follow-the-Regularized-Leader (FTRL) and Online Gradient Descent (OGD) Combinatorial problems Follow-the-Perturbed-Leader (FTPL)	Lecture notes 2 Chapter 5 of Hazan's book Chapter 6 of Bubeck's lecture notes
09/08	Adaptive data-dependent regret bounds Optimistic FTRL, Optimistic Hedge (with corrected losses)	Lecture notes 3 Chapter 2.4 of Cesa-Bianchi and Lugosi's book Impossible Tuning Made Possible Chapter 9 of Orabona's book (a different kind of adaptivity)
09/15	Connection to game theory: minimax theorem, Nash equilibria coarse correlated equilibria, learning in games, fast convergence via adaptivity	Lecture notes 4 See Chapter 7.2 of Cesa-Bianchi and Lugosi's book for a general minimax theorem An excellent CMU course on computational game theory via online learning
09/22	Regret minimization in non-stationary environments Interval regret, switching/tracking regret, and dynamic regret Sleeping expert problem	Lecture notes 5 See Sec 1.3 of this paper (or the last part of this note) and Sec 2 of this paper for two different efficient implementations
09/29	Adversarial multi-armed bandits: Exp3, lower bounds, minimax algorithms, loss estimators and local norm Comparisons between full-information feedback and bandit feedback	Lecture notes 6 Chapters 5 and 7 of Bubeck's lecture notes Part III of Lattimore and Szepesvári's book
10/06	cancelled
10/13 (zoom link)	Stochastic multi-armed bandits: explore-then-exploit, UCB, LinUCB Optimism in face of uncertainty Best of both worlds	Lecture notes 7 Parts II and V of Lattimore and Szepesvári's book
10/20	Contextual bandits, realizable versus agnostic setting, Exp4 Oracle-efficient contextual bandits, SquareCB	Lecture notes 8 See the SquareCB paper for more details
10/27	Adversarial bandit linear/convex optimization Exp2 and SCRiBLe	Lecture notes 9 Part VI and Section 30 of Lattimore and Szepesvári's book See this note for omitted analysis for BCO
11/03	Online reinforcement learning, Markov Decision Processes Occupancy measure space, policy optimization methods	Lecture notes 10 See this paper for more details on handling unknown transitions and this one for more details on policy optimization methods
11/10	Student presentations
11/17	Student presentations
11/24	Thanksgiving
12/01	Student presentations