Introduction to Online Optimization/Learning (Fall 2022)

  • The schedule is subject to small adjustments.

  • Lecture notes will be posted before each lecture (and may receive small updates shortly after the lecture).

Date Topics Lecture Notes and Recommended Reading
08/25 Introduction (what is online learning and why it is useful)
Connection to statistical learning, online-to-batch conversion
Expert problem, classical Hedge algorithm, and lower bound
Lecture notes 1
Chapters 1 and 9 of Hazan's book
Chapters 1 and 2 of Bubeck's lecture notes
Section 3.7 of Cesa-Bianchi and Lugosi's book
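As a companion to the 08/25 topics above, here is a minimal Python sketch of the Hedge (exponential weights) update; the function name, the streaming interface, and losses in [0, 1] are illustrative assumptions rather than part of the course material.

    import numpy as np

    def hedge(loss_stream, n_experts, eta):
        """Minimal Hedge / exponential-weights sketch (illustrative interface).

        loss_stream: iterable of loss vectors in [0, 1]^n_experts.
        eta: learning rate, e.g. on the order of sqrt(log(n_experts) / T).
        """
        log_w = np.zeros(n_experts)          # log-weights, for numerical stability
        expected_loss = 0.0
        for loss in loss_stream:
            loss = np.asarray(loss, dtype=float)
            p = np.exp(log_w - log_w.max())
            p /= p.sum()                     # play the normalized weight vector
            expected_loss += p @ loss        # learner's expected loss this round
            log_w -= eta * loss              # multiplicative-weights update
        return expected_loss
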
09/01 Follow-the-Regularized-Leader (FTRL) and Online Gradient Descent (OGD)
Combinatorial problems
Follow-the-Perturbed-Leader (FTPL)
Lecture notes 2
Chapter 5 of Hazan's book
Chapter 6 of Bubeck's lecture notes
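To accompany the 09/01 entry above, a minimal sketch of projected online gradient descent over a Euclidean ball; the gradient-stream interface and the ball constraint are assumptions chosen for illustration.

    import numpy as np

    def ogd_euclidean_ball(grad_stream, dim, radius, eta):
        """Online gradient descent projected onto {x : ||x||_2 <= radius}."""
        x = np.zeros(dim)
        iterates = []
        for g in grad_stream:
            iterates.append(x.copy())        # play x_t, then observe gradient g_t
            x = x - eta * np.asarray(g, dtype=float)
            norm = np.linalg.norm(x)
            if norm > radius:
                x *= radius / norm           # Euclidean projection back onto the ball
        return iterates
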
09/08 Adaptive data-dependent regret bounds
Optimistic FTRL, Optimistic Hedge (with corrected losses)
Lecture notes 3
Section 2.4 of Cesa-Bianchi and Lugosi's book
Impossible Tuning Made Possible
Chapter 9 of Orabona's book (a different kind of adaptivity)
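One possible sketch of Optimistic Hedge for the 09/08 entry above, using the previous loss vector as the prediction of the next one; the specific prediction rule and interface are assumptions made for this illustration.

    import numpy as np

    def optimistic_hedge(loss_stream, n_experts, eta):
        """Optimistic Hedge sketch: weights use cumulative losses plus a prediction m_t."""
        cum_loss = np.zeros(n_experts)
        m = np.zeros(n_experts)              # prediction of the upcoming loss vector
        plays = []
        for loss in loss_stream:
            scores = -eta * (cum_loss + m)
            p = np.exp(scores - scores.max())
            p /= p.sum()
            plays.append(p)
            loss = np.asarray(loss, dtype=float)
            cum_loss += loss
            m = loss                         # optimism: predict that the next loss equals the last one
        return plays
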
09/15 Connection to game theory: minimax theorem, Nash equilibria, coarse correlated equilibria
Learning in games, fast convergence via adaptivity
Lecture notes 4
See Section 7.2 of Cesa-Bianchi and Lugosi's book for a general minimax theorem
An excellent CMU course on computational game theory via online learning
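To illustrate the game-theory connection in the 09/15 entry above, here is a small sketch of two Hedge players in self-play on a zero-sum matrix game; their average strategies form an approximate Nash equilibrium. The payoff convention (the row player minimizes x^T A y) is an assumption for this example.

    import numpy as np

    def hedge_self_play(A, T, eta):
        """Two Hedge players on a zero-sum matrix game; the row player minimizes x^T A y."""
        n, m = A.shape
        Lx, Ly = np.zeros(n), np.zeros(m)    # cumulative loss vectors of each player
        avg_x, avg_y = np.zeros(n), np.zeros(m)
        for _ in range(T):
            x = np.exp(-eta * (Lx - Lx.min()))
            x /= x.sum()
            y = np.exp(-eta * (Ly - Ly.min()))
            y /= y.sum()
            avg_x += x / T
            avg_y += y / T
            Lx += A @ y                      # row player's loss for each pure action
            Ly += -(A.T @ x)                 # column player maximizes, so its loss is -A^T x
        return avg_x, avg_y                  # approximate Nash equilibrium strategies
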
09/22 Regret minimization in non-stationary environments
Interval regret, switching/tracking regret, and dynamic regret
Sleeping expert problem
Lecture notes 5
See Section 1.3 of this paper (or the last part of this note) and Section 2 of this paper for two different efficient implementations
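As one classical route to the switching/tracking regret in the 09/22 entry above, a fixed-share style sketch (a Hedge update followed by mixing in a little of the uniform distribution); this is only an illustration and may differ from the efficient implementations referenced above.

    import numpy as np

    def fixed_share(loss_stream, n_experts, eta, alpha):
        """Fixed-share sketch: exponential weights mixed with the uniform distribution."""
        w = np.ones(n_experts) / n_experts
        plays = []
        for loss in loss_stream:
            plays.append(w.copy())
            w = w * np.exp(-eta * np.asarray(loss, dtype=float))   # Hedge update
            w /= w.sum()
            w = (1 - alpha) * w + alpha / n_experts                # share a little mass uniformly
        return plays
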
09/29 Adversarial multi-armed bandits: Exp3, lower bounds, minimax algorithms,
loss estimators and local norm
Comparisons between full-information feedback and bandit feedback
Lecture notes 6
Chapters 5 and 7 of Bubeck's lecture notes
Part III of Lattimore and Szepesvári's book
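A minimal Exp3 sketch for the 09/29 entry above, built on importance-weighted loss estimates; the environment callback bandit(t, arm), returning a loss in [0, 1], is a hypothetical interface used only for illustration.

    import numpy as np

    def exp3(bandit, n_arms, T, eta, seed=0):
        """Exp3 sketch: exponential weights over importance-weighted loss estimates."""
        rng = np.random.default_rng(seed)
        cum_est = np.zeros(n_arms)           # cumulative estimated losses
        total_loss = 0.0
        for t in range(T):
            scores = -eta * cum_est
            p = np.exp(scores - scores.max())
            p /= p.sum()
            arm = rng.choice(n_arms, p=p)
            loss = bandit(t, arm)            # only the chosen arm's loss is observed
            total_loss += loss
            cum_est[arm] += loss / p[arm]    # unbiased importance-weighted estimate
        return total_loss
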
10/06 Cancelled
10/13 (Zoom link) Stochastic multi-armed bandits: explore-then-exploit, UCB, LinUCB
Optimism in the face of uncertainty
Best of both worlds
Lecture notes 7
Parts II and V of Lattimore and Szepesvári's book
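A short UCB1-style sketch for the 10/13 entry above, assuming stochastic rewards in [0, 1]; the callback bandit(arm) and the particular confidence-bonus constant are assumptions made for illustration.

    import numpy as np

    def ucb1(bandit, n_arms, T):
        """UCB1 sketch: pull the arm with the highest optimistic mean estimate."""
        counts = np.zeros(n_arms)
        means = np.zeros(n_arms)
        rewards = []
        for t in range(T):
            if t < n_arms:
                arm = t                                      # pull every arm once first
            else:
                bonus = np.sqrt(2 * np.log(t + 1) / counts)  # optimism in the face of uncertainty
                arm = int(np.argmax(means + bonus))
            r = bandit(arm)
            counts[arm] += 1
            means[arm] += (r - means[arm]) / counts[arm]     # incremental mean update
            rewards.append(r)
        return rewards
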
10/20 Contextual bandits, realizable versus agnostic setting, Exp4
Oracle-efficient contextual bandits, SquareCB
Lecture notes 8
See the SquareCB paper for more details
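The heart of SquareCB in the 10/20 entry above is the inverse-gap-weighting map from a regression oracle's predicted rewards to an action distribution; the sketch below shows only that map (the oracle itself is omitted), and the parameter names are illustrative.

    import numpy as np

    def inverse_gap_weighting(predicted_rewards, gamma):
        """SquareCB-style action probabilities from predicted rewards (higher is better)."""
        yhat = np.asarray(predicted_rewards, dtype=float)
        K = len(yhat)
        best = int(np.argmax(yhat))          # greedy action under the oracle
        p = np.zeros(K)
        others = np.arange(K) != best
        p[others] = 1.0 / (K + gamma * (yhat[best] - yhat[others]))
        p[best] = 1.0 - p[others].sum()      # remaining probability on the greedy action
        return p
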
10/27 Adversarial bandit linear/convex optimization
Exp2 and SCRiBLe
Lecture notes 9
Part VI and Chapter 30 of Lattimore and Szepesvári's book
See this note for the omitted analysis of bandit convex optimization (BCO)
11/03 Online reinforcement learning, Markov Decision Processes
Occupancy measure space, policy optimization methods
Lecture notes 10
See this paper for more details on handling unknown transitions, and this one for policy optimization methods
11/10 Student presentations
11/17 Student presentations
11/24 Thanksgiving
12/01 Student presentations