Introduction to Online Optimization/Learning (Spring 2026)

  • Schedule is subject to small adjustments.

  • Lecture notes will be posted before each lecture (and may be lightly revised shortly after the lecture).

Date Topics Lecture Notes and Recommended Reading
01/12 Introduction (what online learning is and why it is useful)
Connection to statistical learning, online-to-batch conversion
Expert problem, classical Hedge algorithm, and lower bound
Chapters 1 and 9 of Hazan's book
Chapters 1 and 2 of Bubeck's lecture notes
Section 3.7 of Cesa-Bianchi and Lugosi's book
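For reference, a minimal Python sketch of the classical Hedge (exponential weights) algorithm for the expert problem; it assumes losses in [0, 1] and leaves the learning rate eta as a parameter:

```python
import numpy as np

def hedge(loss_matrix, eta):
    """Run Hedge on a T x N matrix of expert losses in [0, 1].

    Returns the algorithm's total expected loss."""
    T, N = loss_matrix.shape
    log_w = np.zeros(N)                  # log-weights; start uniform
    total = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())  # subtract max for numerical stability
        p /= p.sum()                     # distribution over experts
        total += p @ loss_matrix[t]      # expected loss this round
        log_w -= eta * loss_matrix[t]    # exponential weight update
    return total
```

With eta tuned to sqrt(2 ln N / T), this achieves the classical sqrt(T ln N / 2) regret bound discussed in the lecture.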
01/19 MLK Holiday
01/26 Follow-the-Regularized-Leader (FTRL) and Online Gradient Descent (OGD)
Combinatorial problems
Follow-the-Perturbed-Leader (FTPL)
Chapter 5 of Hazan's book
Chapter 6 of Bubeck's lecture notes
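A minimal Python sketch of projected Online Gradient Descent over a Euclidean ball, using the standard decaying step size eta_t = step / sqrt(t) (the gradient oracle and parameters here are illustrative):

```python
import numpy as np

def ogd(grad_fns, dim, radius, step):
    """Projected OGD: gradient step, then Euclidean projection onto the ball.

    grad_fns: one gradient oracle per round, each mapping x -> grad f_t(x).
    Returns the sequence of iterates played."""
    x = np.zeros(dim)
    iterates = []
    for t, grad in enumerate(grad_fns, start=1):
        iterates.append(x.copy())
        x = x - (step / np.sqrt(t)) * grad(x)  # gradient step with eta_t = step/sqrt(t)
        norm = np.linalg.norm(x)
        if norm > radius:                      # project back onto the ball
            x *= radius / norm
    return iterates
```

The projection step is what keeps the iterates inside the feasible set; replacing the squared Euclidean regularizer with another strongly convex one recovers the general FTRL/mirror-descent family.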
02/02 Adaptive data-dependent regret bounds
Optimistic FTRL, Optimistic Hedge (with corrected losses)
Section 2.4 of Cesa-Bianchi and Lugosi's book
Impossible Tuning Made Possible
Chapter 9 of Orabona's book (a different kind of adaptivity)
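A minimal Python sketch of Optimistic Hedge: before each round the learner incorporates a prediction m_t of the upcoming loss (here simply the previous loss, a common choice for slowly changing environments), then corrects with the true loss:

```python
import numpy as np

def optimistic_hedge(losses, eta):
    """Optimistic Hedge on a T x N loss matrix with losses in [0, 1].

    Plays weights formed with a prediction m_t of the next loss vector,
    then updates with the true loss. Returns total expected loss."""
    T, N = losses.shape
    log_w = np.zeros(N)
    m = np.zeros(N)                       # prediction of the upcoming loss
    total = 0.0
    for t in range(T):
        g = log_w - eta * m               # optimistic step using the prediction
        p = np.exp(g - g.max())
        p /= p.sum()
        total += p @ losses[t]
        log_w -= eta * losses[t]          # correct with the true loss
        m = losses[t]                     # predict next loss = last loss
    return total
```

When the predictions are accurate, the regret scales with the prediction error rather than with T, which is the source of the adaptive bounds covered this week.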
02/09 Connection to game theory: minimax theorem, Nash equilibria
coarse correlated equilibria, learning in games, fast convergence via adaptivity
See Section 7.2 of Cesa-Bianchi and Lugosi's book for a general minimax theorem
An excellent CMU course on computational game theory via online learning
02/16 President’s Day Holiday
02/23 Regret minimization in non-stationary environments
Interval regret, switching/tracking regret, and dynamic regret
Sleeping expert problem
See Section 1.3 of this paper (or the last part of this note) and Section 2 of this paper
for two different efficient implementations
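A minimal Python sketch of the classical Fixed-Share algorithm (Herbster-Warmuth), one of the simplest ways to obtain switching/tracking regret: it is Hedge plus a "share" step that mixes a small amount of the uniform distribution back in each round, so no expert's weight can decay to zero and the algorithm can recover quickly after the best expert changes:

```python
import numpy as np

def fixed_share(losses, eta, alpha):
    """Fixed-Share on a T x N loss matrix with losses in [0, 1].

    eta: learning rate; alpha: share/mixing rate.
    Returns total expected loss."""
    T, N = losses.shape
    w = np.full(N, 1.0 / N)
    total = 0.0
    for t in range(T):
        total += w @ losses[t]
        w = w * np.exp(-eta * losses[t])  # Hedge update
        w /= w.sum()
        w = (1 - alpha) * w + alpha / N   # share step: mix in uniform
    return total
```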
03/02 Adversarial multi-armed bandits: Exp3, lower bounds, minimax algorithms,
loss estimators and local norm
Comparisons between full-information feedback and bandit feedback
Chapters 5 and 7 of Bubeck's lecture notes
Part III of Lattimore and Szepesvári's book
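A minimal Python sketch of Exp3, illustrating the importance-weighted loss estimator: only the pulled arm's loss is observed, and dividing it by the probability of pulling that arm makes the estimator unbiased for the full loss vector (the learning rate here is a free parameter):

```python
import numpy as np

def exp3(loss_fn, n_arms, T, eta, rng):
    """Exp3 with importance-weighted loss estimators, losses in [0, 1].

    loss_fn(t, arm) returns the loss of the pulled arm at round t.
    Returns the algorithm's realized total loss."""
    log_w = np.zeros(n_arms)
    total = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        arm = rng.choice(n_arms, p=p)        # sample an arm to pull
        loss = loss_fn(t, arm)               # only this arm's loss is observed
        total += loss
        log_w[arm] -= eta * loss / p[arm]    # unbiased estimate of the loss vector
    return total
```

The 1/p[arm] factor is also what inflates the variance and forces the sqrt(K) dependence in the bandit regret bound, in contrast to the sqrt(ln K) of the full-information setting.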
03/09 Stochastic multi-armed bandits: explore-then-exploit, UCB, LinUCB
Optimism in the face of uncertainty
Best of both worlds
Parts II and V of Lattimore and Szepesvári's book
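A minimal Python sketch of UCB1, illustrating optimism in the face of uncertainty: each round the learner pulls the arm maximizing its empirical mean plus a confidence bonus that shrinks as the arm is pulled more (rewards are assumed to lie in [0, 1]):

```python
import numpy as np

def ucb1(reward_fn, n_arms, T, rng):
    """UCB1 for stochastic bandits with rewards in [0, 1].

    reward_fn(arm, rng) draws a reward for the pulled arm.
    Returns the total reward collected."""
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)
    total = 0.0
    for t in range(T):
        if t < n_arms:
            arm = t                                  # pull each arm once first
        else:
            bonus = np.sqrt(2 * np.log(t + 1) / counts)
            arm = int(np.argmax(sums / counts + bonus))
        r = reward_fn(arm, rng)
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total
```

A suboptimal arm with gap Delta is pulled only O(log T / Delta^2) times, giving the instance-dependent logarithmic regret bound covered in lecture.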
03/16 Spring Recess
03/23 Contextual bandits, realizable versus agnostic setting, Exp4
Oracle-efficient contextual bandits, SquareCB
See the SquareCB paper for more details
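A minimal Python sketch of SquareCB's action-selection rule, inverse gap weighting: given predicted rewards from a regression oracle (the oracle itself is omitted here), each non-greedy arm is played with probability inversely proportional to its predicted gap, and the greedy arm absorbs the remaining mass:

```python
import numpy as np

def inverse_gap_weighting(pred_rewards, gamma):
    """SquareCB's distribution over arms from predicted rewards.

    gamma controls the exploration/exploitation trade-off: larger gamma
    concentrates more mass on the greedy arm."""
    K = len(pred_rewards)
    best = int(np.argmax(pred_rewards))
    p = np.zeros(K)
    for a in range(K):
        if a != best:
            p[a] = 1.0 / (K + gamma * (pred_rewards[best] - pred_rewards[a]))
    p[best] = 1.0 - p.sum()   # greedy arm gets the leftover probability
    return p
```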
03/30 Adversarial bandit linear/convex optimization
Exp2 and SCRiBLe
Part VI and Chapter 30 of Lattimore and Szepesvári's book
See this note for the omitted analysis of bandit convex optimization (BCO)
04/06 Online reinforcement learning, Markov Decision Processes
Occupancy measure space, policy optimization methods
See this paper for more details on handling unknown transitions and
this one for policy optimization methods
04/13 Student presentations
04/20 Student presentations
04/27 Student presentations