| Date | Topics | Lecture Notes and Recommended Reading |
| --- | --- | --- |
| 01/12 | Introduction (what is online learning and why it is useful)<br>Connection to statistical learning; online-to-batch conversion<br>Expert problem, the classical Hedge algorithm, and a lower bound | Chapters 1 and 9 of Hazan's book<br>Chapters 1 and 2 of Bubeck's lecture notes<br>Chapter 3.7 of Cesa-Bianchi and Lugosi's book |
| 01/19 | MLK Holiday | |
| 01/26 | Follow-the-Regularized-Leader (FTRL) and Online Gradient Descent (OGD)<br>Combinatorial problems<br>Follow-the-Perturbed-Leader (FTPL) | Chapter 5 of Hazan's book<br>Chapter 6 of Bubeck's lecture notes |
| 02/02 | Adaptive, data-dependent regret bounds<br>Optimistic FTRL; Optimistic Hedge (with corrected losses) | Chapter 2.4 of Cesa-Bianchi and Lugosi's book<br>Impossible Tuning Made Possible<br>Chapter 9 of Orabona's book (a different kind of adaptivity) |
| 02/09 | Connection to game theory: minimax theorem, Nash equilibria, coarse correlated equilibria<br>Learning in games; fast convergence via adaptivity | See Chapter 7.2 of Cesa-Bianchi and Lugosi's book for a general minimax theorem<br>An excellent CMU course on computational game theory via online learning |
| 02/16 | President’s Day Holiday | |
| 02/23 | Regret minimization in non-stationary environments<br>Interval regret, switching/tracking regret, and dynamic regret<br>Sleeping expert problem | See Sec 1.3 of this paper (or the last part of this note) and Sec 2 of this paper for two different efficient implementations |
| 03/02 | Adversarial multi-armed bandits: Exp3, lower bounds, minimax algorithms, loss estimators, and local norms<br>Comparison between full-information feedback and bandit feedback | Chapters 5 and 7 of Bubeck's lecture notes<br>Part III of Lattimore and Szepesvári's book |
| 03/09 | Stochastic multi-armed bandits: explore-then-exploit, UCB, LinUCB<br>Optimism in the face of uncertainty<br>Best of both worlds | Parts II and V of Lattimore and Szepesvári's book |
| 03/16 | Spring Recess | |
| 03/23 | Contextual bandits, realizable versus agnostic setting, Exp4<br>Oracle-efficient contextual bandits, SquareCB | See the SquareCB paper for more details |
| 03/30 | Adversarial bandit linear/convex optimization<br>Exp2 and SCRiBLe | Part VI and Section 30 of Lattimore and Szepesvári's book<br>See this note for the omitted analysis for BCO |
| 04/06 | Online reinforcement learning; Markov Decision Processes<br>Occupancy measure space; policy optimization methods | See this paper for more details on handling unknown transitions and this one for more details on policy optimization methods |
| 04/13 | Student presentations | |
| 04/20 | Student presentations | |
| 04/27 | Student presentations | |
|
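To make the 01/12 material concrete, here is a minimal sketch of the classical Hedge algorithm for the expert problem: maintain one weight per expert and multiply it by `exp(-eta * loss)` after each round. The function name, the demo data, and the step size `sqrt(8 ln N / T)` are illustrative choices, not taken from the lecture notes.

```python
import numpy as np

def hedge(loss_matrix, eta):
    """Run Hedge on a T x N matrix of expert losses in [0, 1].

    Returns the T distributions over experts played in each round.
    """
    T, N = loss_matrix.shape
    log_w = np.zeros(N)            # log-weights, for numerical stability
    dists = np.zeros((T, N))
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()               # current distribution over experts
        dists[t] = p
        log_w -= eta * loss_matrix[t]   # multiplicative weight update
    return dists

# Tiny demo: expert 0 always suffers smaller loss, so Hedge concentrates on it.
rng = np.random.default_rng(0)
losses = np.column_stack([rng.uniform(0.0, 0.3, 100), rng.uniform(0.7, 1.0, 100)])
dists = hedge(losses, eta=np.sqrt(8 * np.log(2) / 100))
```

The first distribution is uniform, and by the last round essentially all mass sits on the better expert.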
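Similarly, a minimal sketch of Online Gradient Descent from the 01/26 lecture: take a gradient step, then project back onto the feasible set, here a Euclidean ball of radius `D`, using the standard fixed step size `D / (G * sqrt(T))` where `G` bounds the gradient norms. The function signature and the demo quadratic loss are assumptions for illustration.

```python
import numpy as np

def ogd(grad_fn, x0, T, D, G):
    """Projected online gradient descent on the Euclidean ball of radius D.

    grad_fn(x, t) returns the gradient of the round-t loss at x.
    Returns the iterates x_1, ..., x_T.
    """
    eta = D / (G * np.sqrt(T))     # standard fixed step size
    x = np.array(x0, dtype=float)
    iterates = []
    for t in range(T):
        iterates.append(x.copy())
        x = x - eta * grad_fn(x, t)
        norm = np.linalg.norm(x)
        if norm > D:               # Euclidean projection back onto the ball
            x *= D / norm
    return iterates

# Demo: a fixed quadratic loss f(x) = ||x - c||^2 / 2, so OGD drifts toward c.
c = np.array([0.5, -0.5])
xs = ogd(lambda x, t: x - c, x0=[0.0, 0.0], T=1000, D=2.0, G=3.0)
```

With a fixed loss the iterates converge to the minimizer; against an adversarial sequence the same updates guarantee `O(DG * sqrt(T))` regret.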
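For the 03/09 lecture on optimism in the face of uncertainty, here is a sketch of the UCB1 index policy on Bernoulli arms: pull each arm once, then always pull the arm maximizing its empirical mean plus the confidence bonus `sqrt(2 ln t / pulls)`. The function name and the demo instance are again illustrative, not from the course materials.

```python
import numpy as np

def ucb1(means, T, rng):
    """Run UCB1 for T rounds on Bernoulli arms with the given true means.

    Returns the number of pulls of each arm.
    """
    K = len(means)
    counts = np.zeros(K)
    sums = np.zeros(K)
    for t in range(K):                              # initialization: one pull per arm
        sums[t] += rng.random() < means[t]
        counts[t] += 1
    for t in range(K, T):
        bonus = np.sqrt(2 * np.log(t + 1) / counts)  # optimistic confidence bonus
        a = int(np.argmax(sums / counts + bonus))
        sums[a] += rng.random() < means[a]           # Bernoulli reward
        counts[a] += 1
    return counts

# Demo: with a large gap between the arms, UCB1 pulls the better arm most often.
counts = ucb1(np.array([0.2, 0.8]), T=5000, rng=np.random.default_rng(1))
```

The theory predicts only `O(log T / gap)` pulls of the suboptimal arm, which the demo reflects.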