Date | Topics | Lecture Notes and Recommended Reading |
08/25 | Introduction (what is online learning and why it is useful) Connection to statistical learning, online-to-batch conversion Expert problem, classical Hedge algorithm, and lower bound | Lecture notes 1 Chapter 1 and 9 of Hazan's book Chapters 1 and 2 of Bubeck's lecture notes Chapter 3.7 of Cesa-Bianchi and Lugosi's book |
09/01 | Follow-the-Regularized-Leader (FTRL) and Online Gradient Descent (OGD) Combinatorial problems Follow-the-Perturbed-Leader (FTPL) | Lecture notes 2 Chapter 5 of Hazan's book Chapter 6 of Bubeck's lecture notes |
09/08 | Adaptive data-dependent regret bounds Optimistic FTRL, Optimistic Hedge (with corrected losses) | Lecture notes 3 Chapter 2.4 of Cesa-Bianchi and Lugosi's book Impossible Tuning Made Possible Chapter 9 of Orabona's book (a different kind of adaptivity) |
09/15 | Connection to game theory: minimax theorem, Nash equilibria coarse correlated equilibria, learning in games, fast convergence via adaptivity | Lecture notes 4 See Chapter 7.2 of Cesa-Bianchi and Lugosi's book for a general minimax theorem An excellent CMU course on computational game theory via online learning |
09/22 | Regret minimization in non-stationary environments Interval regret, switching/tracking regret, and dynamic regret Sleeping expert problem | Lecture notes 5 See Sec 1.3 of this paper (or the last part of this note) and Sec 2 of this paper for two different efficient implementations |
09/29 | Adversarial multi-armed bandits: Exp3, lower bounds, minimax algorithms, loss estimators and local norm Comparisons between full-information feedback and bandit feedback | Lecture notes 6 Chapters 5 and 7 of Bubeck's lecture notes Part III of Lattimore and Szepesvári's book |
10/06 | cancelled | |
10/13 (zoom link) | Stochastic multi-armed bandits: explore-then-exploit, UCB, LinUCB Optimism in face of uncertainty Best of both worlds | Lecture notes 7 Parts II and V of Lattimore and Szepesvári's book |
10/20 | Contextual bandits, realizable versus agnostic setting, Exp4 Oracle-efficient contextual bandits, SquareCB | Lecture notes 8 See the SquareCB paper for more details |
10/27 | Adversarial bandit linear/convex optimization Exp2 and SCRiBLe | Lecture notes 9 Part VI and Section 30 of Lattimore and Szepesvári's book See this note for omitted analysis for BCO |
11/03 | Online reinforcement learning, Markov Decision Processes Occupancy measure space, policy optimization methods | Lecture notes 10 See this paper for more details on handling unknown transitions and this one for more details on policy optimization methods |
11/10 | Student presentations | |
11/17 | Student presentations | |
11/24 | Thanksgiving | |
12/01 | Student presentations |
|