Lecturer: Haipeng Luo
When: See detailed schedule below
Where: Teaching Building 3, 201
Overview: This course focuses on the foundations of the theory of online learning/online convex optimization/sequential decision making, which has been playing a crucial role in machine learning and many real-life applications. The main theme of the course is to study algorithms whose goal is to minimize "regret" (defined formally below) when facing a possibly adversarial environment with possibly limited (a.k.a. "bandit") feedback, and to understand their theoretical guarantees. Some connections to game theory, boosting, and other learning problems will also be covered. This is a mini version of a graduate course taught at USC.

Learning Objectives: At a high level, through this course you will gain a concrete idea of what online learning is about, what the classic algorithms are, and how they are usually analyzed. Specifically, you will learn about algorithms such as exponential weights, follow-the-regularized-leader, UCB, EXP3, SCRiBLe, and others, as well as general techniques for proving regret upper and lower bounds. The broader goal is to cultivate the ability to think about machine learning in a more rigorous and principled way and to design provable and practical machine learning algorithms.

Prerequisites: Familiarity with probability, convex analysis, calculus, and analysis of algorithms. Some basic understanding of machine learning would be very helpful.

Readings: There is no official textbook for this course, but the following books/surveys are very helpful in general:
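For concreteness, "regret" here is the standard notion: the gap between the learner's cumulative loss and that of the best fixed action in hindsight. In conventional notation (the symbols below, e.g. \ell_t for the round-t loss and \mathcal{X} for the action set, are standard choices and not taken from this page):

```latex
% Regret after T rounds of a learner playing x_1, ..., x_T against
% loss functions \ell_1, ..., \ell_T (possibly chosen adversarially):
R_T = \sum_{t=1}^{T} \ell_t(x_t) - \min_{x \in \mathcal{X}} \sum_{t=1}^{T} \ell_t(x)
```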
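As a taste of the algorithms named above, here is a minimal sketch of exponential weights (Hedge) in the experts setting, assuming losses in [0, 1]; the function name and interface are illustrative choices of this sketch, not material from the course.

```python
import numpy as np

def exponential_weights(losses, eta):
    """Exponential weights (Hedge) over a (T, K) matrix of expert losses in [0, 1].

    Illustrative sketch: returns the distributions played and the learner's
    total expected loss. eta is the learning rate (> 0).
    """
    T, K = losses.shape
    weights = np.ones(K)                     # start from uniform weights
    total_loss = 0.0
    plays = []
    for t in range(T):
        p = weights / weights.sum()          # current distribution over experts
        plays.append(p)
        total_loss += float(p @ losses[t])   # expected loss suffered this round
        weights *= np.exp(-eta * losses[t])  # multiplicative weight update
    return np.array(plays), total_loss

# Example: 2 experts, 100 rounds of random losses (hypothetical data)
rng = np.random.default_rng(0)
L = rng.random((100, 2))
dists, loss = exponential_weights(L, eta=np.sqrt(np.log(2) / 100))
```

With a learning rate eta on the order of sqrt(ln K / T), this algorithm guarantees regret O(sqrt(T ln K)) against any sequence of bounded losses, which is the kind of guarantee the course teaches how to prove.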
Schedule: (The first half of the course focuses on problems with full-information feedback, while the second half focuses on bandit feedback.)