Introduction to Online Optimization/Learning

Summer 2018


Lecturer: Haipeng Luo
When: See detailed schedule below
Where: Teaching Building 3, 201

Overview: This course focuses on the foundations of the theory of online learning/online convex optimization/sequential decision making, which plays a crucial role in machine learning and many real-life applications. The main theme of the course is to study algorithms whose goal is to minimize "regret" when facing a possibly adversarial environment with possibly limited (a.k.a. "bandit") feedback, and to understand their theoretical guarantees. Some connections to game theory, boosting, and other learning problems will also be covered. This is a mini version of a graduate course taught at USC.

Learning Objectives: At a high level, through this course you will come away with a concrete idea of what online learning is about, what the classic algorithms are, and how they are typically analyzed. Specifically, you will learn about algorithms such as exponential weights, Follow-the-Regularized-Leader, UCB, Exp3, SCRiBLe, and others, as well as general techniques for proving regret upper and lower bounds. The broader goal is to cultivate the ability to think about machine learning in a more rigorous and principled way and to design provably sound and practical machine learning algorithms.

Prerequisites: Familiarity with probability, convex analysis, calculus, and analysis of algorithms. A basic understanding of machine learning would also be very helpful.

Readings: There is no official textbook for this course, but the following books/surveys (the ones cited in the schedule below) are very helpful in general:

- Elad Hazan, Introduction to Online Convex Optimization ("Hazan's survey" below);
- Sébastien Bubeck, Introduction to Online Optimization ("Bubeck's lecture notes"/"Bubeck's survey" below);
- Nicolò Cesa-Bianchi and Gábor Lugosi, Prediction, Learning, and Games.

Schedule:

(The first half of the course focuses on problems with full-information feedback, while the second half focuses on bandit feedback.)

07/16, 2:00-5:00
Topics: Introduction; online learning; statistical learning and online-to-batch conversion; the expert problem, Hedge, and lower bounds
Recommended reading: Chapters 1 and 9 of Hazan's survey; Chapter 1 of Bubeck's lecture notes; the classic paper on the expert problem and Hedge
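For concreteness, here is a minimal sketch of the Hedge (exponential weights) update for the expert problem. The interface (a T x N array of losses in [0, 1] and a fixed learning rate eta) is an assumption for illustration, not the lecture's exact presentation.

import numpy as np

def hedge(losses, eta):
    """Run Hedge on a T x N array of expert losses in [0, 1] (assumed range).

    Each round, play the normalized weights as a distribution over experts,
    then exponentially downweight each expert by its observed loss.
    """
    T, N = losses.shape
    log_w = np.zeros(N)                # log-weights, initially uniform
    plays = np.empty((T, N))
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                   # current distribution over experts
        plays[t] = p
        log_w -= eta * losses[t]       # multiplicative-weights update
    return plays

The standard analysis sets eta on the order of sqrt(log(N) / T), giving regret O(sqrt(T log N)) against the best fixed expert.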
07/17, 2:00-5:00
Topics: Follow-the-Regularized-Leader; Online Gradient Descent; combinatorial problems
Recommended reading: Sections 5.1-5.4 of Hazan's survey; Chapters 5 and 6 of Bubeck's survey
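A minimal sketch of online gradient descent with projection onto an L2 ball; the callback interface (gradient(t, x) returning the round-t subgradient at the played point) and the parameters radius, eta, and T are assumptions for illustration.

import numpy as np

def ogd(gradient, x0, radius, eta, T):
    """Online gradient descent over the L2 ball of the given radius.

    Play x_t, observe g_t = gradient(t, x_t), then step to the Euclidean
    projection of x_t - eta * g_t back onto the ball.
    """
    x = np.array(x0, dtype=float)
    plays = []
    for t in range(T):
        plays.append(x.copy())
        x = x - eta * gradient(t, x)
        norm = np.linalg.norm(x)
        if norm > radius:
            x *= radius / norm         # projection onto the L2 ball
    return plays

With gradient norms bounded by G, a step size eta on the order of radius / (G * sqrt(T)) yields O(sqrt(T)) regret.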
07/18, 2:00-5:00
Topics: Connection to game theory; minimax theorem; connection to boosting; AdaBoost
Recommended reading: See Chapter 7.2 of Cesa-Bianchi and Lugosi's book for a general minimax theorem (with a similar proof)
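To illustrate the game-theory connection, here is a toy sketch (my own, not the lecture's code) of solving a zero-sum game by repeated play: a row player runs Hedge while the column player best-responds, and the time-averaged strategies approximate a minimax equilibrium. The payoff matrix A with entries in [0, 1] and the parameters T and eta are assumptions.

import numpy as np

def solve_zero_sum(A, T=5000, eta=0.05):
    """Approximate minimax strategies for a zero-sum game.

    The row player (minimizing p @ A @ q) runs Hedge over its rows;
    the column player best-responds to the current mixed strategy.
    The averaged strategies form an approximate equilibrium.
    """
    n, m = A.shape
    log_w = np.zeros(n)
    avg_row = np.zeros(n)
    avg_col = np.zeros(m)
    for _ in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        j = int(np.argmax(p @ A))      # column player maximizes the loss
        avg_row += p
        avg_col[j] += 1.0
        log_w -= eta * A[:, j]         # Hedge update on the realized losses
    return avg_row / T, avg_col / T

The sublinear regret of Hedge is exactly what makes the averaged strategies an approximate equilibrium, and this style of repeated-play argument also yields a proof of the minimax theorem.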
07/21, 2:00-5:00
Topics: Multi-armed bandits (MAB); Exp3 algorithm; lower bounds; stochastic MAB; explore-then-exploit; UCB algorithm
Recommended reading: See Sections 6.4-6.6 of Cesa-Bianchi and Lugosi's book for general partial-information problems; see Sec 2.3 of this survey for a lower bound on stochastic MAB
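A minimal sketch of Exp3 for adversarial bandits, assuming losses in [0, 1]: it is essentially Hedge fed with importance-weighted loss estimates built from the single observed loss. The callback loss(t, arm) and the fixed eta are assumptions for illustration, and classic presentations also mix in explicit exploration.

import numpy as np

def exp3(loss, K, T, eta, seed=0):
    """Exp3 for adversarial multi-armed bandits with losses in [0, 1].

    Only the pulled arm's loss is observed; dividing it by the pull
    probability gives an unbiased estimate of the full loss vector,
    which is then passed to an exponential-weights update.
    """
    rng = np.random.default_rng(seed)
    log_w = np.zeros(K)
    arms = []
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        a = int(rng.choice(K, p=p))
        ell = loss(t, a)               # bandit feedback: one loss only
        log_w[a] -= eta * ell / p[a]   # importance-weighted Hedge update
        arms.append(a)
    return arms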
07/22, 2:00-5:00
Topics: Linear bandits; Exp2 algorithm; combinatorial bandits; SCRiBLe
Recommended reading: See Sec 5 of this paper for more examples of combinatorial bandits; see the original SCRiBLe paper for an efficient implementation and a discussion of the online-shortest-path problem
07/23, 9:00-12:00
Topics: Bandit convex optimization; gradient descent without gradients
Recommended reading: See this paper for an L2 ball sampling scheme combined with gradient descent
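Finally, a sketch of the "gradient descent without gradients" idea from this session: with only a single function value per round, perturbing the play on a small sphere yields a one-point gradient estimate that can be fed to gradient descent. The interface f(t, y), the smoothing radius delta, and the step size eta are assumptions, and the projection onto a shrunk feasible set is omitted for brevity.

import numpy as np

def bandit_gd(f, x0, eta, delta, T, seed=0):
    """Bandit convex optimization via one-point gradient estimation.

    Each round, query the single point x + delta * u for a uniformly
    random unit direction u; (d / delta) * f(x + delta * u) * u is an
    unbiased estimate of the gradient of a smoothed version of f,
    which is then used for a plain gradient step.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    d = x.size
    for t in range(T):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)         # uniform direction on the unit sphere
        value = f(t, x + delta * u)    # the only feedback this round
        x -= eta * (d / delta) * value * u
    return x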