Introduction to Online Learning

CSCI 699, Fall 2017

Haipeng Luo

When: TuTh 2:00-3:50
Where: SGM 601
Office Hours: By appointment
TA: Chen-Yu Wei (chenyu dot wei at usc dot edu)

Overview: This course focuses on the foundations of, and recent advances in, the theory of online learning/online convex optimization/sequential decision making, which plays a crucial role in machine learning and many real-life applications. The main theme of the course is to study algorithms whose goal is to minimize "regret" when facing a possibly adversarial environment, and to understand their theoretical guarantees. Special attention will be paid to more adaptive, efficient and practical algorithms. Some connections to game theory, boosting and other learning problems will also be covered.

Learning Objectives: At a high level, through this course you will gain a concrete idea of what online learning is about, what the state of the art is, and what the open problems are. Specifically, you will learn about classic algorithms such as exponential weights, online mirror descent, UCB, and EXP3, and about more recent advanced algorithms, as well as general techniques for proving regret upper and lower bounds. The hope is that after this course you will think about machine learning in a more rigorous and principled way and be able to design provable and practical machine learning algorithms.
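As a rough illustration of the kind of algorithm and guarantee mentioned above, here is a minimal sketch of exponential weights (Hedge) for the expert problem. It is not taken from the course materials: the function name `hedge`, the random loss table, and the learning rate are all assumptions made for this example. Regret here means the learner's expected cumulative loss minus the cumulative loss of the best fixed expert in hindsight.

```python
import math
import random


def hedge(losses, eta):
    """Run exponential weights (Hedge) on a T x N table of losses in [0, 1].

    Returns the learner's expected cumulative loss and its regret against
    the best fixed expert in hindsight. (Hypothetical helper for illustration.)
    """
    n = len(losses[0])
    cum = [0.0] * n  # cumulative loss of each expert so far
    learner_loss = 0.0
    for round_losses in losses:
        # Weights proportional to exp(-eta * cumulative loss); subtracting
        # the minimum keeps the exponentials numerically stable.
        base = min(cum)
        w = [math.exp(-eta * (c - base)) for c in cum]
        total = sum(w)
        p = [wi / total for wi in w]
        # Expected loss of playing an expert drawn from the distribution p.
        learner_loss += sum(pi * li for pi, li in zip(p, round_losses))
        # Only now reveal this round's losses to the algorithm.
        cum = [c + li for c, li in zip(cum, round_losses)]
    return learner_loss, learner_loss - min(cum)


if __name__ == "__main__":
    rng = random.Random(0)  # assumed synthetic data, not an adversary
    T, N = 1000, 5
    losses = [[rng.random() for _ in range(N)] for _ in range(T)]
    eta = math.sqrt(8 * math.log(N) / T)  # a standard tuning for known T
    loss, regret = hedge(losses, eta)
    print(f"learner loss = {loss:.2f}, regret = {regret:.2f}")
```

With this choice of `eta`, the standard analysis bounds the regret by sqrt(T ln(N) / 2) for any sequence of losses in [0, 1], which is sublinear in T.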

Requirements:

4 problem sets, each of which consists of several theory questions on algorithm design and analysis. Collaboration is allowed but must be stated. Grades are based on correctness. Must be written in LaTeX. 40% of course grade.
A final project. 50% of course grade.
Participation. Includes regular attendance and a 50-min presentation of a paper. 10% of course grade.

Late homework policy: You are given 4 late days for the problem sets (no late days for the final project), to be used in integer amounts and distributed as you see fit. Additional late days will each result in a deduction of 10% of the grade of the corresponding assignment.

Prerequisites: Familiarity with probability, convex analysis, calculus, and analysis of algorithms. A basic understanding of machine learning would be very helpful.

Readings: There is no official textbook for this course, but the following books/surveys are very helpful in general:

Introduction to Online Convex Optimization by Elad Hazan
Introduction to Online Optimization by Sebastien Bubeck
Online Learning and Online Convex Optimization by Shai Shalev-Shwartz
Prediction, Learning, and Games by Nicolo Cesa-Bianchi and Gabor Lugosi

Schedule:

Date Topics Recommended Reading Homework

08/22 Introduction;
online learning;
statistical learning theory;
online-to-batch conversion Lecture notes 1;
Chapters 1 and 9 of Hazan's survey;
Chapter 1 of Bubeck's lecture notes;

08/24 the expert problem and Hedge;
Lower bounds;
Follow the Regularized Leader Lecture notes 2;
classic paper on the expert problem and Hedge;
Chapter 3.7 of Cesa-Bianchi and Lugosi's book;
Chapter 5.1-5.4 of Hazan's survey

08/29 Online Gradient Descent;
Follow the Perturbed Leader;
Combinatorial problems Lecture notes 3;
Chapter 5.5 of Hazan's survey;
Chapter 6 of Bubeck's survey

08/31 Adaptive regret bounds;
"small-loss" bounds;
quantile bounds Lecture notes 4;
Chapter 2.4 of Cesa-Bianchi and Lugosi's book
(a different learning rate schedule for small-loss)

09/05 Second order bounds;
Squint algorithm Lecture notes 5;
The Squint paper by Koolen and Van Erven

09/07 Variation bounds;
Optimistic FTRL Lecture notes 6;
A different proof for variation bounds;
proof of Optimistic FTRL is from this paper Homework1

09/12 Connection to game theory;
minimax theorem;
fast convergence via adaptivity Lecture notes 7;
See Chapter 7.2 of Cesa-Bianchi and Lugosi's book for
a general minimax theorem (with similar proof)

09/14 Connection to boosting;
AdaBoost;
margin theory;
uniform margin bounds via adaptivity Lecture notes 8;
Schapire's slides: toy example of AdaBoost;
resistance to overfitting; the margin "movie"

09/19 Non-stationary environments;
interval regret;
sleeping experts Lecture notes 9;
See Sec 2 of this paper for a different efficient implementation.

09/21 switching/tracking regret;
dynamic regret Lecture notes 10;
Homework1 due

09/26 Fixed-share algorithm Lecture notes 11;

09/28 Multi-armed Bandits (MAB);
Exp3 algorithm;
lower bounds Lecture notes 12;
See Chapter 6.4-6.6 of Cesa-Bianchi and Lugosi's book
for general partial information problems Homework2

10/03 Optimal MAB algorithms;
FTRL/OMD with Tsallis entropy;
high probability bounds Lecture notes 13;
See Lemma 1 of this paper for the proof of the
high probability lemma

10/05 Stochastic MAB;
Explore-then-exploit;
UCB algorithm;
optimism in the face of uncertainty Lecture notes 14;
See Sec 2.3 of this survey for a lower bound on stochastic MAB

10/10 Stochastic linear bandits;
LinUCB Lecture notes 15;
See Theorem 2 of this paper for the proof of confidence ellipsoid

10/12 Adversarial Linear Bandit;
Exp2 algorithm;
Combinatorial bandits Lecture notes 16;
See Sec 5 of this paper for more examples of combinatorial bandits Homework2 due

10/17 FTRL for linear bandit;
SCRiBLe Lecture notes 17;
See the original paper for efficient implementation and
discussions on the online-shortest-path problem

10/19 Bandit Convex Optimization Lecture notes 18;
See this paper for an L2 ball sampling scheme with gradient descent Homework3

10/24 Contextual bandit;
Exp4 algorithm;
Oracle-efficient Algorithms Lecture notes 19;
See this paper for the impossibility of oracle-efficiency in general

10/26 Epsilon-Greedy;
policy elimination Lecture notes 20

10/31 Optimal and oracle-efficient
algorithm: "minimonster" Lecture notes 21;
See the original paper for the very efficient implementation Project proposal due

11/02 Contextual bandits with
Adversarial Loss;
Relaxation-based approach Lecture notes 22;
See this paper for an improved algorithm

11/07 Students' presentations Universal Portfolios With and Without Transaction Costs
presented by Mehdi Jafarnia Jahromi
Logarithmic Regret Algorithms for Online Convex Optimization (Sec 1-3.2)
presented by Daoud Burghal

11/09 Students' presentations Optimal Strategies and Minimax Lower Bounds for Online Convex Games
presented by Guangyu Li
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
presented by Liyu Chen Homework3 due

11/14 No class ML Seminar by Rob Schapire on contextual bandits (3:30pm, SAL 101)

11/16 Students' presentations Online Optimization: Competing with Dynamic Comparators
presented by Jason Gregory
Projection-free Online Learning
presented by Zhiyun Lu Homework4

11/21 Students' presentations Regret Bounds for Sleeping Experts and Bandits
presented by He Jiang
Best Arm Identification in Multi-Armed Bandits
presented by Anastasia Voloshinov

11/23 Thanksgiving

11/28 Students' presentations One Practical Algorithm for Both Stochastic and Adversarial Bandits
presented by Michael Conway
Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret
presented by Kien Nguyen

11/30 Students' presentations Online Learning with Switching Costs and Other Adaptive Adversaries
presented by Ke Zhang
Better Rates for Any Adversarial Deterministic MDP
presented by Karishma Sharma Homework4 due