Introduction to Online Learning
CSCI 699, Fall 2017
|
Final Project
The final project is open-ended and can be one of the following:
- Literature Survey. Conduct a literature review of research on a particular topic in online learning. Provide a detailed and clear overview and comparison of existing results along with some key algorithmic and analysis techniques. Also identify new research directions.
- Empirical Evaluations. Compare the performance (error, time, etc.) of different existing online learning algorithms for some problem on some datasets. Explore the effect of different parameter tunings. Try to modify existing algorithms in a reasonable way and see how they perform.
- New Theoretical Research. Pick an open problem in online learning and try to make progress on it. Complete solutions are not required or expected. Try to break down or simplify the problem and make progress.
Some concrete suggested topics are given below, but original ideas are strongly encouraged.
You may work alone or in a group of two, but in the latter case higher quality is expected.
Grades are based on novelty, difficulty and completeness.
The project includes two submissions (both must be written in LaTeX):
- proposal (10%) -- 1 page, due 10/31
- final report (90%) -- up to 8 pages, due 12/10
|
Suggested Topics
Literature Survey: (the papers listed below are only suggestions; you are encouraged to read further following the references therein)
- Unconstrained Online Learning: all the problems we have discussed restrict the learner to choose actions from a bounded action set (such as the simplex or a unit ball), but there is a line of work on removing this restriction altogether, with the goal of developing parameter-free algorithms (a small illustrative sketch is given after this list). Examples include:
- FTPL Family: In lecture 3 we talked about FTPL for the expert problem with simple uniform perturbations. However, there are in fact many different kinds of perturbations with different properties for different problems (see the FTPL sketch after this list). Examples include:
- Online Submodular Optimization: Submodular optimization (both minimization and maximization) has numerous applications in practice. The online versions of these problems have also been studied. Examples include:
- Matrix Exponential Weights: We have seen the fundamental role of Hedge/exponential weights in online learning. There is actually a matrix version of exponential weights, which also has numerous applications (see the sketch after this list). Examples include:
- UCB Family: There is a huge literature on improving the classic UCB algorithm, either achieving tighter bounds or handling more general noise models (a UCB1 baseline sketch is given after this list). Examples include:
- Bandits with Structures: The classic multi-armed bandit model has been extended to many more general settings with specific structures (such as graphs). Examples include:
- Nonparametric Online Learning: Nonparametric online learning (in full information or bandit settings) considers competing with the best possible fixed function (instead of a fixed action or a fixed function from a structured class). Examples include:
- Bandit Convex Optimization: Bandit convex optimization is a challenging topic. Several recent works have made great progress in this area. Examples include:
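To make the unconstrained/parameter-free item concrete, here is a minimal sketch (my own illustration, not taken from the lectures or any particular paper's pseudocode) of a one-dimensional coin-betting-style learner; the function name, the initial wealth, and the assumption that per-round rewards lie in [-1, 1] are all illustrative choices.

```python
# Illustrative sketch of a parameter-free, unconstrained 1-D online learner in the
# coin-betting style. Assumes each reward c_t lies in [-1, 1]; no learning rate or
# bounded action set is needed.
def coin_betting(rewards, initial_wealth=1.0):
    wealth = initial_wealth
    reward_sum = 0.0
    plays = []
    for t, c in enumerate(rewards, start=1):
        beta = reward_sum / t        # betting fraction computed from past rewards
        w = beta * wealth            # the play can grow arbitrarily large (unconstrained)
        plays.append(w)
        wealth += c * w              # wealth update with the observed reward
        reward_sum += c
    return plays
```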
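For the FTPL item, a minimal sketch of FTPL with uniform perturbations for the expert problem; the perturbation range 1/eta and the interface are illustrative assumptions rather than the exact formulation from lecture 3.

```python
import numpy as np

# Sketch of Follow-the-Perturbed-Leader with uniform perturbations over N experts.
# cumulative_losses[i] is the total loss of expert i so far; eta scales the perturbation range.
def ftpl_pick(cumulative_losses, eta, rng=None):
    rng = rng or np.random.default_rng()
    perturbation = rng.uniform(0.0, 1.0 / eta, size=len(cumulative_losses))
    return int(np.argmin(np.asarray(cumulative_losses) - perturbation))  # perturbed leader
```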
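For the matrix exponential weights item, a sketch of the basic update: exponentiate the negative scaled cumulative loss matrix and normalize by the trace to obtain a density matrix. The learning-rate name and the interface are illustrative.

```python
import numpy as np
from scipy.linalg import expm

# Sketch of a matrix exponential weights update. cumulative_loss is a symmetric d x d
# matrix of accumulated losses; the result is a density matrix (PSD, trace one).
def matrix_exp_weights(cumulative_loss, eta):
    W = expm(-eta * cumulative_loss)   # matrix exponential of the negative scaled losses
    return W / np.trace(W)             # normalize to trace one
```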
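For the UCB item, the classic UCB1 index rule, which could serve as the baseline against which improved variants are compared; treat the exploration constant and the interface as illustrative.

```python
import math

# Sketch of the classic UCB1 arm-selection rule for K-armed bandits with rewards in [0, 1].
# means[i] is the empirical mean reward of arm i, counts[i] its number of pulls, t the round index.
def ucb1_pick(means, counts, t):
    for i, n in enumerate(counts):
        if n == 0:
            return i                   # pull every arm once before using the index
    scores = [means[i] + math.sqrt(2.0 * math.log(t) / counts[i]) for i in range(len(means))]
    return max(range(len(scores)), key=scores.__getitem__)
```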
|
Empirical Evaluations: (Common benchmark datasets can be found in the UCI Repository, LIBSVM datasets, or other sources you prefer.)
- Comparison of Expert Algorithms: We have discussed many expert algorithms throughout the semester, each with different properties. Conduct a thorough experimental evaluation and comparison of these algorithms (a minimal Hedge implementation is sketched after this list). You are recommended to follow these steps:
- Find concrete problems (with data available) that can be modeled as an expert problem as your experiment setups.
- Collect different sets of data, with the goal of illustrating the strength of different algorithms. Create synthetic data if real data is not enough.
- Implement the expert algorithms that you plan to experiment on, in any language you prefer.
- Compare these algorithms on your problems in terms of performance, running time, etc.
- Pay extra attention to parameter tuning. Make a concrete and detailed plan on systematically testing different values of the parameters.
- Think about whether you have suggestions to modify existing algorithms to make them perform better. Test your ideas.
- Report the final results in a clear and informative way, and draw conclusions from what you observe.
- Comparison of (Expert-algorithm-tuned) Boosting Algorithms: In Lecture 8 we briefly talked about boosting algorithms and discussed how to tune any expert algorithm into a boosting algorithm. Conduct a thorough experimental evaluation and comparison of these expert-algorithm-tuned boosting algorithms and other existing boosting algorithms (a sketch of this construction is given after this list). You are recommended to follow these steps:
- Collect binary classification datasets that you plan to test on. (Since we only talked about binary boosting, you can just focus on
binary classification tasks.)
- Decide which existing boosting algorithms to use in your experiments. We only talked about AdaBoost in the class. More boosting algorithms can be found
for example in the survey book by Schapire and Freund (available in the USC library; you can also borrow a copy from me).
- Decide which expert algorithms you want to tune into boosting algorithms and test.
- Decide what the "weak learning oracle" is in your experiments (such as decision trees, neural nets, etc.).
- Implement these algorithms in your preferred language. You are welcome to use existing packages for the weak learning oracle (such as
scikit-learn).
- Compare these algorithms in terms of performance (training error/testing error/margin...), running time, etc.
- Pay extra attention to parameter tuning. Make a concrete and detailed plan on systematically testing different values of the parameters.
- Report the final results in a clear and informative way, and draw conclusions from what you observe.
- Comparison of Bandit Multiclass Algorithms: The bandit multiclass problem is about sequentially predicting the label of an example in a multiclass setting, with feedback only about the correctness of the prediction but not the true label. It is closely related to the contextual bandit problem. Conduct experiments to compare different bandit multiclass algorithms (an illustrative Banditron-style sketch is given after this list). You are recommended to follow these steps:
- First do a brief survey on this problem by reading one or more of these papers, with focus on the algorithms and experimental setups:
Efficient Bandit Algorithms for Online Multiclass Prediction,
NEWTRON: an Efficient Bandit algorithm for Online Multiclass Prediction,
Efficient Online Bandit Multiclass Learning with O(√T) Regret.
Decide which algorithms you want to use in the experiments.
- Collect multiclass datasets that you want to test on. Also, regenerate the synthetic datasets used in these papers (they
all used the same synthetic data).
- Pick some algorithms that we have discussed in the class (LinUCB, Exp4, Epsilon-Greedy, Minimonster etc.) and think about how to use them
to solve this bandit multiclass problem. Decide which algorithms you want to use in the experiments.
- Implement these algorithms in your preferred language. You are welcome to use existing packages as components of your code.
- Compare these algorithms in terms of performance, running time, etc.
- Pay extra attention to parameter tuning. Make a concrete and detailed plan on systematically testing different values of the parameters.
- Report the final results in a clear and informative way, and draw conclusions from what you observe.
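For the expert-algorithm comparison, a minimal Hedge (exponential weights) baseline that could be one of the implemented algorithms; the interface (a list of per-round loss vectors with entries in [0, 1]) and the fixed learning rate eta are my own illustrative choices.

```python
import numpy as np

# Minimal Hedge (exponential weights) over N experts; loss_rounds is a list of length-N
# loss vectors with entries in [0, 1]. Returns the algorithm's total expected loss.
def hedge(loss_rounds, eta):
    log_weights = np.zeros(len(loss_rounds[0]))
    total_loss = 0.0
    for losses in loss_rounds:
        p = np.exp(log_weights - log_weights.max())   # work in log-space for stability
        p /= p.sum()                                  # current distribution over experts
        total_loss += float(p @ np.asarray(losses))
        log_weights -= eta * np.asarray(losses)       # multiplicative weight update
    return total_loss
```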
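For the boosting comparison, one possible way to tune an expert algorithm (here Hedge) into a boosting algorithm: run Hedge with one "expert" per training example and feed loss 1 to the examples the current weak hypothesis classifies correctly, so that their weights shrink. This is only a sketch of the general recipe; the exact construction from Lecture 8 may differ in the learning rate, the number of rounds, and the final aggregation weights. Binary labels in {-1, +1} and decision stumps as the weak learning oracle are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Sketch: tuning Hedge into a boosting algorithm. Hedge maintains a distribution over
# training examples; correctly classified examples receive loss 1 so their weights decrease.
def hedge_boost(X, y, rounds=50, eta=0.5):
    m = len(y)
    log_w = np.zeros(m)
    hypotheses = []
    for _ in range(rounds):
        d = np.exp(log_w - log_w.max())
        d /= d.sum()                                        # Hedge distribution over examples
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=d)
        correct = (stump.predict(X) == y).astype(float)
        log_w -= eta * correct                              # down-weight what is already learned
        hypotheses.append(stump)
    def predict(X_new):
        votes = sum(h.predict(X_new) for h in hypotheses)   # unweighted vote of weak hypotheses
        return np.where(votes >= 0, 1, -1)
    return predict
```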
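For the bandit multiclass comparison, a sketch in the spirit of the Banditron from the first paper listed above: a linear multiclass predictor with epsilon-greedy exploration and an importance-weighted perceptron-style update. The exploration rate, the interface, and the class names are illustrative assumptions; consult the paper for the exact algorithm and its analysis.

```python
import numpy as np

# Banditron-style sketch: predict with a linear model, explore epsilon-greedily, and update
# with an importance-weighted perceptron step using only the "correct or not" feedback.
class BanditMulticlass:
    def __init__(self, num_classes, dim, gamma=0.05, rng=None):
        self.W = np.zeros((num_classes, dim))
        self.k = num_classes
        self.gamma = gamma
        self.rng = rng or np.random.default_rng()

    def predict_and_update(self, x, true_label):
        y_hat = int(np.argmax(self.W @ x))               # greedy prediction
        probs = np.full(self.k, self.gamma / self.k)
        probs[y_hat] += 1.0 - self.gamma                 # epsilon-greedy exploration distribution
        y_tilde = int(self.rng.choice(self.k, p=probs))  # the label actually announced
        correct = (y_tilde == true_label)                # bandit feedback: only correctness
        update = np.zeros(self.k)
        if correct:
            update[y_tilde] += 1.0 / probs[y_tilde]      # importance-weighted positive part
        update[y_hat] -= 1.0
        self.W += np.outer(update, x)
        return y_tilde, correct
```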
|
New Theoretical Research:
- Parameter-free path-length bounds: In lecture 6 we discussed an adaptive bound for the expert problem that is in terms of the path-length of the best expert (one typical form of such a bound is recalled after this list). The bound can only be achieved with an oracle tuning of the learning rate, and it is still unclear whether one can achieve this bound without knowing the path-length of the best expert in advance (that is, with a parameter-free algorithm). You have also seen in Homework 1 that the doubling trick does not work.
Explore this direction by either thinking about how to obtain such algorithms or showing that this is indeed impossible (which is pretty likely in my opinion).
- Parameter-free Multi-Armed Bandit for Non-stationary Environments: In lecture 9 we have seen a principled way to construct a strongly adaptive algorithm in the full information setting, as well as its implications for non-stationary environments (such as obtaining switching regret without knowing the number of switches, or dynamic regret without knowing the variation). It turns out that strongly adaptive algorithms are impossible in the bandit setting (see Sec 4 of this paper). However, this does not rule out the possibility of having parameter-free algorithms with switching regret or dynamic regret. Explore whether this is possible in the multi-armed bandit setting.
- "Small-loss" Bound for Contextual Bandits: We have discussed small-loss bounds for the expert problem in Lecture 4 and have also seen several implications of such adaptive bounds throughout the semester. In Homework 3 you will see that such a bound is possible for multi-armed bandits too. However, things become surprisingly unclear for the contextual bandit problem. More details can be found in this open problem (it has some monetary rewards ☺). Explore this direction and try to make progress (the typical forms of these bounds are recalled after this list).
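For the path-length item, one typical shape such a bound takes, written as a hedged reconstruction for orientation only (the precise statement and constants from lecture 6 are the authoritative reference); i^* denotes the best expert in hindsight and \ell_{t,i} its loss at round t.

```latex
% Illustrative form of a best-expert path-length bound (constants omitted):
\[
  \mathrm{Regret}_T \;\le\; O\!\left(\sqrt{\ln N \,\sum_{t=2}^{T} \bigl(\ell_{t,i^*}-\ell_{t-1,i^*}\bigr)^2}\right),
\]
% achievable with the oracle tuning
% $\eta \approx \sqrt{\ln N / \sum_t (\ell_{t,i^*}-\ell_{t-1,i^*})^2}$,
% which is exactly the tuning a parameter-free algorithm would have to avoid.
```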
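For the small-loss item, the typical shapes of these first-order bounds, again only for orientation (see Lecture 4, Homework 3, and the open problem for the precise statements); L^* denotes the total loss of the best expert/arm/policy, N the number of experts or policies, and K the number of arms.

```latex
% Typical "small-loss" (first-order) regret bounds; constants and log factors are omitted.
\[
  \text{Experts (tuned Hedge):}\ \ \mathrm{Regret}_T = O\!\left(\sqrt{L^* \ln N}\right),
  \qquad
  \text{Multi-armed bandits:}\ \ \mathrm{Regret}_T = \widetilde{O}\!\left(\sqrt{K L^*}\right).
\]
% The open problem asks whether an analogous $\widetilde{O}(\sqrt{K L^* \ln N})$ bound is
% achievable for contextual bandits with N policies.
```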
|