Publications

Preprints

  • [arXiv] Tractable Local Equilibria in Non-Concave Games.
    Yang Cai, Constantinos Daskalakis, Haipeng Luo, Chen-Yu Wei, and Weiqiang Zheng.

  • [arXiv] Contextual Multinomial Logit Bandits with General Value Functions.
    Mengxiao Zhang and Haipeng Luo.

  • [arXiv] Efficient Contextual Bandits with Uninformed Feedback Graphs.
    Mengxiao Zhang, Yuheng Zhang, Haipeng Luo, and Paul Mineiro.

  • [arXiv] Last-Iterate Convergence Properties of Regret-Matching Algorithms in Games.
    Yang Cai, Gabriele Farina, Julien Grand-Clément, Christian Kroer, Chung-Wei Lee, Haipeng Luo, and Weiqiang Zheng.

Conference Papers

2024:

  • [AISTATS 2024 Oral] Near-Optimal Policy Optimization for Correlated Equilibrium in General-Sum Markov Games.
    Yang Cai, Haipeng Luo, Chen-Yu Wei, and Weiqiang Zheng.

  • [AISTATS 2024] Online Learning in Contextual Second-Price Pay-Per-Click Auctions.
    Mengxiao Zhang and Haipeng Luo.

2023:

  • [NeurIPS 2023 Spotlight] Regret Matching+: (In)Stability and Fast Convergence in Games.
    Gabriele Farina, Julien Grand-Clément, Christian Kroer, Chung-Wei Lee, and Haipeng Luo.

  • [NeurIPS 2023] No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions.
    Tiancheng Jin, Junyan Liu, Chloé Rouyer, William Chang, Chen-Yu Wei, and Haipeng Luo.

  • [NeurIPS 2023] Improved Best-of-Both-Worlds Guarantees for Multi-Armed Bandits: FTRL with General Regularizers and Multiple Optimal Arms.
    Tiancheng Jin, Junyan Liu, and Haipeng Luo.

  • [NeurIPS 2023] Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games.
    Yang Cai, Haipeng Luo, Chen-Yu Wei, and Weiqiang Zheng.

  • [NeurIPS 2023] Practical Contextual Bandits with Feedback Graphs.
    Mengxiao Zhang, Yuheng Zhang, Olga Vrousgou, Haipeng Luo, and Paul Mineiro.

  • [ICML 2023] Refined Regret for Adversarial MDPs with Linear Function Approximation.
    Yan Dai, Haipeng Luo, Chen-Yu Wei, and Julian Zimmert.

  • [ALT 2023] Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs.
    Haipeng Luo, Hanghang Tong, Mengxiao Zhang, and Yuheng Zhang.

  • [AISTATS 2023] No-Regret Learning in Two-Echelon Supply Chain with Unknown Demand Distribution.
    Mengxiao Zhang, Shi Chen, Haipeng Luo, and Yingfei Wang.

  • [UAI 2023] Posterior Sampling-based Online Learning for the Stochastic Shortest Path Model.
    Mehdi Jafarnia-Jahromi, Liyu Chen, Rahul Jain, and Haipeng Luo.

2022:

  • [NeurIPS 2022 Oral] Uncoupled Learning Dynamics with O(log T) Swap Regret in Multiplayer Games.
    Ioannis Anagnostides, Gabriele Farina, Christian Kroer, Chung-Wei Lee, Haipeng Luo, and Tuomas Sandholm.

  • [NeurIPS 2022] Near-Optimal No-Regret Learning for General Convex Games.
    Gabriele Farina, Ioannis Anagnostides, Haipeng Luo, Chung-Wei Lee, Christian Kroer, and Tuomas Sandholm.

  • [NeurIPS 2022] Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback.
    Yan Dai, Haipeng Luo, and Liyu Chen.

  • [NeurIPS 2022] Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback.
    Tiancheng Jin, Tal Lancewicki, Haipeng Luo, Yishay Mansour, and Aviv Rosenberg.

  • [NeurIPS 2022] Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments.
    Liyu Chen and Haipeng Luo.

  • [NeurIPS 2022 OPT Workshop] Clairvoyant Regret Minimization: Equivalence with Nemirovski’s Conceptual Prox Method and Extension to General Convex Games.
    Gabriele Farina, Christian Kroer, Chung-Wei Lee, and Haipeng Luo.

  • [COLT 2022] Policy Optimization for Stochastic Shortest Path.
    Liyu Chen, Haipeng Luo, and Aviv Rosenberg.

  • [COLT 2022] Corralling a Larger Band of Bandits: A Case Study on Switching Regret for Linear Bandits.
    Haipeng Luo, Mengxiao Zhang, Peng Zhao, and Zhi-Hua Zhou.

  • [COLT 2022] Adaptive Bandit Convex Optimization with Heterogeneous Curvature.
    Haipeng Luo, Mengxiao Zhang, and Peng Zhao.

  • [ICML 2022 Long Talk] Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP.
    Liyu Chen, Rahul Jain, and Haipeng Luo.

  • [ICML 2022] Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints.
    Liyu Chen, Rahul Jain, and Haipeng Luo.

  • [ICML 2022] No-Regret Learning in Time-Varying Zero-Sum Games.
    Mengxiao Zhang, Peng Zhao, Haipeng Luo, and Zhi-Hua Zhou.

  • [ICML 2022] Kernelized Multiplicative Weights for 0/1-Polyhedral Games: Bridging the Gap Between Learning in Extensive-Form and Normal-Form Games.
    Gabriele Farina, Chung-Wei Lee, Haipeng Luo, and Christian Kroer.

2021:

  • [NeurIPS 2021 Oral] The Best of Both Worlds: Stochastic and Adversarial Episodic MDPs with Unknown Transition.
    Tiancheng Jin, Longbo Huang, and Haipeng Luo.

  • [NeurIPS 2021] Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses.
    Haipeng Luo, Chen-Yu Wei, and Chung-Wei Lee.

  • [NeurIPS 2021] Implicit Finite-Horizon Approximation and Efficient Optimal Algorithms for Stochastic Shortest Path.
    Liyu Chen, Mehdi Jafarnia-Jahromi, Rahul Jain, and Haipeng Luo.

  • [NeurIPS 2021] Last-iterate Convergence in Extensive-Form Games.
    Chung-Wei Lee, Christian Kroer, and Haipeng Luo.

  • [COLT 2021 Best Paper Award] Non-stationary Reinforcement Learning without Prior Knowledge: An Optimal Black-box Approach.
    Chen-Yu Wei and Haipeng Luo.

  • [COLT 2021] Impossible Tuning Made Possible: A New Expert Algorithm and Its Applications.
    Liyu Chen, Haipeng Luo, and Chen-Yu Wei.

  • [COLT 2021] Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition.
    Liyu Chen, Haipeng Luo, and Chen-Yu Wei.

  • [COLT 2021] Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games.
    Chen-Yu Wei, Chung-Wei Lee, Mengxiao Zhang, and Haipeng Luo.

  • [ICML 2021] Finding the Stochastic Shortest Path with Low Regret: The Adversarial Cost and Unknown Transition Case.
    Liyu Chen and Haipeng Luo.

  • [ICML 2021] Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously.
    Chung-Wei Lee, Haipeng Luo, Chen-Yu Wei, Mengxiao Zhang, and Xiaojin Zhang.

  • [ICLR 2021] Linear Last-iterate Convergence in Constrained Saddle-point Optimization.
    Chen-Yu Wei, Chung-Wei Lee, Mengxiao Zhang, and Haipeng Luo.

  • [AISTATS 2021] Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation.
    Chen-Yu Wei, Mehdi Jafarnia-Jahromi, Haipeng Luo, and Rahul Jain.

  • [AISTATS 2021] Active Online Learning with Hidden Shifting Domains.
    Yining Chen, Haipeng Luo, Tengyu Ma, and Chicheng Zhang.

  • [ALT 2021] Adversarial Online Learning with Changing Action Sets: Efficient Algorithms with Approximate Regret Bounds.
    Ehsan Emamjomeh-Zadeh, Chen-Yu Wei, Haipeng Luo, and David Kempe.

2020:

  • [NeurIPS 2020 Oral] Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs.
    Chung-Wei Lee, Haipeng Luo, Chen-Yu Wei, and Mengxiao Zhang.

  • [NeurIPS 2020 Spotlight] Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition.
    Tiancheng Jin and Haipeng Luo.

  • [NeurIPS 2020] Comparator-Adaptive Convex Bandits.
    Dirk van der Hoeven, Ashok Cutkosky, and Haipeng Luo.

  • [COLT 2020] Taking a Hint: How to Leverage Loss Predictors in Contextual Bandits?
    Chen-Yu Wei, Haipeng Luo, and Alekh Agarwal.

  • [COLT 2020] A Closer Look at Small-loss Bounds for Bandits with Graph Feedback.
    Chung-Wei Lee, Haipeng Luo, and Mengxiao Zhang.

  • [ICML 2020] Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition.
    Chi Jin, Tiancheng Jin, Haipeng Luo, Suvrit Sra, and Tiancheng Yu.

  • [ICML 2020] Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes.
    Chen-Yu Wei, Mehdi Jafarnia-Jahromi, Haipeng Luo, Hiteshi Sharma, and Rahul Jain.

  • [UAI 2020] Fair Contextual Multi-Armed Bandits: Theory and Experiments.
    Yifang Chen, Alex Cuellar, Haipeng Luo, Jignesh Modi, Heramb Nemlekar, and Stefanos Nikolaidis.

2019:

  • [NeurIPS 2019 Spotlight] Model Selection for Contextual Bandits.
    Dylan J. Foster, Akshay Krishnamurthy, and Haipeng Luo.

  • [NeurIPS 2019] Equipping Experts/Bandits with Long-term Memory.
    Kai Zheng, Haipeng Luo, Ilias Diakonikolas, and Liwei Wang.

  • [NeurIPS 2019] Hypothesis Set Stability and Generalization.
    Dylan J. Foster, Spencer Greenberg, Satyen Kale, Haipeng Luo, Mehryar Mohri, and Karthik Sridharan.

  • [COLT 2019] Improved Path-length Regret Bounds for Bandits.
    Sébastien Bubeck, Yuanzhi Li, Haipeng Luo, and Chen-Yu Wei.

  • [COLT 2019] A New Algorithm for Non-stationary Contextual Bandits: Efficient, Optimal, and Parameter-free.
    Yifang Chen, Chung-Wei Lee, Haipeng Luo, and Chen-Yu Wei.

  • [COLT 2019 Joint Extended Abstract] Achieving Optimal Dynamic Regret for Non-stationary Bandits without Prior Information.
    Peter Auer, Yifang Chen, Pratik Gajane, Chung-Wei Lee, Haipeng Luo, Ronald Ortner, and Chen-Yu Wei.

  • [ICML 2019 Long Talk] Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously.
    Julian Zimmert, Haipeng Luo, and Chen-Yu Wei.

2018:

  • [NeurIPS 2018 Spotlight] Efficient Online Portfolio with Logarithmic Regret.
    Haipeng Luo, Chen-Yu Wei, and Kai Zheng.

  • [COLT 2018 Best Student Paper Award] Logistic Regression: The Importance of Being Improper.
    Dylan J. Foster, Satyen Kale, Haipeng Luo, Mehryar Mohri, and Karthik Sridharan.

  • [COLT 2018] More Adaptive Algorithms for Adversarial Bandits.
    Chen-Yu Wei and Haipeng Luo.

  • [COLT 2018] Efficient Contextual Bandits in Non-stationary Worlds.
    Haipeng Luo, Chen-Yu Wei, Alekh Agarwal, and John Langford.

  • [ICML 2018] Practical Contextual Bandits with Regression Oracles.
    Dylan J. Foster, Alekh Agarwal, Miroslav Dudík, Haipeng Luo, and Robert E. Schapire.

2017 and earlier:

  • [FOCS 2017, JACM] Oracle-Efficient Online Learning and Auction Design.
    Miroslav Dudík, Nika Haghtalab, Haipeng Luo, Robert E. Schapire, Vasilis Syrgkanis, and Jennifer Wortman Vaughan.

  • [COLT 2017] Corralling a Band of Bandit Algorithms.
    Alekh Agarwal, Haipeng Luo, Behnam Neyshabur, and Robert E. Schapire.

  • [NeurIPS 2016] Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits.
    Vasilis Syrgkanis, Haipeng Luo, Akshay Krishnamurthy, and Robert E. Schapire.

  • [NeurIPS 2016] Efficient Second Order Online Learning via Sketching.
    Haipeng Luo, Alekh Agarwal, Nicolò Cesa-Bianchi, and John Langford.

  • [ICML 2016] Variance-Reduced and Projection-Free Stochastic Optimization.
    Elad Hazan and Haipeng Luo.

  • [NeurIPS 2015 Best Paper Award] Fast Convergence of Regularized Learning in Games.
    Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, and Robert E. Schapire.

  • [NeurIPS 2015] Online Gradient Boosting.
    Alina Beygelzimer, Elad Hazan, Satyen Kale, and Haipeng Luo.

  • [COLT 2015] Achieving All with No Parameters: AdaNormalHedge.
    Haipeng Luo and Robert E. Schapire.

  • [NeurIPS 2014] A Drifting-Games Analysis for Online Learning and Applications to Boosting.
    Haipeng Luo and Robert E. Schapire.

  • [NeurIPS 2014 OPT Workshop] Accelerated Parallel Optimization Methods for Large Scale Machine Learning.
    Haipeng Luo, Patrick Haffner, and Jean-François Paiement.

  • [ICML 2014] Towards Minimax Online Learning with Unknown Time Horizon.
    Haipeng Luo and Robert E. Schapire.

Open Problems

  • [COLT 2020] Open Problem: Model Selection for Contextual Bandits. [A negative answer by Marinov and Zimmert]
    Dylan J. Foster, Akshay Krishnamurthy, and Haipeng Luo.

  • [COLT 2017] Open Problem: First-Order Regret Bounds for Contextual Bandits. [A solution by Allen-Zhu, Bubeck, and Li]
    Alekh Agarwal, Akshay Krishnamurthy, John Langford, Haipeng Luo, and Robert E. Schapire.
