• Chen-Yu Wei, Haipeng Luo and Alekh Agarwal. Taking a Hint: How to Leverage Loss Predictors in Contextual Bandits? [arXiv]

  • Ehsan Emamjomeh-Zadeh, Chen-Yu Wei, Haipeng Luo and David Kempe. Adversarial Online Learning with Changing Action Sets: Efficient Algorithms with Approximate Regret Bounds. [arXiv]

  • Chung-Wei Lee, Haipeng Luo and Mengxiao Zhang. A Closer Look at Small-loss Bounds for Bandits with Graph Feedback. [arXiv]

  • Chi Jin, Tiancheng Jin, Haipeng Luo, Suvrit Sra and Tiancheng Yu. Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition. [arXiv]

  • Chen-Yu Wei, Mehdi Jafarnia-Jahromi, Haipeng Luo, Hiteshi Sharma and Rahul Jain. Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes. [arXiv]

  • Yifang Chen, Alex Cuellar, Haipeng Luo, Jignesh Modi, Heramb Nemlekar and Stefanos Nikolaidis. Fair Contextual Multi-Armed Bandits: Theory and Experiments. [arXiv]

Conference Papers

  • Dylan J. Foster, Akshay Krishnamurthy and Haipeng Luo. Model Selection for Contextual Bandits. NeurIPS 2019, spotlight. [arXiv]

  • Kai Zheng, Haipeng Luo, Ilias Diakonikolas and Liwei Wang. Equipping Experts/Bandits with Long-term Memory. NeurIPS 2019. [arXiv]

  • Dylan J. Foster, Spencer Greenberg, Satyen Kale, Haipeng Luo, Mehryar Mohri and Karthik Sridharan. Hypothesis Set Stability and Generalization. NeurIPS 2019. [arXiv]

  • Sébastien Bubeck, Yuanzhi Li, Haipeng Luo and Chen-Yu Wei. Improved Path-length Regret Bounds for Bandits. COLT 2019. [pdf]

  • Yifang Chen, Chung-Wei Lee, Haipeng Luo and Chen-Yu Wei. A New Algorithm for Non-stationary Contextual Bandits: Efficient, Optimal, and Parameter-free. COLT 2019. [pdf]

  • Peter Auer, Yifang Chen, Pratik Gajane, Chung-Wei Lee, Haipeng Luo, Ronald Ortner and Chen-Yu Wei. Achieving Optimal Dynamic Regret for Non-stationary Bandits without Prior Information. COLT 2019, joint extended abstract. [pdf]

  • Julian Zimmert, Haipeng Luo and Chen-Yu Wei. Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously. ICML 2019, long talk. [arXiv]

  • Haipeng Luo, Chen-Yu Wei and Kai Zheng. Efficient Online Portfolio with Logarithmic Regret. NeurIPS 2018, spotlight. [arXiv]

  • Dylan J. Foster, Satyen Kale, Haipeng Luo, Mehryar Mohri and Karthik Sridharan. Logistic Regression: The Importance of Being Improper. COLT 2018, Best Student Paper Award. [pdf]

  • Chen-Yu Wei and Haipeng Luo. More Adaptive Algorithms for Adversarial Bandits. COLT 2018. [pdf]

  • Haipeng Luo, Chen-Yu Wei, Alekh Agarwal and John Langford. Efficient Contextual Bandits in Non-stationary Worlds. COLT 2018. [pdf]

  • Dylan J. Foster, Alekh Agarwal, Miroslav Dudík, Haipeng Luo and Robert E. Schapire. Practical Contextual Bandits with Regression Oracles. ICML 2018. [arXiv]

  • Miroslav Dudík, Nika Haghtalab, Haipeng Luo, Robert E. Schapire, Vasilis Syrgkanis and Jennifer Wortman Vaughan. Oracle-Efficient Online Learning and Auction Design. FOCS 2017. [arXiv]

  • Alekh Agarwal, Haipeng Luo, Behnam Neyshabur and Robert E. Schapire. Corralling a Band of Bandit Algorithms. COLT 2017. [pdf]

  • Vasilis Syrgkanis, Haipeng Luo, Akshay Krishnamurthy and Robert E. Schapire. Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits. NeurIPS 2016. [arXiv]

  • Haipeng Luo, Alekh Agarwal, Nicolò Cesa-Bianchi and John Langford. Efficient Second Order Online Learning via Sketching. NeurIPS 2016. [arXiv]

  • Elad Hazan and Haipeng Luo. Variance-Reduced and Projection-Free Stochastic Optimization. ICML 2016. [arXiv]

  • Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo and Robert E. Schapire. Fast Convergence of Regularized Learning in Games. NeurIPS 2015, Best Paper Award. [arXiv]

  • Alina Beygelzimer, Elad Hazan, Satyen Kale and Haipeng Luo. Online Gradient Boosting. NeurIPS 2015. [arXiv]

  • Alina Beygelzimer, Satyen Kale and Haipeng Luo. Optimal and Adaptive Algorithms for Online Boosting. ICML 2015, Best Paper Award. [arXiv] [short version at IJCAI 2016, sister conference best paper track]

  • Haipeng Luo and Robert E. Schapire. Achieving All with No Parameters: AdaNormalHedge. COLT 2015. [pdf]

  • Haipeng Luo and Robert E. Schapire. A Drifting-Games Analysis for Online Learning and Applications to Boosting. NeurIPS 2014. [arXiv]

  • Haipeng Luo, Patrick Haffner and Jean-François Paiement. Accelerated Parallel Optimization Methods for Large Scale Machine Learning. OPT workshop at NeurIPS 2014. [arXiv]

  • Haipeng Luo and Robert E. Schapire. Towards Minimax Online Learning with Unknown Time Horizon. ICML 2014. [arXiv]

Open Problems

  • Alekh Agarwal, Akshay Krishnamurthy, John Langford, Haipeng Luo and Robert E. Schapire. Open Problem: First-Order Regret Bounds for Contextual Bandits. COLT 2017. [pdf], [A solution by Allen-Zhu, Bubeck and Li]

PhD Thesis

  • Optimal and Adaptive Online Learning. [pdf]

Journal Papers

  • Weijia Song, Zhen Xiao, Qi Chen and Haipeng Luo. Adaptive Resource Provisioning for the Cloud Using Online Bin Packing. IEEE Transactions on Computers, 63:2647-2660, 2013. [pdf]

  • Zhen Xiao, Qi Chen and Haipeng Luo. Automatic Scaling of Internet Applications for Cloud Computing Services. IEEE Transactions on Computers, 63:1111-1123, 2012. [pdf]