Preprints and Publications
(* indicates equal contribution)

Preprints

  • Heavy-Tailed Linear Bandits: Adversarial Robustness, Best-of-Both-Worlds, and Beyond. [arXiv]
    Canzhe Zhao*, Shinji Ito*, Shuai Li.
    Under review, 2025.

Publications

2025

  • Learning Imperfect Information Extensive-form Games with Last-iterate Convergence under Bandit Feedback.
    Canzhe Zhao, Yutian Cheng, Jing Dong, Baoxiang Wang, Shuai Li.
    ICML, 2025.

  • Towards Provably Efficient Learning of Imperfect Information Extensive-Form Games with Linear Function Approximation.
    Canzhe Zhao, Shuze Chen, Weiming Liu, Haobo Fu, Qiang Fu, Shuai Li.
    UAI, 2025.

  • Logarithmic Regret for Linear Markov Decision Processes with Adversarial Corruptions.
    Canzhe Zhao, Xiangcheng Zhang, Baoxiang Wang, Shuai Li.
    AAAI, 2025.

2023

  • Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback. [arXiv][poster]
    Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Xuezhou Zhang, Shuai Li.
    NeurIPS, 2023.

  • Best-of-Three-Worlds Analysis for Linear Bandits with Follow-the-Regularized-Leader algorithm. [arXiv]
    Fang Kong, Canzhe Zhao, Shuai Li.
    COLT, 2023.

  • DPMAC: Differentially Private Communication for Cooperative Multi-Agent Reinforcement Learning.
    Canzhe Zhao*, Yanjie Ze*, Jing Dong, Baoxiang Wang, Shuai Li.
    IJCAI, 2023.

  • Learning Adversarial Linear Mixture Markov Decision Processes with Bandit Feedback and Unknown Transition. [link][slides]
    Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Shuai Li.
    ICLR, 2023.

  • Clustering of Conversational Bandits with Posterior Sampling for User Preference Learning and Elicitation. [link]
    Qizhi Li*, Canzhe Zhao*, Tong Yu, Junda Wu, Shuai Li.
    UMUAI, 2023.

  • Differentially Private Temporal Difference Learning with Stochastic Nonconvex-Strongly-Concave Optimization. [link]
    Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, Shuai Li.
    WSDM, 2023.

2022

  • Knowledge-aware Conversational Preference Elicitation with Bandit Feedback. [link][slides]
    Canzhe Zhao, Tong Yu, Zhihui Xie, Shuai Li.
    WWW, 2022.

  • Simultaneously Learning Stochastic and Adversarial Bandits under the Position-based Model. [arXiv]
    Cheng Chen, Canzhe Zhao, Shuai Li.
    AAAI, 2022.

2021

  • Clustering of Conversational Bandits for User Preference Learning and Elicitation. [link][slides]
    Junda Wu*, Canzhe Zhao*, Tong Yu, Jingyang Li, Shuai Li.
    CIKM, 2021.

  • Comparison-based Conversational Recommender System with Relative Bandit Feedback. [link]
    Zhihui Xie, Tong Yu, Canzhe Zhao, Shuai Li.
    SIGIR, 2021.