Publications and Preprints


(* indicates equal contribution)

Publications

2025

  • Learning Imperfect Information Extensive-form Games with Last-iterate Convergence under Bandit Feedback
    Canzhe Zhao, Yutian Cheng, Jing Dong, Baoxiang Wang, and Shuai Li.
    ICML, 2025.

  • Towards Provably Efficient Learning of Imperfect Information Extensive-Form Games with Linear Function Approximation
    Canzhe Zhao, Shuze Chen, Weiming Liu, Haobo Fu, Qiang Fu, and Shuai Li.
    UAI, 2025.

  • Logarithmic Regret for Linear Markov Decision Processes with Adversarial Corruptions
    Canzhe Zhao, Xiangcheng Zhang, Baoxiang Wang, and Shuai Li.
    AAAI, 2025.

2023

  • Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback [arXiv][poster]
    Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Xuezhou Zhang, and Shuai Li.
    NeurIPS, 2023.

  • Best-of-three-worlds analysis for linear bandits with follow-the-regularized-leader algorithm [arXiv]
    Fang Kong, Canzhe Zhao, and Shuai Li.
    COLT, 2023.

  • DPMAC: Differentially private communication for cooperative multi-agent reinforcement learning
    Canzhe Zhao*, Yanjie Ze*, Jing Dong, Baoxiang Wang, and Shuai Li.
    IJCAI, 2023.

  • Learning Adversarial Linear Mixture Markov Decision Processes with Bandit Feedback and Unknown Transition [link][slides]
    Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, and Shuai Li.
    ICLR, 2023.

  • Clustering of Conversational Bandits with Posterior Sampling for User Preference Learning and Elicitation [link]
    Qizhi Li*, Canzhe Zhao*, Tong Yu, Junda Wu, and Shuai Li.
    UMUAI, 2023.

  • Differentially private temporal difference learning with stochastic nonconvex-strongly-concave optimization [link]
    Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, and Shuai Li.
    WSDM, 2023.

2022

  • Knowledge-aware conversational preference elicitation with bandit feedback [link] [slides]
    Canzhe Zhao, Tong Yu, Zhihui Xie, and Shuai Li.
    WWW, 2022.

  • Simultaneously learning stochastic and adversarial bandits under the position-based model [arXiv]
    Cheng Chen, Canzhe Zhao, and Shuai Li.
    AAAI, 2022.

2021

  • Clustering of conversational bandits for user preference learning and elicitation [link][slides]
    Junda Wu*, Canzhe Zhao*, Tong Yu, Jingyang Li, and Shuai Li.
    CIKM, 2021.

  • Comparison-based conversational recommender system with relative bandit feedback [link]
    Zhihui Xie, Tong Yu, Canzhe Zhao, and Shuai Li.
    SIGIR, 2021.

Preprints