Publications and Preprints
(* indicates equal contribution)
Publications
2025
Learning Imperfect Information Extensive-form Games with Last-iterate Convergence under Bandit Feedback
Canzhe Zhao, Yutian Cheng, Jing Dong, Baoxiang Wang, and Shuai Li.
ICML, 2025.
Towards Provably Efficient Learning of Imperfect Information Extensive-Form Games with Linear Function Approximation
Canzhe Zhao, Shuze Chen, Weiming Liu, Haobo Fu, Qiang Fu, and Shuai Li.
UAI, 2025.
Logarithmic Regret for Linear Markov Decision Processes with Adversarial Corruptions
Canzhe Zhao, Xiangcheng Zhang, Baoxiang Wang, and Shuai Li.
AAAI, 2025.
2023
Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback [arXiv][poster]
Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Xuezhou Zhang, and Shuai Li.
NeurIPS, 2023.
Best-of-three-worlds analysis for linear bandits with follow-the-regularized-leader algorithm [arXiv]
Fang Kong, Canzhe Zhao, and Shuai Li.
COLT, 2023.
DPMAC: Differentially private communication for cooperative multi-agent reinforcement learning
Canzhe Zhao*, Yanjie Ze*, Jing Dong, Baoxiang Wang, and Shuai Li.
IJCAI, 2023.
Learning Adversarial Linear Mixture Markov Decision Processes with Bandit Feedback and Unknown Transition [link][slides]
Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, and Shuai Li.
ICLR, 2023.
Clustering of Conversational Bandits with Posterior Sampling for User Preference Learning and Elicitation [link]
Qizhi Li*, Canzhe Zhao*, Tong Yu, Junda Wu, and Shuai Li.
UMUAI, 2023.
Differentially private temporal difference learning with stochastic nonconvex-strongly-concave optimization [link]
Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, and Shuai Li.
WSDM, 2023.
2022
Knowledge-aware conversational preference elicitation with bandit feedback [link][slides]
Canzhe Zhao, Tong Yu, Zhihui Xie, and Shuai Li.
WWW, 2022.
Simultaneously learning stochastic and adversarial bandits under the position-based model [arXiv]
Cheng Chen, Canzhe Zhao, and Shuai Li.
AAAI, 2022.
2021
Clustering of conversational bandits for user preference learning and elicitation [link][slides]
Junda Wu*, Canzhe Zhao*, Tong Yu, Jingyang Li, and Shuai Li.
CIKM, 2021.
Comparison-based conversational recommender system with relative bandit feedback [link]
Zhihui Xie, Tong Yu, Canzhe Zhao, and Shuai Li.
SIGIR, 2021.
Preprints