Preprints and Publications
(* indicates equal contribution)
Preprints
Heavy-Tailed Linear Bandits: Adversarial Robustness, Best-of-Both-Worlds, and Beyond. [arXiv]
Canzhe Zhao*, Shinji Ito*, Shuai Li.
Under review, 2025.
Publications
2025
Learning Imperfect Information Extensive-form Games with Last-iterate Convergence under Bandit Feedback.
Canzhe Zhao, Yutian Cheng, Jing Dong, Baoxiang Wang, Shuai Li.
ICML, 2025.
Towards Provably Efficient Learning of Imperfect Information Extensive-Form Games with Linear Function Approximation.
Canzhe Zhao, Shuze Chen, Weiming Liu, Haobo Fu, Qiang Fu, Shuai Li.
UAI, 2025.
Logarithmic Regret for Linear Markov Decision Processes with Adversarial Corruptions.
Canzhe Zhao, Xiangcheng Zhang, Baoxiang Wang, Shuai Li.
AAAI, 2025.
2023
Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback. [arXiv][poster]
Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Xuezhou Zhang, Shuai Li.
NeurIPS, 2023.
Best-of-Three-Worlds Analysis for Linear Bandits with Follow-the-Regularized-Leader Algorithm. [arXiv]
Fang Kong, Canzhe Zhao, Shuai Li.
COLT, 2023.
DPMAC: Differentially Private Communication for Cooperative Multi-Agent Reinforcement Learning.
Canzhe Zhao*, Yanjie Ze*, Jing Dong, Baoxiang Wang, Shuai Li.
IJCAI, 2023.
Learning Adversarial Linear Mixture Markov Decision Processes with Bandit Feedback and Unknown Transition. [link][slides]
Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Shuai Li.
ICLR, 2023.
Clustering of Conversational Bandits with Posterior Sampling for User Preference Learning and Elicitation. [link]
Qizhi Li*, Canzhe Zhao*, Tong Yu, Junda Wu, Shuai Li.
UMUAI, 2023.
Differentially Private Temporal Difference Learning with Stochastic Nonconvex-Strongly-Concave Optimization. [link]
Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, Shuai Li.
WSDM, 2023.
2022
Knowledge-aware Conversational Preference Elicitation with Bandit Feedback. [link][slides]
Canzhe Zhao, Tong Yu, Zhihui Xie, Shuai Li.
WWW, 2022.
Simultaneously Learning Stochastic and Adversarial Bandits under the Position-based Model. [arXiv]
Cheng Chen, Canzhe Zhao, Shuai Li.
AAAI, 2022.
2021
Clustering of Conversational Bandits for User Preference Learning and Elicitation. [link][slides]
Junda Wu*, Canzhe Zhao*, Tong Yu, Jingyang Li, Shuai Li.
CIKM, 2021.
Comparison-based Conversational Recommender System with Relative Bandit Feedback. [link]
Zhihui Xie, Tong Yu, Canzhe Zhao, Shuai Li.
SIGIR, 2021.