Publications and Preprints
(* indicates equal contribution)
Publications
2025
Logarithmic Regret for Linear Markov Decision Processes with Adversarial Corruptions
Canzhe Zhao, XiangCheng Zhang, Baoxiang Wang, and Shuai Li,
AAAI, 2025.
2023
Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback [arXiv][poster]
Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Xuezhou Zhang, and Shuai Li,
NeurIPS, 2023.
Best-of-three-worlds analysis for linear bandits with follow-the-regularized-leader algorithm [arXiv]
Fang Kong, Canzhe Zhao, and Shuai Li,
COLT, 2023.
DPMAC: Differentially private communication for cooperative multi-agent reinforcement learning
Canzhe Zhao*, Yanjie Ze*, Jing Dong, Baoxiang Wang, and Shuai Li,
IJCAI, 2023.
Learning Adversarial Linear Mixture Markov Decision Processes with Bandit Feedback and Unknown Transition [link][slides]
Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, and Shuai Li,
ICLR, 2023.
Clustering of Conversational Bandits with Posterior Sampling for User Preference Learning and Elicitation [link]
Qizhi Li*, Canzhe Zhao*, Tong Yu, Junda Wu, and Shuai Li,
UMUAI, 2023.
Differentially private temporal difference learning with stochastic nonconvex-strongly-concave optimization [link]
Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, and Shuai Li,
WSDM, 2023.
2022
Knowledge-aware conversational preference elicitation with bandit feedback [link][slides]
Canzhe Zhao, Tong Yu, Zhihui Xie, and Shuai Li,
WWW, 2022.
Simultaneously learning stochastic and adversarial bandits under the position-based model [arXiv]
Cheng Chen, Canzhe Zhao, and Shuai Li,
AAAI, 2022.
2021
Clustering of conversational bandits for user preference learning and elicitation [link][slides]
Junda Wu*, Canzhe Zhao*, Tong Yu, Jingyang Li, and Shuai Li,
CIKM, 2021.
Comparison-based conversational recommender system with relative bandit feedback [link]
Zhihui Xie, Tong Yu, Canzhe Zhao, and Shuai Li,
SIGIR, 2021.
Preprints