Publications and Preprints


(* indicates equal contribution)

Publications

2023

  • Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback [arXiv][poster]
    Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Xuezhou Zhang, and Shuai Li,
    NeurIPS, 2023.

  • Best-of-three-worlds analysis for linear bandits with follow-the-regularized-leader algorithm [arXiv]
    Fang Kong, Canzhe Zhao, and Shuai Li,
    COLT, 2023.

  • DPMAC: Differentially private communication for cooperative multi-agent reinforcement learning
    Canzhe Zhao*, Yanjie Ze*, Jing Dong, Baoxiang Wang, and Shuai Li,
    IJCAI, 2023.

  • Learning Adversarial Linear Mixture Markov Decision Processes with Bandit Feedback and Unknown Transition [link][slides]
    Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, and Shuai Li,
    ICLR, 2023.

  • Clustering of Conversational Bandits with Posterior Sampling for User Preference Learning and Elicitation [link]
    Qizhi Li*, Canzhe Zhao*, Tong Yu, Junda Wu, and Shuai Li,
    UMUAI, 2023.

  • Differentially private temporal difference learning with stochastic nonconvex-strongly-concave optimization [link]
    Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, and Shuai Li,
    WSDM, 2023.

2022

  • Knowledge-aware conversational preference elicitation with bandit feedback [link][slides]
    Canzhe Zhao, Tong Yu, Zhihui Xie, and Shuai Li,
    WWW, 2022.

  • Simultaneously learning stochastic and adversarial bandits under the position-based model [arXiv]
    Cheng Chen, Canzhe Zhao, and Shuai Li,
    AAAI, 2022.

2021

  • Clustering of conversational bandits for user preference learning and elicitation [link][slides]
    Junda Wu*, Canzhe Zhao*, Tong Yu, Jingyang Li, and Shuai Li,
    CIKM, 2021.

  • Comparison-based conversational recommender system with relative bandit feedback [link]
    Zhihui Xie, Tong Yu, Canzhe Zhao, and Shuai Li,
    SIGIR, 2021.

Preprints