I am currently a Ph.D. candidate in the School of Computer Science at Peking University, advised by Prof. Zongqing Lu. My research interests include Reinforcement Learning, Generative Modeling, and Multimodal LLMs. Feel free to contact me if you are interested in discussing research or collaborating.
[Email / Google Scholar / DBLP / GitHub / CV]

Education

  • School of Computer Science, Peking University.
    • Ph.D. Candidate. Advised by Prof. Zongqing Lu.
    • 2022 — Now
  • Department of Computer Science and Technology, Tsinghua University.
    • M.Sc. Degree. Advised by Prof. Xi Xiao.
    • 2019 — 2022
  • School of Mathematical Sciences, Nankai University.
    • B.Sc. Degree.
    • 2015 — 2019

Publication

  • Tackling Non-Stationarity in Reinforcement Learning via Causal-Origin Representation. (ICML’24)
    • Wanpeng Zhang, Yilin Li, Boyu Yang, Zongqing Lu.
    • Link / PDF / BibTeX
  • AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback. (NAACL’24)
  • Entity Divider with Language Grounding in Multi-Agent Reinforcement Learning. (ICML’23)
    • Ziluo Ding*, Wanpeng Zhang*, Junpeng Yue, Xiangjun Wang, Tiejun Huang, Zongqing Lu. *equal contribution
    • Link / PDF / BibTeX / Code / Talk
  • Model-Based Opponent Modeling. (NeurIPS’22)
    • Xiaopeng Yu, Jiechuan Jiang, Wanpeng Zhang, Haobin Jiang, Zongqing Lu.
    • Link / PDF / BibTeX / Code / Talk
  • iGrow: A Smart Agriculture Solution to Autonomous Greenhouse Control. (AAAI’22)
    • Xiaoyan Cao, Yao Yao, Lanqing Li, Wanpeng Zhang, Zhicheng An, Zhong Zhang, Shihui Guo, Li Xiao, Xiaoyu Cao, Dijun Luo.
    • Link / PDF / BibTeX / Code
  • Efficient and Stable Information Directed Exploration for Continuous Reinforcement Learning. (ICASSP’22)
    • Mingzhe Chen, Xi Xiao, Wanpeng Zhang, Xiaotian Gao.
    • Link / PDF / BibTeX
  • Robust Model-based Reinforcement Learning for Autonomous Greenhouse Control. (ACML’21)
    • Wanpeng Zhang, Xiaoyan Cao, Yao Yao, Zhicheng An, Dijun Luo, Xi Xiao.
    • Link / PDF / BibTeX / Talk
  • Sample Efficient Reinforcement Learning via Model-Ensemble Exploration and Exploitation. (ICRA’21)
    • Yao Yao, Li Xiao, Zhicheng An, Wanpeng Zhang, Dijun Luo.
    • Link / PDF / BibTeX / Code
  • A Simulator-based Planning Framework for Optimizing Autonomous Greenhouse Control Strategy. (ICAPS’21)
    • Zhicheng An, Xiaoyan Cao, Yao Yao, Wanpeng Zhang, Lanqing Li, Yue Wang, Shihui Guo, Dijun Luo.
    • Link / PDF / BibTeX
  • Self-Paced Probabilistic Principal Component Analysis for Data with Outliers. (ICASSP’20)
    • Bowen Zhao, Xi Xiao, Wanpeng Zhang, Bin Zhang, Guojun Gan, Shutao Xia.
    • Link / PDF / BibTeX / Code

Preprint

  • MBDP: A Model-based Approach to Achieve both Robustness and Sample Efficiency via Double Dropout Planning. (arXiv’21.08)
    • Wanpeng Zhang, Xi Xiao, Yao Yao, Mingzhe Chen, Dijun Luo.
    • Link / PDF / BibTeX

Patent

  • Method, device and equipment for determining parameters and storage medium. (CN112527104A)
    • Wanpeng Zhang, Dijun Luo, Xi Xiao.
    • Link / PDF

Experience

  • Beijing Academy of Artificial Intelligence (BAAI)
    • Research Intern
    • 2024.05 — Now
  • Tencent AI Lab
    • Research Intern
    • 2020.06 — 2021.07
  • Availink
    • Research Intern
    • 2018.08 — 2018.10

Academic Service

  • Conference Reviewer
    • ICML 2022, 2023, 2024
    • NeurIPS 2022, 2023, 2024
    • ICLR 2024