Currently, I am a Ph.D. candidate in the School of Computer Science at Peking University, advised by Prof. Zongqing Lu.
I received my M.Sc. degree from the Department of Computer Science and Technology at Tsinghua University in June 2022, advised by Prof. Xi Xiao. I received my B.Sc. degree from the School of Mathematical Sciences at Nankai University in June 2019. I also worked as a research intern at Tencent AI Lab in 2021, advised by Dijun Luo.
My research interests include Reinforcement Learning and Language Modeling. Feel free to contact me if you are interested in discussing or collaborating.
[Email / Google Scholar / DBLP / Github]

Publications

  • AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback. (NAACL’24)
  • Entity Divider with Language Grounding in Multi-Agent Reinforcement Learning. (ICML’23)
    • Ziluo Ding*, Wanpeng Zhang*, Junpeng Yue, Xiangjun Wang, Tiejun Huang, Zongqing Lu. *equal contribution
    • Link / arXiv / PDF / Code
  • Model-Based Opponent Modeling. (NeurIPS’22)
    • Xiaopeng Yu, Jiechuan Jiang, Wanpeng Zhang, Haobin Jiang, Zongqing Lu.
    • Link / arXiv / PDF / Code
  • iGrow: A Smart Agriculture Solution to Autonomous Greenhouse Control. (AAAI’22)
    • Xiaoyan Cao, Yao Yao, Lanqing Li, Wanpeng Zhang, Zhicheng An, Zhong Zhang, Shihui Guo, Li Xiao, Xiaoyu Cao, Dijun Luo.
    • Link / arXiv / PDF / Code
  • Efficient and Stable Information Directed Exploration for Continuous Reinforcement Learning. (ICASSP’22)
    • Mingzhe Chen, Xi Xiao, Wanpeng Zhang, Xiaotian Gao.
    • Link / PDF
  • Robust Model-based Reinforcement Learning for Autonomous Greenhouse Control. (ACML’21)
    • Wanpeng Zhang, Xiaoyan Cao, Yao Yao, Zhicheng An, Dijun Luo, Xi Xiao.
    • Link / arXiv / PDF
  • Sample Efficient Reinforcement Learning via Model-Ensemble Exploration and Exploitation. (ICRA’21)
    • Yao Yao, Li Xiao, Zhicheng An, Wanpeng Zhang, Dijun Luo.
    • Link / arXiv / PDF / Code
  • A Simulator-based Planning Framework for Optimizing Autonomous Greenhouse Control Strategy. (ICAPS’21)
    • Zhicheng An, Xiaoyan Cao, Yao Yao, Wanpeng Zhang, Lanqing Li, Yue Wang, Shihui Guo, Dijun Luo.
    • Link / PDF
  • Self-Paced Probabilistic Principal Component Analysis for Data with Outliers. (ICASSP’20)
    • Bowen Zhao, Xi Xiao, Wanpeng Zhang, Bin Zhang, Guojun Gan, Shutao Xia.
    • Link / arXiv / PDF / Code

Preprints

  • Tackling Non-Stationarity in Reinforcement Learning via Causal-Origin Representation. (arXiv’23.06)
    • Wanpeng Zhang, Yilin Li, Boyu Yang, Zongqing Lu.
    • arXiv / PDF
  • MBDP: A Model-based Approach to Achieve both Robustness and Sample Efficiency via Double Dropout Planning. (arXiv’21.08)
    • Wanpeng Zhang, Xi Xiao, Yao Yao, Mingzhe Chen, Dijun Luo.
    • arXiv / PDF

Patents

  • Method, device and equipment for determining parameters and storage medium. (CN112527104A)
    • Wanpeng Zhang, Dijun Luo, Xi Xiao.
    • Link / PDF

Academic Service

  • Reviewer
    • ICML 2022, 2023, 2024
    • NeurIPS 2022, 2023
    • ICLR 2024