I am a Ph.D. candidate in the School of Computer Science at Peking University, advised by Prof. Zongqing Lu. My research interests include Multimodal LLMs, Reinforcement Learning, and Embodied Agents. Feel free to contact me if you are interested in discussing or collaborating. For more details, please refer to my CV or CV (Chinese).

Education

  • Peking University. School of Computer Science.
    • Ph.D. Candidate. (Sep. 2022 — Present)
    • Supervisor: Prof. Zongqing Lu.
  • Tsinghua University. Department of Computer Science and Technology.
    • Master of Science Degree. (Sep. 2019 — Jun. 2022)
    • Supervisor: Prof. Xi Xiao.
  • Nankai University. School of Mathematical Sciences.
    • Bachelor of Science Degree. (Sep. 2015 — Jun. 2019)
    • Advisor: Prof. Jishou Ruan.

Experience

  • BeingBeyond
    • Research Intern. (Mar. 2025 — Present)
    • Multimodal LLMs / Embodied Agents
  • Beijing Academy of Artificial Intelligence (BAAI)
    • Research Intern. (May 2024 — Mar. 2025)
    • Multimodal LLMs / Embodied Agents
  • Tencent AI Lab
    • Research Intern. (Jun. 2020 — Jul. 2021)
    • Reinforcement Learning / AI for Science

Selected Publications

(For the full list of publications, please see my Google Scholar.)

1. MLLM

  • (ICCV’25) Unified Multimodal Understanding via Byte-Pair Visual Encoding.
    • Wanpeng Zhang, Yicheng Feng, Hao Luo, Yijiang Li, Zihao Yue, Sipeng Zheng, Zongqing Lu.
    • TLDR: Building on the visual BPE tokenizer proposed in our previous work, we design a complete training framework and introduce our Being-VL-0.5 model.
    • Link / PDF / Bib
  • (ICCV’25) VideoOrion: Tokenizing Object Dynamics in Videos.
    • Yicheng Feng, Yijiang Li, Wanpeng Zhang, Hao Luo, Zihao Yue, Sipeng Zheng, Zongqing Lu.
    • TLDR: VideoOrion encodes videos with a two-branch design, using object tokens from a detect-segment-track pipeline to capture object dynamics alongside scene context.
    • Link / PDF / Bib
  • (ICLR’25) From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities.
    • Wanpeng Zhang, Zilong Xie, Yicheng Feng, Yijiang Li, Xingrun Xing, Sipeng Zheng, Zongqing Lu.
    • TLDR: We propose a BPE tokenizer for images that enables Transformers to learn and align multimodal information more effectively, providing a new learning paradigm for unified MLLMs.
    • Link / PDF / Bib

2. RL & Agent

  • (NAACL’25) LLM-Based Explicit Models of Opponents for Multi-Agent Games.
    • Xiaopeng Yu, Wanpeng Zhang, Zongqing Lu.
    • TLDR: We propose EMO, a method that models each opponent individually using LLMs with iterative self- and global-refinement for better multi-agent reasoning.
    • Link / PDF / Bib
  • (ICML’24) Tackling Non-Stationarity in Reinforcement Learning via Causal-Origin Representation.
    • Wanpeng Zhang, Yilin Li, Boyu Yang, Zongqing Lu.
    • TLDR: By adaptively learning a joint graph of the causal relationships in the environment and providing representations that encode those relationships, RL algorithms can effectively tackle non-stationarity.
    • Link / PDF / Bib / Code
  • (NAACL’24) AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback.
    • Wanpeng Zhang, Zongqing Lu.
    • TLDR: We propose AdaRefiner, which enables LLMs and RL agents to co-learn by providing feedback to each other, optimizing both perception and decision-making capabilities.
    • Link / PDF / Bib / Code
  • (ICML’23) Entity Divider with Language Grounding in Multi-Agent Reinforcement Learning.
    • Ziluo Ding*, Wanpeng Zhang*, Junpeng Yue, Xiangjun Wang, Tiejun Huang, Zongqing Lu. *equal contribution
    • TLDR: We propose the EnDi framework, which achieves goal division among agents and enhances collaboration in multi-agent systems by binding language to entities.
    • Link / PDF / Bib / Code
  • (NeurIPS’22) Model-Based Opponent Modeling.
    • Xiaopeng Yu, Jiechuan Jiang, Wanpeng Zhang, Haobin Jiang, Zongqing Lu.
    • TLDR: MBOM uses environment models to recursively simulate and mix imagined opponent policies for adaptive opponent modeling.
    • Link / PDF / Bib / Code

Patents

  • Multimodal data processing method, device, storage medium, and electronic equipment. (CN119226992A)
    • Zongqing Lu, Wanpeng Zhang.
    • Link / PDF
  • Method, device and equipment for determining parameters and storage medium. (CN112527104A)
    • Wanpeng Zhang, Dijun Luo, Xi Xiao.
    • Link / PDF

Awards

  • Award for Scientific Research of Peking University. (Dec. 2024)
  • Presidential Scholarship of Peking University. (Nov. 2024)
  • Rhino-bird Elite Training Program of Tencent AI Lab. (Jul. 2021)
  • Mathematical Contest in Modeling (MCM/ICM), Meritorious Winner (First Prize). (Apr. 2017)
  • China Undergraduate Mathematical Contest in Modeling (CUMCM), Second Prize. (Jan. 2016)

Service

  • Conference Reviewer
    • ICML / NeurIPS / ICLR / ICCV / AAAI / ICRA / AISTATS
  • Journal Reviewer
    • TNNLS / TIST
  • Teaching Assistant
    • Deep Reinforcement Learning, Peking University. (Spring, 2025)