Research
                        
                            I have a general interest in deep learning and natural language processing. Recently, my research has focused on LLM agents, RL for LLM reasoning, and scalable oversight.
                            
                         
                     | 
                 
                
             
            
                
                                        
                        
                            
                                BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
                            
                             
                            Zhiheng Xi, Xin Guo, Yang Nan, Enyu Zhou, Junrui Shen, Wenxiang Chen, Jiaqi Liu, Jixuan Huang, Zhihao Zhang, Honglin Guo, Xun Deng, Zhikai Lei, Miao Zheng, Guoteng Wang, Shuo Zhang, Peng Sun, Rui Zheng, Hang Yan, Tao Gui, Qi Zhang, Xuanjing Huang
                             
                            Preprint. Oct, 2025
    
                             
                            
                            codes /
                            paper 
                            
    
                            
    
    
                            
                            
                                
    
                            
                            
                         | 
                     
                    
                        
                        
                            
                                AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
                            
                             
                            Zhiheng Xi, Jixuan Huang, Chenyang Liao, Baodai Huang, Honglin Guo, Jiaqi Liu, Rui Zheng, Junjie Ye, Jiazheng Zhang, Wenxiang Chen, Wei He, Yiwen Ding, Guanyu Li, Zehui Chen, Zhengyin Du, Xuesong Yao, Yufei Xu, Jiecao Chen, Tao Gui, Zuxuan Wu, Qi Zhang, Xuanjing Huang, Yu-Gang Jiang
                             
                            Preprint. Sep, 2025
    
                             
                            project page /
                            codes /
                            paper 
                            
    
                            
    
    
                            
                            
                                
    
                            
                            
                         | 
                     
                
                    
                    
                        
                            AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress
                        
                         
                        Zhiheng Xi, Chenyang Liao, Guanyu Li, Zhihao Zhang, Wenxiang Chen, Binghai Wang, Senjie Jin, Yuhao Zhou, Jian Guan, Wei Wu, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang
                         
                        Preprint. Jan, 2025
                         
                        
                        
                        
                        
                            
                        
                        
                     | 
                 
                    
                    
                    
                        
                            AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
                        
                         
                        Zhiheng Xi, Yiwen Ding, Wenxiang Chen, Boyang Hong, Honglin Guo, Junzhe Wang, Dingwen Yang, Chenyang Liao, Xin Guo, Wei He, Songyang Gao, Lu Chen, Rui Zheng, Yicheng Zou, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang
                         
                        ACL 2025. Preprint, June 2024.
                         
                        project page /
                        codes and platform /
                        paper /
                        dataset /
                        benchmark /
                        model
                        
                        
                        
                            
                            
                        
                        
                            
                        
                        
                     | 
                 
                
                    
                    
                        
                            The Rise and Potential of Large Language Model Based Agents: A Survey
                        
                         
                        Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, Rui Zheng, Xiaoran Fan, Xiao Wang, Limao Xiong, Qin Liu, Yuhao Zhou, Weiran Wang, Changhao Jiang, Yicheng Zou, Xiangyang Liu, Zhangyue Yin, Shihan Dou, Rongxiang Weng, Wensen Cheng, Qi Zhang, Wenjuan Qin, Yongyan Zheng, Xipeng Qiu, Xuanjing Huang, Tao Gui
                         
                        
                        
                        SCIENCE CHINA Information Sciences (SCIS), Cover Paper of SCIS Volume 68, Number 2, February 2025. 
                        
                         
                        project page
                        /
                        
                        paper
                        
                        
                            
                            
                            
                            
                            
                            
                            
                            
                            
                            
                            
                        
                     | 
                 
                    
                        
                        
                            
                                BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset
                            
                             
                            Zhiheng Xi, Guanyu Li, Yutao Fan, Honglin Guo, Yufang Liu, Xiaoran Fan, Jiaqi Liu, Jingchao Ding, Wangmeng Zuo, Zhenfei Yin, Lei Bai, Tao Ji, Tao Gui, Qi Zhang, Philip Torr, Xuanjing Huang
                             
                            NeurIPS 2025.
    
                             
                            project page /
                            codes /
                            paper /
                            dataset 
    
                            
    
    
                            
                            
                                
    
                            
                            
                         | 
                     
                    
                        
                        
                            
                                Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning
                            
                             
                            Zhiheng Xi, Jixuan Huang, Xin Guo, Boyang Hong, Dingwen Yang, Xiaoran Fan, Shuo Li, Zehui Chen, Junjie Ye, Siyu Yuan, Zhengyin Du, Xuesong Yao, Yufei Xu, Jiecao Chen, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang
                             
                            Preprint. Feb, 2025
    
                             
                            
                            codes /
                            paper 
                            
    
                            
    
    
                            
                            
                                
    
                            
                            
                         | 
                     
                    
                        
                        
                            
                                MathCritique: Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision
                            
                             
                            Zhiheng Xi, Dingwen Yang, Jixuan Huang, Jiafu Tang, Guanyu Li, Yiwen Ding, Wei He, Boyang Hong, Shihan Dou, Wenyu Zhan, Xiao Wang, Rui Zheng, Tao Ji, Xiaowei Shi, Yitao Zhai, Rongxiang Weng, Jingang Wang, Xunliang Cai, Tao Gui, Zuxuan Wu, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Yu-Gang Jiang
                             
                            Preprint. Nov, 2024
    
                             
                            project page /
                            codes /
                            paper /
                            dataset 
    
                            
    
    
                            
                            
                                
    
                            
                            
                         | 
                     
                    
                    
                    
                        
                            Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
                        
                         
                        Zhiheng Xi, Wenxiang Chen, Boyang Hong, Senjie Jin, Rui Zheng, Wei He, Yiwen Ding, Shichun Liu, Xin Guo, Junzhe Wang, Honglin Guo, Wei Shen, Xiaoran Fan, Yuhao Zhou, Shihan Dou, Xiao Wang, Xinbo Zhang, Peng Sun, Tao Gui, Qi Zhang, Xuanjing Huang
                         
                        
                        
                        ICML 2024; CIPS-LMG 2024 Outstanding Poster
                         
                        codes
                        /
                        
                        paper
                        
                        
                     | 
                 
                    
                        
                        
                            
                                Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling
                            
                             
                            Yiwen Ding*, Zhiheng Xi* (Co-first Author), Wei He, Zhuoyuan Li, Yitao Zhai, Xiaowei Shi, Xunliang Cai, Tao Gui, Qi Zhang, Xuanjing Huang        
                            NAACL 2025
    
                             
                            
                            codes /
                            paper 
                            
    
                            
    
    
                            
                            
                                
    
                            
                            
                         | 
                     
                    
                        
                        
                            
                                Distill Visual Chart Reasoning Ability from LLMs to MLLMs
                                
                            
                             
                            Wei He*, Zhiheng Xi* (Co-first Author), Wanxu Zhao*, Xiaoran Fan, Yiwen Ding, Zifei Shan, Tao Gui, Qi Zhang, Xuanjing Huang                           
                            EMNLP 2025.
    
                             
                            
                            codes /
                            paper /
                            dataset 
    
                            
    
    
                            
                            
                                
    
                            
                            
                         | 
                     
                
                    
                    
                        
                            RMB: Comprehensively Benchmarking Reward Models in LLM Alignment
                            
                        
                         
                        Enyu Zhou*, Guodong Zheng*, Binghai Wang*, Zhiheng Xi, Shihan Dou, Rong Bao, Wei Shen, Limao Xiong, Jessica Fan, Yurong Mou, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang                           
                        
                         
                        ICLR 2025
                         
                        
                        codes /
                        paper 
                        
                        
                        
                        
                            
                        
                        
                     | 
                 
                
                
                    
                    
                        
                            Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement
                        
                         
                        Zhiheng Xi, Senjie Jin, Yuhao Zhou, Rui Zheng, Songyang Gao, Jia Liu, Tao Gui, Qi Zhang, Xuanjing Huang
                         
                        EMNLP 2023 Findings.
                        
                         
                        codes
                        /
                        
                        paper
                        
                        
                            
                            
                            
                            
                            
                        
                     | 
                 
                
                    
                    
                        
                            Connectivity Patterns are Task Embeddings
                        
                         
                        Zhiheng Xi, Rui Zheng, Yuansen Zhang, Xuanjing Huang, Zhongyu Wei, Minlong Peng, Mingming Sun, Qi Zhang, Tao Gui
                         
                        ACL 2023 Findings.
                        
                         
                        codes
                        /
                        
                        paper
                        
                        
                            
                            
                            
                            
                            
                            
                            
                            
                        
                     | 
                 
                
                    
                    
                        
                            Efficient Adversarial Training with Robust Early-Bird Tickets
                        
                         
                        Zhiheng Xi, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang
                         
                        EMNLP 2022.
                        
                         
                        codes
                        /
                        
                        paper
                        
                        
                            
                            
                            
                            
                            
                            
                            
                            
                        
                     | 
                 
                
                    
                    
                        
                            Characterizing the Impacts of Instances on Robustness
                        
                         
                        Rui Zheng*, Zhiheng Xi* (Co-first Author), Qin Liu, Wenbin Lai, Tao Gui, Qi Zhang, Xuanjing Huang, Jin Ma, Ying Shan, Weifeng Ge
                         
                        ACL 2023 Findings.
                        
                         
                        codes
                        /
                        
                        paper
                        
                     | 
                 
                
                    
                    
                        
                            Improving Generalization of Alignment with Human Preferences through Group Invariant Learning
                        
                         
                        Rui Zheng, Wei Shen, Yuan Hua, Wenbin Lai, Shihan Dou, Yuhao Zhou, Zhiheng Xi, Xiao Wang, Haoran Huang, Tao Gui, Qi Zhang, Xuanjing Huang
                         
                        ICLR 2024 (Spotlight).
                         
                        codes
                        /
                        paper
                        
                     | 
             
                
                    
                    
                        
                            EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
                        
                         
                        Weikang Zhou, Xiao Wang, Limao Xiong, Han Xia, Yingshuang Gu, Mingxu Chai, Fukang Zhu, Caishuang Huang, Shihan Dou, Zhiheng Xi, Rui Zheng, Songyang Gao, Yicheng Zou, Hang Yan, Yifan Le, Ruohui Wang, Lijun Li, Jing Shao, Tao Gui, Qi Zhang, Xuanjing Huang
                         
                        Preprint. Mar, 2024.
                         
                        project page
                        /
                        paper
                        
                     | 
             
                
                    
                    
                        
                            Secrets of RLHF in Large Language Models Part II: Reward Modeling
                        
                         
                        Binghai Wang, Rui Zheng, Lu Chen, Yan Liu, Shihan Dou, Caishuang Huang, Wei Shen, Senjie Jin, Enyu Zhou, Chenyu Shi, Songyang Gao, Nuo Xu, Yuhao Zhou, Xiaoran Fan, Zhiheng Xi, Jun Zhao, Xiao Wang, Tao Ji, Hang Yan, Lixing Shen, Zhan Chen, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang
                         
                        Preprint. Jan, 2024.
                         
                        codes
                        /
                        paper
                        
                     | 
                 
                
                    
                    
                        
                            Secrets of RLHF in Large Language Models Part I: PPO
                        
                         
                        Rui Zheng, Shihan Dou, Songyang Gao, Yuan Hua, Wei Shen, Binghai Wang, Yan Liu, Senjie Jin, Qin Liu, Yuhao Zhou, Limao Xiong, Lu Chen, Zhiheng Xi, Nuo Xu, Wenbin Lai, Minghao Zhu, Cheng Chang, Zhangyue Yin, Rongxiang Weng, Wensen Cheng, Haoran Huang, Tianxiang Sun, Hang Yan, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang
                         
                        Preprint. July, 2023.
                         
                        codes
                        /
                        paper
                        
                     | 
                 
                
                    
                    
                        
                            RoCoIns: Enhancing Robustness of Large Language Models through Code-Style Instructions
                        
                         
                        Yuansen Zhang, Xiao Wang, Zhiheng Xi, Han Xia, Tao Gui, Qi Zhang, Xuanjing Huang
                         
                        COLING 2024.
                         
                        paper
                        
                     | 
             
                
                    
                    
                        
                            
                                                  Miscellanea
                            
                                -  I'm passionate about FPS games, including Counter-Strike (CS:GO / CS2) and CrossFire (CF).
                                
 
                                -  I love watching soccer and am a big fan of Mourinho.
 
                                -  I also love watching basketball games and my favorite player is Kevin Durant.
 
                             
                         | 
                     
                    
                 
                
  |