Zhiheng Xi

I am a second-year master's student in the Fudan NLP Group, School of Computer Science, Fudan University, where I work on natural language processing (NLP) and deep learning (DL). I am advised by Prof. Tao Gui, Prof. Qi Zhang, and Prof. Xuanjing Huang. Previously, I received my bachelor's degree from Nanjing University, where I was advised by Prof. Jia Liu. I am currently an intern at Shanghai AI Laboratory. In 2021, I completed a fantastic internship at Microsoft Azure, advised by Jiaye Wu and Hang Zhang.

Email  /  Google Scholar  /  Zhihu  /  Github

profile photo

Research

I have a general interest in deep learning, natural language processing, and robust machine learning. Recently, my research has focused on large language models (LLMs) and LLM-based agents.

Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
Zhiheng Xi, Wenxiang Chen, Boyang Hong, Senjie Jin, Rui Zheng, Wei He, Yiwen Ding, Shichun Liu, Xin Guo, Junzhe Wang, Honglin Guo, Wei Shen, Xiaoran Fan, Yuhao Zhou, Shihan Dou, Xiao Wang, Xinbo Zhang, Peng Sun, Tao Gui, Qi Zhang, Xuanjing Huang
ICML 2024
code / paper

In this paper, we propose \(R^3\): Learning Reasoning through Reverse Curriculum Reinforcement Learning (RL), a novel method that employs only outcome supervision to achieve the benefits of process supervision for large language models.
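For illustration, here is a minimal sketch of the reverse-curriculum idea (a simplification, not the released implementation): the policy starts generating from progressively earlier points of a demonstrated reasoning chain, and the only supervision is an outcome reward on the final answer. The helper names below are hypothetical.

```python
# Minimal sketch of the reverse-curriculum idea: slide the starting point of
# exploration from the end of a demonstration back to the beginning, and reward
# only the final answer (outcome supervision).

def build_curriculum(question, demo_steps):
    """Yield (prompt, start_index) pairs, from 'almost solved' back to 'from scratch'."""
    for keep in range(len(demo_steps) - 1, -1, -1):
        prefix = "\n".join(demo_steps[:keep])
        prompt = question + ("\n" + prefix if prefix else "")
        yield prompt, keep

def outcome_reward(completion: str, gold_answer: str) -> float:
    """Outcome supervision only: the reward depends solely on final-answer correctness."""
    return 1.0 if gold_answer in completion else 0.0

if __name__ == "__main__":
    question = "Q: Tom has 3 apples and buys 2 more. How many apples does he have?"
    demo = ["Tom starts with 3 apples.", "He buys 2 more, so 3 + 2 = 5.", "The answer is 5."]
    for prompt, start in build_curriculum(question, demo):
        print(f"--- start exploration from step {start} ---\n{prompt}\n")
```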

The Rise and Potential of Large Language Model Based Agents: A Survey
Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, Rui Zheng, Xiaoran Fan, Xiao Wang, Limao Xiong, Qin Liu, Yuhao Zhou, Weiran Wang, Changhao Jiang, Yicheng Zou, Xiangyang Liu, Zhangyue Yin, Shihan Dou, Rongxiang Weng, Wensen Cheng, Qi Zhang, Wenjuan Qin, Yongyan Zheng, Xipeng Qiu, Xuanjing Huang, Tao Gui
Preprint. September, 2023
project page / paper

In this paper, we provide a comprehensive, 86-page survey of LLM-based agents. We start by tracing the concept of agents from its philosophical origins to its development in AI. Next, the main body covers the construction of LLM-based agents, their extensive applications, and the essential concept of agent society. Finally, we discuss a range of key topics and open problems within the field, e.g., scaling the number of agents and Agent-as-a-Service.

Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement
Zhiheng Xi, Senjie Jin, Yuhao Zhou, Rui Zheng, Songyang Gao, Jia Liu, Tao Gui, Qi Zhang, Xuanjing Huang
EMNLP 2023 Findings
code / paper

Different from previous work such as Chain-of-Thought (CoT), which enhances LLMs' reasoning performance from the answer/reasoning side, we start from the problem side and propose Self-Polish (SP), a novel method that facilitates the model's reasoning by guiding it to progressively refine the given problems so that they become more comprehensible and solvable.
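As a flavor of the idea, here is a minimal sketch of a problem-refinement loop in the spirit of Self-Polish (not the released implementation); `generate` is a hypothetical placeholder for any LLM call, and the prompt templates are illustrative.

```python
# Minimal sketch: iteratively ask the model to rewrite the problem until the
# rewriting converges (or a round limit is hit), then solve the polished problem.

REFINE_PROMPT = ("Rewrite the following problem so that it is clearer, better organized, "
                 "and easier to solve, without changing its answer:\n{problem}")
SOLVE_PROMPT = "Solve the following problem step by step:\n{problem}"

def self_polish(problem: str, generate, max_rounds: int = 3) -> str:
    """Progressively refine the problem, stop when it stops changing, then solve it."""
    for _ in range(max_rounds):
        refined = generate(REFINE_PROMPT.format(problem=problem)).strip()
        if refined == problem:   # converged: further polishing changes nothing
            break
        problem = refined
    return generate(SOLVE_PROMPT.format(problem=problem))
```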

Connectivity Patterns are Task Embeddings
Zhiheng Xi, Rui Zheng, Yuansen Zhang, Xuanjing Huang, Zhongyu Wei, Minlong Peng, Mingming Sun, Qi Zhang, Tao Gui
ACL 2023 Findings
code / paper

In this work, we draw inspiration from the operating mechanisms of deep neural networks (DNNs) and biological brains, where neuronal activations are sparse and task-specific, and we use the connectivity patterns of neurons as a unique identifier (task embedding) of each task. Experiments show that our method consistently outperforms other baselines in predicting inter-task transferability across data regimes and transfer settings, while remaining highly efficient in computation and storage.
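A rough sketch of one way such sparse connectivity patterns could be extracted and compared (an illustration under simplifying assumptions, not the paper's exact procedure): keep the top fraction of parameters by magnitude of change after a short fine-tuning run, and compare tasks by the overlap of the resulting binary masks.

```python
# Illustrative sketch: a binary mask over the most task-responsive parameters
# serves as a cheap "task embedding"; mask overlap is a proxy for transferability.
import numpy as np

def connectivity_mask(pretrained: np.ndarray, finetuned: np.ndarray, sparsity: float = 0.05):
    """Binary mask marking the parameters that moved the most on this task."""
    delta = np.abs(finetuned - pretrained).ravel()
    k = max(1, int(sparsity * delta.size))
    mask = np.zeros_like(delta, dtype=bool)
    mask[np.argsort(delta)[-k:]] = True
    return mask

def task_similarity(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Jaccard overlap of two masks as a proxy for inter-task transferability."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(inter) / max(union, 1)
```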

Efficient Adversarial Training with Robust Early-Bird Tickets
Zhiheng Xi, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang
EMNLP 2022
code / paper

Adversarial training, a strong algorithm for enhancing model robustness, is typically more expensive than traditional fine-tuning because adversarial examples must be generated with additional gradient steps. Delving into the optimization process of adversarial training, we find that robust connectivity patterns emerge in the early training phase (typically 0.15–0.3 epochs), long before the parameters converge. Inspired by this finding, we dig out robust early-bird tickets (i.e., subnetworks) to develop an efficient adversarial training method.
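To make the "early-bird" detection concrete, here is a minimal sketch of one common way to spot such a ticket (an illustration, not our exact criterion): compare the magnitude-pruning masks of consecutive checkpoints and stop expensive full-model adversarial training once the mask stabilizes.

```python
# Illustrative sketch: the ticket has "emerged" once consecutive pruning masks
# barely change, so the remaining adversarial training can run on the subnetwork.
import numpy as np

def pruning_mask(weights: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """Keep the largest-magnitude weights; everything else is pruned."""
    k = max(1, int(keep_ratio * weights.size))
    threshold = np.sort(np.abs(weights).ravel())[-k]
    return np.abs(weights) >= threshold

def mask_distance(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Normalized Hamming distance between two pruning masks."""
    return float(np.mean(mask_a != mask_b))

def ticket_found(prev_mask, curr_mask, tol: float = 0.01) -> bool:
    """Declare an early-bird ticket once the mask stops changing between checkpoints."""
    return prev_mask is not None and mask_distance(prev_mask, curr_mask) < tol
```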

Characterizing the Impacts of Instances on Robustness
Rui Zheng, Zhiheng Xi (Co-first Author), Qin Liu, Wenbin Lai, Tao Gui, Qi Zhang, Xuanjing Huang, Jin Ma, Ying Shan, Weifeng Ge
ACL 2023 Findings
code / paper

In this paper, we show that robust and non-robust instances in the training dataset, though both important for test performance, have contrary impacts on robustness, which makes it possible to build a highly robust model by leveraging the training dataset more effectively. We propose a new method that distinguishes robust instances from non-robust ones according to the model's sensitivity to perturbations on individual instances during training. Surprisingly, we find that a model under standard training easily overfits the robust instances by relying on their simple patterns before it completely learns their robust features. Finally, we propose a new mitigation algorithm to further unlock the potential of robust instances.
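As a toy illustration of a per-instance sensitivity score (an assumption-laden sketch, not the paper's exact criterion), one can measure how much an instance's loss grows under a small gradient-sign perturbation of its input embeddings; the model here is assumed to accept embeddings directly.

```python
# Illustrative sketch: higher loss increase under an FGSM-style perturbation
# suggests the instance is less robust for the current model.
import torch

def instance_sensitivity(model, embeddings, label, loss_fn, eps: float = 1e-2) -> float:
    embeddings = embeddings.clone().detach().requires_grad_(True)
    loss = loss_fn(model(embeddings), label)
    loss.backward()
    with torch.no_grad():
        perturbed = embeddings + eps * embeddings.grad.sign()   # FGSM-style perturbation
        perturbed_loss = loss_fn(model(perturbed), label)
    return (perturbed_loss - loss).item()
```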

Improving Generalization of Alignment with Human Preferences through Group Invariant Learning
Rui Zheng, Wei Shen, Yuan Hua, Wenbin Lai, Shihan Dou, Yuhao Zhou, Zhiheng Xi, Xiao Wang, Haoran Huang, Tao Gui, Qi Zhang, Xuanjing Huang
ICLR 2024 (Spotlight)
code / paper

In this work, we propose a novel approach that learns a consistent policy via RL across various data groups or domains. Given the challenges of acquiring group annotations, our method automatically classifies data into different groups, deliberately maximizing performance variance. We then optimize the policy to perform well on the challenging groups. Lastly, leveraging the established groups, our approach adaptively adjusts the exploration space, allocating more learning capacity to more challenging data and preventing the model from over-optimizing on simpler data.
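A toy sketch of the grouping intuition (an illustration only, not the paper's algorithm): split samples into an "easy" and a "hard" group so that the between-group variance of rewards is maximized, then upweight the hard group when forming the policy loss.

```python
# Illustrative sketch: 1-D Otsu-style split of samples by reward, then larger
# weights for the harder (low-reward) group.
import numpy as np

def split_by_reward(rewards: np.ndarray):
    """Find the cut that maximizes between-group reward variance."""
    order = np.argsort(rewards)
    sorted_r = rewards[order]
    n = len(sorted_r)
    best_score, best_cut = -np.inf, 1
    for cut in range(1, n):
        low, high = sorted_r[:cut], sorted_r[cut:]
        score = (cut / n) * ((n - cut) / n) * (high.mean() - low.mean()) ** 2
        if score > best_score:
            best_score, best_cut = score, cut
    return order[:best_cut], order[best_cut:]   # (hard/low-reward, easy/high-reward)

def group_weights(rewards: np.ndarray, hard, alpha: float = 2.0) -> np.ndarray:
    """Give the challenging group more learning capacity."""
    w = np.ones_like(rewards, dtype=float)
    w[hard] = alpha
    return w / w.sum()
```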

EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
Weikang Zhou, Xiao Wang, Limao Xiong, Han Xia, Yingshuang Gu, Mingxu Chai, Fukang Zhu, Caishuang Huang, Shihan Dou, Zhiheng Xi, Rui Zheng, Songyang Gao, Yicheng Zou, Hang Yan, Yifan Le, Ruohui Wang, Lijun Li, Jing Shao, Tao Gui, Qi Zhang, Xuanjing Huang
Preprint. March, 2024
project page / paper

EasyJailbreak is an easy-to-use Python framework designed for researchers and developers focusing on LLM security. Specifically, EasyJailbreak decomposes the mainstream jailbreaking process into several iterable steps: initialize mutation seeds, select suitable seeds, add constraints, mutate, attack, and evaluate. On this basis, EasyJailbreak provides a component for each step, constructing a playground for further research and experimentation. More details can be found in our paper.
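To show how those steps chain together, here is a heavily simplified, hypothetical loop; the function names below are placeholders and are not EasyJailbreak's actual API.

```python
# Hypothetical sketch of the decomposed pipeline described above:
# initialize seeds -> select -> constrain -> mutate -> attack -> evaluate.
def jailbreak_loop(seeds, select, constrain, mutate, attack, evaluate, rounds=10):
    results = []
    for _ in range(rounds):
        chosen = select(seeds)                              # pick promising seed prompts
        candidates = [constrain(mutate(s)) for s in chosen] # mutate under constraints
        responses = [attack(c) for c in candidates]         # query the target model
        scores = [evaluate(r) for r in responses]           # did the attack succeed?
        results.extend(zip(candidates, scores))
        seeds = seeds + [c for c, s in zip(candidates, scores) if s]  # keep successful mutants
    return results
```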

Secrets of RLHF in Large Language Models Part II: Reward Modeling
Binghai Wang, Rui Zheng, Lu Chen, Yan Liu, Shihan Dou, Caishuang Huang, Wei Shen, Senjie Jin, Enyu Zhou, Chenyu Shi, Songyang Gao, Nuo Xu, Yuhao Zhou, Xiaoran Fan, Zhiheng Xi, Jun Zhao, Xiao Wang, Tao Ji, Hang Yan, Lixing Shen, Zhan Chen, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang
Preprint. January, 2024
code / paper

In this report, we attempt to address issues of reward modeling in RLHF of LLMs. (1) From a data perspective, we propose a method to measure the strength of preferences within the data, based on a voting mechanism of multiple reward models. (2) From an algorithmic standpoint, we introduce contrastive learning to enhance the ability of reward models to distinguish between chosen and rejected responses, thereby improving model generalization. Furthermore, we employ meta-learning to enable the reward model to maintain the ability to differentiate subtle differences in out-of-distribution samples.
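As a rough illustration of the voting idea in (1) (a sketch under assumptions, not the report's exact formulation), the reward margin and agreement across an ensemble of reward models can serve as a proxy for preference strength; `reward_models` below is a hypothetical list of scoring callables.

```python
# Illustrative sketch: unanimous votes with large margins suggest a strong,
# reliable preference; near-zero or disagreeing scores hint at noisy labels.
import numpy as np

def preference_strength(reward_models, chosen: str, rejected: str) -> float:
    """Mean reward margin of the pair across the ensemble."""
    margins = np.array([rm(chosen) - rm(rejected) for rm in reward_models])
    return float(margins.mean())

def vote_fraction(reward_models, chosen: str, rejected: str) -> float:
    """Fraction of reward models that agree with the annotated preference."""
    votes = [rm(chosen) > rm(rejected) for rm in reward_models]
    return sum(votes) / len(votes)
```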

Secrets of RLHF in Large Language Models Part I: PPO
Rui Zheng, Shihan Dou, Songyang Gao, Yuan Hua, Wei Shen, Binghai Wang, Yan Liu, Senjie Jin, Qin Liu, Yuhao Zhou, Limao Xiong, Lu Chen, Zhiheng Xi, Nuo Xu, Wenbin Lai, Minghao Zhu, Cheng Chang, Zhangyue Yin, Rongxiang Weng, Wensen Cheng, Haoran Huang, Tianxiang Sun, Hang Yan, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang
Preprint. July, 2023
code / paper

In this technical report, we dissect the framework of RLHF, re-evaluate the inner workings of PPO, and explore how the parts comprising PPO algorithms impact policy agent training. We identify policy constraints as the key factor for the effective implementation of the PPO algorithm. Therefore, we explore PPO-max, an advanced version of the PPO algorithm, to efficiently improve the training stability of the policy model. Based on our main results, we perform a comprehensive analysis of RLHF abilities in comparison with SFT models and ChatGPT.
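For context, the two standard policy constraints in this setting are the clipped surrogate objective and a KL penalty against the reference (SFT) policy; the sketch below shows the generic PPO formulation, not the exact PPO-max recipe.

```python
# Minimal sketch of a PPO policy loss with the two usual constraints:
# ratio clipping plus a KL penalty toward the reference policy.
import torch

def ppo_policy_loss(logp_new, logp_old, logp_ref, advantages,
                    clip_eps: float = 0.2, kl_coef: float = 0.05):
    ratio = torch.exp(logp_new - logp_old)                        # importance ratio
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    surrogate = -torch.min(unclipped, clipped).mean()             # clipped surrogate loss
    kl_penalty = kl_coef * (logp_new - logp_ref).mean()           # stay close to the SFT policy
    return surrogate + kl_penalty
```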

RoCoIns: Enhancing Robustness of Large Language Models through Code-Style Instructions
Yuansen Zhang, Xiao Wang, Zhiheng Xi, Han Xia, Tao Gui, Qi Zhang, Xuanjing Huang
COLING 2024
paper

In this paper, we utilize instructions in code style, which are more structured and less ambiguous, to replace the typical natural-language instructions. Through this conversion, we provide LLMs with more precise instructions and strengthen their robustness. Moreover, in few-shot scenarios, we propose a novel method that composes in-context demonstrations using both clean and adversarial samples (the adversarial context method) to further boost the robustness of LLMs.
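A hypothetical example of the conversion (illustrative only, not taken from the paper): the same task expressed as a natural-language instruction and as a more structured, code-style instruction.

```python
# Illustrative pair: natural-language instruction vs. code-style instruction.
natural_instruction = (
    "Classify the sentiment of the sentence as positive or negative."
)

code_style_instruction = '''
def classify_sentiment(sentence: str) -> str:
    """Return "positive" or "negative" for the given sentence."""
    # Example: sentence = "The movie was surprisingly good."
    # Expected output: "positive"
'''
```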

Miscellanea

  • I'm passionate about FPS games, including Counter-Strike: Global Offensive (CS:GO / CS2) and CrossFire (CF).
  • I love watching soccer and am a big fan of Mourinho.
  • I also love watching basketball games and my favorite player is Kevin Durant.



Updated in May 2024
Thanks to Jon Barron for this fantastic template!