Research
I have a general interest in deep learning, natural language processing, and robust machine
learning. Recently, my research has focused on large language models (LLMs) and LLM-based
agents.
|
|
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
Zhiheng Xi, Yiwen Ding, Wenxiang Chen, Boyang Hong, Honglin Guo, Junzhe Wang, Dingwen Yang, Chenyang Liao, Xin Guo, Wei He, Songyang Gao, Lu Chen, Rui Zheng, Yicheng Zou, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang
Preprint. June, 2024
project page / codes and platform / paper / dataset / benchmark / model
We propose AgentGym, a framework designed to help the community easily evaluate and develop generally-capable LLM-based agents. It features diverse interactive environments and tasks in a unified format, supports real-time feedback and concurrency, and is easily scalable.
It also includes a high-quality trajectory set, AgentTraj, and a benchmark suite, AgentEval.
To study the self-evolution potential of general LLM-based agents beyond previously seen data, we further propose a novel method, AgentEvol, and evaluate it across tasks and environments.
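Conceptually, AgentEvol alternates between letting the agent explore the environments and learning from the trajectories it collects. The sketch below is only a toy illustration of that explore-then-learn loop; ToyAgent, explore, and Trajectory are made-up stand-ins, not the actual AgentGym/AgentEvol code.
```python
# Schematic explore-then-learn loop in the spirit of AgentEvol (toy stand-ins only).
import random
from dataclasses import dataclass

@dataclass
class Trajectory:
    states: list
    actions: list
    reward: float  # outcome reward from the environment

class ToyAgent:
    def __init__(self):
        self.data = []  # stands in for model parameters

    def act(self, state):
        return random.choice(["tool_call", "answer"])  # placeholder policy

    def fine_tune(self, trajectories):
        # Stand-in for supervised fine-tuning on collected trajectories.
        self.data.extend(trajectories)

def explore(agent, n_episodes=8):
    """Roll out the current agent and keep only successful trajectories."""
    kept = []
    for _ in range(n_episodes):
        states, actions = ["start"], []
        for _ in range(3):
            actions.append(agent.act(states[-1]))
            states.append("next")
        reward = float(random.random() > 0.5)  # toy outcome reward
        if reward > 0:
            kept.append(Trajectory(states, actions, reward))
    return kept

agent = ToyAgent()
base_set = explore(agent)          # stands in for the AgentTraj warm-up set
agent.fine_tune(base_set)
for iteration in range(3):         # alternate exploration and learning
    new_trajs = explore(agent)
    agent.fine_tune(base_set + new_trajs)
    print(f"iter {iteration}: {len(new_trajs)} new successful trajectories")
```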
|
|
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
Zhiheng Xi, Wenxiang Chen, Boyang Hong, Senjie Jin, Rui Zheng, Wei He, Yiwen Ding, Shichun Liu, Xin Guo, Junzhe Wang, Honglin Guo, Wei Shen, Xiaoran Fan, Yuhao Zhou, Shihan Dou, Xiao Wang, Xinbo Zhang, Peng Sun, Tao Gui, Qi Zhang, Xuanjing Huang
ICML 2024
codes / paper
In this paper, we propose \(R^3\):
Learning Reasoning through Reverse Curriculum Reinforcement Learning (RL),
a novel method that employs only outcome supervision to achieve the benefits of process supervision for large language models.
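One way to picture the idea (a simplified sketch, not the paper's implementation): exploration starts from states near the end of a demonstrated reasoning chain and slides backward step by step, while only the final outcome is rewarded. The demonstration, rollout policy, and curriculum schedule below are toy assumptions.
```python
# Toy reverse curriculum over a demonstrated reasoning chain: the policy starts
# from late intermediate steps first, then progressively earlier ones, and
# receives only an outcome reward (whether the final answer is correct).
import random

demonstration = ["parse problem", "set up equation", "solve equation", "42"]
gold_answer = "42"

def rollout(prefix):
    """Stand-in policy: completes the chain from a given demonstrated prefix."""
    return prefix + ["42" if random.random() > 0.3 else "41"]

def outcome_reward(chain):
    return 1.0 if chain[-1] == gold_answer else 0.0

# Reverse curriculum: start from the last demonstrated step, then move backward.
for start in range(len(demonstration) - 1, -1, -1):
    prefix = demonstration[:start]          # demonstrated steps given for free
    returns = [outcome_reward(rollout(prefix)) for _ in range(16)]
    avg = sum(returns) / len(returns)
    print(f"start at step {start}: avg outcome reward {avg:.2f}")
    # In practice one would run a policy-gradient update here and only advance
    # the curriculum once the success rate from this start point is high enough.
```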
|
|
The Rise and Potential of Large Language Model Based Agents: A Survey
Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming
Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, Rui Zheng, Xiaoran Fan, Xiao Wang, Limao Xiong, Qin
Liu, Yuhao Zhou, Weiran Wang, Changhao Jiang, Yicheng Zou, Xiangyang Liu, Zhangyue Yin, Shihan
Dou, Rongxiang Weng, Wensen Cheng, Qi Zhang, Wenjuan Qin, Yongyan Zheng, Xipeng Qiu, Xuanjing
Huang, Tao Gui
Preprint. September, 2023
project page / paper
In this paper, we provide a comprehensive, 86-page survey of LLM-based agents.
We start by tracing the concept of agents from its philosophical origins to its development
in AI.
Next, the main body covers the construction of LLM-based agents, their extensive
applications, and the essential concept of agent society.
Finally, we discuss a range of key topics and open problems within the field, e.g., scaling
the number of agents and Agent-as-a-Service.
|
|
Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement
Zhiheng Xi, Senjie Jin, Yuhao Zhou, Rui Zheng, Songyang Gao, Jia Liu, Tao Gui,
Qi Zhang, Xuanjing Huang
EMNLP 2023 Findings.
codes / paper
Different from previous work such as Chain-of-Thought (CoT), which enhances LLMs' reasoning
performance from the answer/reasoning side,
we start from the problem side and propose Self-Polish (SP),
a novel method that facilitates the model's reasoning by guiding it to progressively
refine the given problems to be more comprehensible and solvable.
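A minimal sketch of the progressive-refinement loop, assuming a hypothetical rewrite helper in place of a real LLM call; the stopping criterion and the toy rewriting rule are illustrative, not the exact Self-Polish recipe.
```python
# Minimal sketch of progressive problem refinement: repeatedly rewrite the
# problem so it is clearer, then run the solver on the polished version.
def rewrite(problem: str) -> str:
    """Stand-in for an LLM call that removes irrelevant details and restates
    conditions explicitly; here it just strips one noisy clause."""
    return problem.replace(" (which he bought last Tuesday)", "")

def refine(problem: str, max_rounds: int = 3) -> str:
    for _ in range(max_rounds):
        polished = rewrite(problem)
        if polished == problem:      # stop once the rewrite converges
            break
        problem = polished
    return problem

question = "Tom has 3 apples (which he bought last Tuesday) and buys 2 more. How many apples?"
print(refine(question))   # the solver is then run on the polished problem
```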
|
|
Connectivity Patterns are Task Embeddings
Zhiheng Xi, Rui Zheng, Yuansen Zhang, Xuanjing Huang, Zhongyu Wei, Minlong
Peng, Mingming Sun, Qi Zhang, Tao Gui
ACL 2023 Findings.
codes / paper
In this work, we draw inspiration from the operating mechanism of deep neural networks
(DNNs) and biological brains, where neuronal activations are sparse and task-specific,
and we use the connectivity patterns of neurons as a unique identifier (task embedding)
associated with each task.
Experiments show that our method consistently outperforms other baselines in predicting
inter-task transferability across data regimes and transfer settings,
while remaining highly efficient in computation and storage.
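As a rough illustration of the idea (not the paper's exact construction), one can mark the most important connections for each task as a binary mask and compare tasks by mask similarity; the importance scores below are purely synthetic.
```python
# Illustrative sketch: represent each task by a binary "connectivity" mask over
# parameters (top-k entries of a toy importance score) and predict inter-task
# transferability from the similarity between masks.
import numpy as np

rng = np.random.default_rng(0)
n_params, k = 1000, 100

def task_embedding(importance_scores, k=k):
    """Binary mask marking the k most important connections for a task."""
    mask = np.zeros_like(importance_scores)
    mask[np.argsort(importance_scores)[-k:]] = 1.0
    return mask

def similarity(a, b):
    """Cosine similarity between two binary connectivity masks."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy importance scores for three tasks; tasks A and B share structure.
shared = rng.random(n_params)
task_a = task_embedding(shared + 0.1 * rng.random(n_params))
task_b = task_embedding(shared + 0.1 * rng.random(n_params))
task_c = task_embedding(rng.random(n_params))

print("A~B:", similarity(task_a, task_b))   # high -> transfer likely helps
print("A~C:", similarity(task_a, task_c))   # low  -> transfer less useful
```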
|
|
Efficient Adversarial Training with Robust Early-Bird Tickets
Zhiheng Xi, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang
EMNLP 2022.
codes / paper
Adversarial training, a strong algorithm for enhancing model robustness, is typically more
expensive than traditional fine-tuning because it must generate adversarial examples with
additional gradient steps.
Delving into the optimization process of adversarial training,
we find that robust connectivity patterns emerge in the early training phase (typically
0.15~0.3 epochs), far before the parameters converge.
Inspired by this finding, we extract robust early-bird tickets (i.e.,
subnetworks) to develop an efficient adversarial training method.
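The finding suggests a simple detection recipe: monitor the pruning mask during early training and stop searching once it stabilizes. The sketch below is a toy stand-in for that procedure, with synthetic weights and an assumed mask-distance threshold, not the paper's implementation.
```python
# Sketch of early-bird ticket detection: track the pruning mask during the first
# part of (adversarial) training and stop once the mask stops changing, then
# continue training only the ticket subnetwork.
import numpy as np

rng = np.random.default_rng(1)
n_params, sparsity = 500, 0.5

def prune_mask(weights, sparsity=sparsity):
    """Keep the largest-magnitude weights; zero out the rest."""
    k = int(len(weights) * (1 - sparsity))
    mask = np.zeros_like(weights)
    mask[np.argsort(np.abs(weights))[-k:]] = 1.0
    return mask

weights = rng.normal(size=n_params)
prev_mask = prune_mask(weights)
for step in range(1, 50):
    # Stand-in for one adversarial-training step; updates shrink over time, so
    # the set of important weights (and hence the mask) stabilizes early.
    weights += rng.normal(scale=0.5 / step**2, size=n_params)
    mask = prune_mask(weights)
    distance = float(np.mean(mask != prev_mask))   # mask Hamming distance
    prev_mask = mask
    if distance < 0.01:                            # mask has converged
        print(f"robust early-bird ticket found at step {step}")
        break
# Adversarial training then continues only on the subnetwork given by `mask`.
```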
|
|
Characterizing the Impacts of Instances on Robustness
Rui Zheng, Zhiheng Xi (Co-first Author), Qin Liu, Wenbin Lai, Tao Gui, Qi
Zhang, Xuanjing Huang, Jin Ma, Ying Shan, Weifeng Ge
ACL 2023 Findings.
codes / paper
In this paper, we show that robust and non-robust instances in the training dataset,
though both important for test performance, have contrary impacts on robustness,
which makes it possible to build a highly robust model by leveraging the training dataset more effectively.
We propose a new method that distinguishes robust instances from non-robust ones according to the model's sensitivity
to perturbations on individual instances during training.
Surprisingly, we find that the model under standard training easily overfits the robust instances
by relying on their simple patterns before it completely learns their robust features.
Finally, we propose a new mitigation algorithm to further unleash the potential of robust instances.
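A toy illustration of the underlying measurement, with an entirely synthetic loss and threshold: instances whose loss rises sharply under small input perturbations are treated as non-robust, the rest as robust. The scorer below is made up for illustration, not the paper's model-based criterion.
```python
# Toy separation of robust vs. non-robust training instances by their
# sensitivity to small perturbations (worst-case loss increase).
import numpy as np

rng = np.random.default_rng(2)

def loss(x, w):
    """Toy per-instance loss for a linear scorer."""
    return float(np.log1p(np.exp(-x @ w)))

def sensitivity(x, w, eps=0.1):
    """Worst-case loss increase over a few random perturbations of x."""
    base = loss(x, w)
    deltas = [loss(x + eps * rng.normal(size=x.shape), w) - base for _ in range(8)]
    return max(deltas)

w = rng.normal(size=16)
data = [rng.normal(size=16) for _ in range(100)]
scores = np.array([sensitivity(x, w) for x in data])
threshold = np.median(scores)                       # illustrative cut-off
robust = [x for x, s in zip(data, scores) if s <= threshold]
non_robust = [x for x, s in zip(data, scores) if s > threshold]
print(len(robust), "robust instances,", len(non_robust), "non-robust instances")
```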
|
|
Improving Generalization of Alignment with Human Preferences through Group Invariant Learning
Rui Zheng, Wei Shen, Yuan Hua, Wenbin Lai, Shihan Dou, Yuhao Zhou, Zhiheng Xi, Xiao Wang, Haoran Huang, Tao Gui, Qi Zhang, Xuanjing Huang
ICLR 2024 (Spotlight).
codes / paper
In this work, we propose a novel approach that can learn a consistent policy via RL across various data groups or domains. Given the challenges associated with acquiring group annotations, our method automatically classifies data into different groups, deliberately maximizing performance variance. Then, we optimize the policy to perform well on challenging groups. Lastly, leveraging the established groups, our approach adaptively adjusts the exploration space, allocating more learning capacity to more challenging data and preventing the model from over-optimizing on simpler data.
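Very roughly, and with toy stand-ins rather than the actual algorithm, the grouping idea can be pictured as splitting samples so that performance differs sharply between groups and then up-weighting the challenging group:
```python
# Schematic sketch: form groups with large between-group performance variance
# (here by thresholding a per-sample reward at the median; the paper learns the
# assignment) and give the worst-performing group extra learning weight.
import numpy as np

rng = np.random.default_rng(3)
rewards = rng.normal(loc=1.0, scale=0.5, size=200)   # stand-in per-sample rewards

hard = rewards < np.median(rewards)                  # toy automatic grouping
group_mean = {"hard": float(rewards[hard].mean()),
              "easy": float(rewards[~hard].mean())}

weights = np.where(hard, 2.0, 1.0)                   # up-weight the hard group
weighted_objective = float(np.average(rewards, weights=weights))
print(group_mean, "weighted objective:", round(weighted_objective, 3))
# In the actual method, the policy is optimized to do well on the challenging
# group, and the exploration space is adapted per group.
```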
|
|
EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
Weikang Zhou, Xiao Wang, Limao Xiong, Han Xia, Yingshuang Gu, Mingxu Chai, Fukang Zhu, Caishuang Huang, Shihan Dou, Zhiheng Xi, Rui Zheng, Songyang Gao, Yicheng Zou, Hang Yan, Yifan Le, Ruohui Wang, Lijun Li, Jing Shao, Tao Gui, Qi Zhang, Xuanjing Huang
Preprint. Mar, 2024.
project page / paper
EasyJailbreak is an easy-to-use Python framework designed for researchers and developers focusing on LLM security. Specifically, EasyJailbreak decomposes the mainstream jailbreaking process into several iterable steps:
initialize mutation seeds, select suitable seeds, add constraint, mutate, attack, and evaluate.
On this basis, EasyJailbreak provides a component for each step, constructing a playground for further research and experimentation. More details can be found in our paper.
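The decomposition can be pictured as a simple loop over those steps. The functions below are illustrative stand-ins only, not the actual EasyJailbreak components or API (see the project page for those):
```python
# Illustrative pipeline mirroring the steps named above; toy stand-ins only.
def initialize_seeds():       return ["seed prompt"]
def select(seeds):            return seeds[:1]
def add_constraints(seeds):   return [s + " [constrained]" for s in seeds]
def mutate(seeds):            return [s + " [mutated]" for s in seeds]
def attack(target, seeds):    return [f"{target} response to: {s}" for s in seeds]
def evaluate(responses):      return [len(r) % 2 == 0 for r in responses]  # toy judge

seeds = initialize_seeds()
for _ in range(3):                                  # iterate the attack loop
    candidates = mutate(add_constraints(select(seeds)))
    results = evaluate(attack("toy-model", candidates))
    seeds = [c for c, ok in zip(candidates, results) if ok] or seeds
print("surviving seeds:", seeds)
```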
|
|
Secrets of RLHF in Large Language Models Part II: Reward Modeling
Binghai Wang, Rui Zheng, Lu Chen, Yan Liu, Shihan Dou, Caishuang Huang, Wei Shen, Senjie Jin, Enyu Zhou, Chenyu Shi, Songyang Gao, Nuo Xu, Yuhao Zhou, Xiaoran Fan, Zhiheng Xi, Jun Zhao, Xiao Wang, Tao Ji, Hang Yan, Lixing Shen, Zhan Chen, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang
Preprint. Jan, 2024.
codes / paper
In this report, we attempt to address issues of reward modeling in RLHF of LLMs.
(1) From a data perspective, we propose a method to measure the strength of preferences within the data,
based on a voting mechanism of multiple reward models.
(2) From an algorithmic standpoint, we introduce contrastive learning to enhance the ability of reward models to distinguish between chosen and rejected responses, thereby improving model generalization. Furthermore, we employ meta-learning to enable the reward model to retain its ability to distinguish subtle differences in out-of-distribution samples.
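A toy sketch of the preference-strength idea from (1), with synthetic reward margins: each reward model in an ensemble scores a (chosen, rejected) pair, and the mean margin and agreement rate indicate how strong or noisy that preference is. The thresholds here are assumptions for illustration.
```python
# Preference strength via an ensemble vote: several reward models score each
# pair; the mean margin and vote agreement quantify the preference's strength.
import numpy as np

rng = np.random.default_rng(4)
n_models, n_pairs = 5, 10

# Stand-in reward margins r(chosen) - r(rejected) from each model in the ensemble.
margins = rng.normal(loc=0.5, scale=1.0, size=(n_models, n_pairs))

preference_strength = margins.mean(axis=0)          # average margin per pair
agreement = (margins > 0).mean(axis=0)              # fraction of models that agree

for i, (s, a) in enumerate(zip(preference_strength, agreement)):
    label = "strong" if a > 0.8 else "ambiguous (candidate for down-weighting)"
    print(f"pair {i}: strength {s:+.2f}, agreement {a:.0%} -> {label}")
```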
|
|
Secrets of RLHF in Large Language Models Part I: PPO
Rui Zheng, Shihan Dou, Songyang Gao, Yuan Hua, Wei Shen, Binghai Wang, Yan Liu, Senjie Jin, Qin Liu, Yuhao Zhou, Limao Xiong, Lu Chen, Zhiheng Xi, Nuo Xu, Wenbin Lai, Minghao Zhu, Cheng Chang, Zhangyue Yin,
Rongxiang Weng, Wensen Cheng, Haoran Huang, Tianxiang Sun, Hang Yan, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang
Preprint. July, 2023.
codes / paper
In this technical report, we dissect the framework of RLHF, re-evaluate the inner workings of PPO,
and explore how the parts comprising the PPO algorithm impact policy agent training. We identify policy constraints as the key factor for the effective implementation of PPO. We therefore explore PPO-max, an advanced version of the PPO algorithm, to efficiently improve the training stability of the policy model.
Based on our main results, we perform a comprehensive analysis of RLHF abilities compared with SFT models and ChatGPT.
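One standard form of such a policy constraint, shown here as a simplified sketch rather than the full PPO-max recipe, is to penalize per-token rewards by the log-probability ratio between the current policy and the reference (SFT) model:
```python
# Minimal sketch of a KL-style policy constraint in RLHF reward shaping:
# every token pays a penalty proportional to log pi(token) - log pi_ref(token),
# and the scalar outcome reward is added on the final token of the response.
import numpy as np

def shaped_rewards(outcome_reward, logp_policy, logp_ref, beta=0.1):
    """Per-token shaped rewards with a KL-style penalty against the reference."""
    rewards = -beta * (logp_policy - logp_ref)
    rewards[-1] += outcome_reward
    return rewards

logp_policy = np.array([-0.5, -0.7, -0.2])   # toy per-token log-probs (policy)
logp_ref    = np.array([-0.9, -0.8, -0.6])   # toy per-token log-probs (reference)
print(np.round(shaped_rewards(1.0, logp_policy, logp_ref), 3))
```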
|
|
RoCoIns: Enhancing Robustness of Large Language Models through Code-Style Instructions
Yuansen Zhang, Xiao Wang, Zhiheng Xi, Han Xia, Tao Gui, Qi Zhang, Xuanjing Huang
COLING 2024.
paper
In this paper,
we utilize instructions in code style, which are more structured and less ambiguous, to replace typical natural-language instructions. Through this conversion, we provide LLMs with more precise instructions and strengthen their robustness. Moreover, under few-shot scenarios, we propose a novel method to compose in-context demonstrations using both clean and adversarial samples
(the adversarial context method) to further boost the robustness of LLMs.
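For illustration only (the exact prompt format in the paper may differ), the contrast between a natural-language instruction and a code-style instruction might look like this:
```python
# Illustrative contrast between a natural-language instruction and a code-style
# instruction for the same task; this is a sketch, not the RoCoIns prompt format.
natural_instruction = (
    "Classify the sentiment of the sentence as positive or negative."
)

code_style_instruction = '''
def classify_sentiment(sentence: str) -> str:
    """Return "positive" or "negative" for the given sentence."""
    # <the model fills in the label here>
'''

def build_prompt(instruction: str, sentence: str) -> str:
    """Compose the final prompt sent to the LLM."""
    return f"{instruction}\nInput: {sentence}\nOutput:"

print(build_prompt(code_style_instruction, "I loved this movie!"))
```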
|
Miscellanea
- I'm passionate about FPS games, including Counter-Strike: Global Offensive (CS:GO / CS2)
and CrossFire (CF).
- I love watching soccer and am a big fan of Mourinho.
- I also love watching basketball games and my favorite player is Kevin Durant.