
Language Imbalance Driven Rewarding for Multilingual Self-improving


Wen Yang1,2*, Junhong Wu1,2*, Chen Wang1,2, Chengqing Zong1,2, Jiajun Zhang1,2,3,4🌟

* Equal contribution 🌟 Corresponding author

1 School of Artificial Intelligence, University of Chinese Academy of Sciences
2 Institute of Automation, Chinese Academy of Sciences
3 Wuhan AI Research 4 Shanghai Artificial Intelligence Laboratory, Shanghai, China

(Figure: overview of the multilingual self-improving framework)

Overview

We introduce Language Imbalance Driven Rewarding, a novel approach that leverages the inherent capability imbalance across different languages in large language models (LLMs) as a reward signal for iterative self-improvement. By applying iterative DPO training, our approach not only enhances the performance of non-dominant languages but also improves outcomes in dominant languages.

Our goal with this approach is to contribute a new perspective to the multilingual LLM community by challenging the assumption that language imbalance is solely a challenge to be mitigated. We hope this approach will inspire further exploration into multilingual self-improvement in LLMs, broadening the horizon for more balanced and capable language models.
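The iteration described above can be sketched in a few lines of Python. The helpers `generate`, `translate`, and `dpo_train` below are illustrative stubs, not the repo's API: they stand in for the batch inference/translation scripts and the DPO trainer. The key idea they encode is that the dominant-language response, once translated, serves as the "chosen" side of a preference pair, while the model's direct response in the non-dominant language is "rejected".

```python
def generate(model, prompt, lang):
    # Stub: a real implementation would sample from the LLM in `lang`.
    return f"{model}:{lang}:{prompt}"

def translate(text, src, tgt):
    # Stub: a real implementation would translate with the same LLM (or an MT system).
    return text.replace(f":{src}:", f":{tgt}:")

def dpo_train(model, pairs):
    # Stub: a real implementation would run DPO on the preference pairs.
    return f"{model}+dpo"

def self_improve(model, prompts, dominant="en", targets=("es", "zh"), iterations=2):
    """One iteration: for each prompt, the dominant-language response translated
    into a non-dominant language is 'chosen'; the model's own response in that
    language is 'rejected'. DPO on these pairs yields the next model."""
    for _ in range(iterations):
        pairs = []
        for prompt in prompts:
            ref = generate(model, prompt, dominant)
            for tgt in targets:
                pairs.append({
                    "prompt": prompt,
                    "chosen": translate(ref, src=dominant, tgt=tgt),
                    "rejected": generate(model, prompt, tgt),
                })
        model = dpo_train(model, pairs)  # next iteration starts from the updated model
    return model
```

Because the reward signal comes from the model's own language imbalance, no external reward model or human annotation is needed at any iteration.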

🔥 Update

  • [28/10/2024] 🔥 We release the code for Language Imbalance Driven Rewarding!
  • [11/10/2024] 🔥 Language Imbalance Driven Rewarding is coming! We release the paper!

👀 Contents

  • Setup
  • Preparation
  • Train
  • Evaluation
  • Experiments

📷 Setup

Please follow the instructions below to install the required packages.

  1. Clone this repository

```shell
git clone https://github.com/ZNLP/Language-Imbalance-Driven-Rewarding.git
```

  2. Install the required packages

```shell
conda create -n mdpo python=3.10 -y
conda activate mdpo
cd Language-Imbalance-Driven-Rewarding
pip install -r requirements.txt
```

💡 Preparation

Generate responses in each language and translate the dominant-language responses, which together supply the preference pairs for training:

```shell
bash ./scripts/batch_inference.sh
bash ./scripts/batch_translate.sh
```
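Assuming the two scripts above leave aligned lists of translated dominant-language responses and native target-language responses, a sketch like the following pairs them into DPO-style records. The `prompt`/`chosen`/`rejected` field names follow the common DPO data convention and are an assumption here, not necessarily the repo's exact schema.

```python
import json

def to_dpo_records(prompts, translated_dominant, native_responses):
    """translated_dominant[i]: dominant-language answer translated into the
    target language (preferred); native_responses[i]: the model's direct
    answer in the target language (dispreferred)."""
    return [
        {"prompt": p, "chosen": chosen, "rejected": rejected}
        for p, chosen, rejected in zip(prompts, translated_dominant, native_responses)
    ]

def write_jsonl(path, records):
    # One JSON object per line, keeping non-ASCII text readable.
    with open(path, "w", encoding="utf-8") as f:
        for r in records:
            f.write(json.dumps(r, ensure_ascii=False) + "\n")
```

The resulting JSONL file can then be registered as a preference dataset for DPO training.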

📈 Train

Our training is mostly built on the LLaMA-Factory codebase. Please refer to that repo for more details.
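For orientation, a DPO run in LLaMA-Factory is typically driven by a YAML config along these lines. The key names follow LLaMA-Factory's public examples and may differ across versions; the model, dataset name, and paths below are placeholders, not this repo's actual settings.

```yaml
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: dpo
do_train: true
finetuning_type: lora
lora_target: all
pref_beta: 0.1

### dataset
dataset: mdpo_preference_pairs   # placeholder: the pairs built in Preparation
template: llama3
cutoff_len: 2048

### output
output_dir: saves/mdpo-iter1

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 5.0e-6
num_train_epochs: 1.0
```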

📈 Evaluation

```shell
bash scripts/batch_inference_for_eval.sh
```

👀 Experiments

We provide some results in this section. More detailed results can be found in our paper.

General Instruction Following

  • Head-to-head Performance
  • X-alpacaEval

The Multilingual MT-Bench Benchmark

The Multilingual NLP Benchmarks

Arithmetic Reasoning

  • Performances on MGSM benchmark

Schedule

  • Release training & evaluation code

  • Release GPT-4 Score code

Citation

If you find this repo useful for your research, please consider citing our paper:

```bibtex
@article{yang2024language,
  title={Language Imbalance Driven Rewarding for Multilingual Self-improving},
  author={Yang, Wen and Wu, Junhong and Wang, Chen and Zong, Chengqing and Zhang, Jiajun},
  journal={arXiv preprint arXiv:2410.08964},
  year={2024}
}
```

Acknowledgement

We would like to thank the open-source repositories our work builds on, such as LLaMA-Factory, for their great work.

License

This project is released under the Apache 2.0 license. Parts of this project contain code and models from other sources, which are subject to their respective licenses.

About

[ICLR 2025] Language Imbalance Driven Rewarding for Multilingual Self-improving
