
Language Imbalance Driven Rewarding for Multilingual Self-improving


Wen Yang1,2*, Junhong Wu1,2*, Chen Wang1,2, Chengqing Zong1,2, Jiajun Zhang1,2,3,4🌟

* Equal contribution 🌟 Corresponding author

1 School of Artificial Intelligence, University of Chinese Academy of Sciences
2 Institute of Automation, Chinese Academy of Sciences
3 Wuhan AI Research 4 Shanghai Artificial Intelligence Laboratory, Shanghai, China

(Figure: overview of the multilingual self-improving framework)

Overview

We introduce Language Imbalance Driven Rewarding, a novel approach that leverages the inherent capability imbalance across different languages in large language models (LLMs) as a reward signal for iterative self-improvement. By applying iterative DPO training, our approach not only enhances the performance of non-dominant languages but also improves outcomes in dominant languages.

Our goal with this approach is to contribute a new perspective to the multilingual LLM community by challenging the assumption that language imbalance is solely a challenge to be mitigated. We hope this approach will inspire further exploration into multilingual self-improvement in LLMs, broadening the horizon for more balanced and capable language models.
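The iteration described above can be sketched in a few lines of Python. The helpers `generate`, `translate`, and `dpo_train` below are illustrative stubs, not the repo's API: they stand in for the batch inference/translation scripts and the DPO trainer. The key idea they encode is that the dominant-language response, once translated, serves as the "chosen" side of a preference pair, while the model's direct response in the non-dominant language is "rejected".

```python
def generate(model, prompt, lang):
    # Stub: a real implementation would sample from the LLM in `lang`.
    return f"{model}:{lang}:{prompt}"

def translate(text, src, tgt):
    # Stub: a real implementation would translate with the same LLM (or an MT system).
    return text.replace(f":{src}:", f":{tgt}:")

def dpo_train(model, pairs):
    # Stub: a real implementation would run DPO on the preference pairs.
    return f"{model}+dpo"

def self_improve(model, prompts, dominant="en", targets=("es", "zh"), iterations=2):
    """One iteration: for each prompt, the dominant-language response translated
    into a non-dominant language is 'chosen'; the model's own response in that
    language is 'rejected'. DPO on these pairs yields the next model."""
    for _ in range(iterations):
        pairs = []
        for prompt in prompts:
            ref = generate(model, prompt, dominant)
            for tgt in targets:
                pairs.append({
                    "prompt": prompt,
                    "chosen": translate(ref, src=dominant, tgt=tgt),
                    "rejected": generate(model, prompt, tgt),
                })
        model = dpo_train(model, pairs)  # next iteration starts from the updated model
    return model
```

Because the reward signal comes from the model's own language imbalance, no external reward model or human annotation is needed at any iteration.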

🔥 Update

  • [28/10/2024] 🔥 We release the code for Language Imbalance Driven Rewarding!
  • [11/10/2024] 🔥 Language Imbalance Driven Rewarding is coming! We release the paper!

👀 Contents

  • Setup
  • Preparation
  • Train
  • Evaluation
  • Experiments

📷 Setup

Please follow the instructions below to install the required packages.

  1. Clone this repository

```shell
git clone https://github.com/ZNLP/Language-Imbalance-Driven-Rewarding.git
```

  2. Install the required packages

```shell
conda create -n mdpo python=3.10 -y
conda activate mdpo
cd Language-Imbalance-Driven-Rewarding
pip install -r requirements.txt
```

💡 Preparation

Generate responses in each language and translate the dominant-language responses, which together supply the preference pairs for training:

```shell
bash ./scripts/batch_inference.sh
bash ./scripts/batch_translate.sh
```
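Assuming the two scripts above leave aligned lists of translated dominant-language responses and native target-language responses, a sketch like the following pairs them into DPO-style records. The `prompt`/`chosen`/`rejected` field names follow the common DPO data convention and are an assumption here, not necessarily the repo's exact schema.

```python
import json

def to_dpo_records(prompts, translated_dominant, native_responses):
    """translated_dominant[i]: dominant-language answer translated into the
    target language (preferred); native_responses[i]: the model's direct
    answer in the target language (dispreferred)."""
    return [
        {"prompt": p, "chosen": chosen, "rejected": rejected}
        for p, chosen, rejected in zip(prompts, translated_dominant, native_responses)
    ]

def write_jsonl(path, records):
    # One JSON object per line, keeping non-ASCII text readable.
    with open(path, "w", encoding="utf-8") as f:
        for r in records:
            f.write(json.dumps(r, ensure_ascii=False) + "\n")
```

The resulting JSONL file can then be registered as a preference dataset for DPO training.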

📈 Train

Our training is mostly built on the LLaMA-Factory codebase. Please refer to that repo for more details.
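For orientation, a DPO run in LLaMA-Factory is typically driven by a YAML config along these lines. The key names follow LLaMA-Factory's public examples and may differ across versions; the model, dataset name, and paths below are placeholders, not this repo's actual settings.

```yaml
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: dpo
do_train: true
finetuning_type: lora
lora_target: all
pref_beta: 0.1

### dataset
dataset: mdpo_preference_pairs   # placeholder: the pairs built in Preparation
template: llama3
cutoff_len: 2048

### output
output_dir: saves/mdpo-iter1

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 5.0e-6
num_train_epochs: 1.0
```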

📈 Evaluation

```shell
bash scripts/batch_inference_for_eval.sh
```

👀 Experiments

We provide some results in this section. More detailed results can be found in our paper.

General Instruction Following

  • Head-to-head Performance
  • X-alpacaEval

The Multilingual MT-Bench Benchmark

The Multilingual NLP Benchmarks

Arithmetic Reasoning

  • Performances on MGSM benchmark

Schedule

  • Release training & evaluation code

  • Release GPT-4 Score code

Citation

If you find this repo useful for your research, please consider citing our paper:

```bibtex
@article{yang2024language,
  title={Language Imbalance Driven Rewarding for Multilingual Self-improving},
  author={Yang, Wen and Wu, Junhong and Wang, Chen and Zong, Chengqing and Zhang, Jiajun},
  journal={arXiv preprint arXiv:2410.08964},
  year={2024}
}
```

Acknowledgement

We would like to thank the open-source repositories our work builds on, such as LLaMA-Factory, for their great work.

License

This project is released under the Apache 2.0 license. Parts of this project contain code and models from other sources, which are subject to their respective licenses.

About

[ICLR 2025] Language Imbalance Driven Rewarding for Multilingual Self-improving
