Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data

Code for the NAACL 2025 paper "Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data".

Setup

Use the newest version of vLLM.
If things don't work, you may need to install the Azure CLI packages: pip install azure-cli azure-functions azure-identity
Dependency: https://github.com/microsoft/controllable-safety-alignment

Make sure the environment variables are set up correctly before running any of the scripts:

PROJ_DIR and PYTHONPATH: the project directory of the controllable-safety-alignment repo (not this one)!
MODEL_DIR, DATA_DIR: dedicated directories to store downloaded models and data
OUTPUT_DIR: dedicated directory to store trained checkpoints
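
A quick pre-flight check can catch a missing variable before a long job starts. The sketch below is not part of the repository: the helper name missing_env_vars is hypothetical, and it only checks that the five variables listed above are set and non-empty, not that they point to the right directories.

```python
import os

# Variables named in the Setup section above.
REQUIRED_VARS = ["PROJ_DIR", "PYTHONPATH", "MODEL_DIR", "DATA_DIR", "OUTPUT_DIR"]


def missing_env_vars(env=None):
    """Return the required variables that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    print("Missing variables:", ", ".join(missing) if missing else "none")
```

Passing a plain dict instead of os.environ makes the check easy to try without touching the shell environment.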

Steps for running the Quote-Tuning pipeline

Format the data as a Hugging Face dataset

Example: data/nq/dev, data/nq/train

Dataset({
    features: ['prompt', 'reference'],
    num_rows: 110865
})
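
As an illustration of this layout, the sketch below builds a dataset with the two features shown above from a list of records. The helper names and the example record are made up. Dataset.from_dict and save_to_disk are standard calls in the Hugging Face datasets library, but whether the repository's scripts expect data saved this way is an assumption to check against the code.

```python
def to_columns(records):
    """Turn [{'prompt': ..., 'reference': ...}, ...] into column lists."""
    return {
        "prompt": [r["prompt"] for r in records],
        "reference": [r["reference"] for r in records],
    }


def save_as_hf_dataset(records, out_dir):
    """Write the records to out_dir as a Hugging Face dataset (hypothetical helper)."""
    from datasets import Dataset  # requires: pip install datasets

    dataset = Dataset.from_dict(to_columns(records))
    dataset.save_to_disk(out_dir)
    return dataset


# Made-up record, for illustration only.
example_records = [
    {"prompt": "who wrote the declaration of independence",
     "reference": "Thomas Jefferson"},
]
columns = to_columns(example_records)
```

Calling save_as_hf_dataset(example_records, "some/output/dir") would then write the dataset to disk; the output path here is a placeholder.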

Start the vLLM server with start_vllm.sh, then run $PROJ_DIR/src/oai_inference.py to generate responses on the training and dev data. Make sure port 8000 is not in use, or switch to a different port. Changing the port requires modifying the last few lines of the model_name_to_endpoints function in $PROJ_DIR/src/oai_inference.py.

Example: run_gen_bo32.sh
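
Before launching the server, it can help to confirm that nothing is already listening on the chosen port. The following is a minimal sketch using only the Python standard library. The function name port_in_use is hypothetical, and it assumes the server will listen on the local machine.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("port 8000 in use:", port_in_use(8000))
```

If the check reports that the port is taken, pick another port and update the endpoint code accordingly.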

Set up a QUIP score server (or use the existing one accessible over the internet at 'https://acc2-private-wiki.dataportraits.org/quip'). Make sure the URL in quip_api.py is correct.
Use run_metric_on_gen.py to score the responses with the QUIP score. See the command-line arguments in that script for details.
Use best_of_n_to_paired_gen.py to produce paired data for DPO. See examples in best_of_n_to_paired_gen.sh. Next, convert the produced .json into a Hugging Face dataset via data_processing/convert_paired_gens_json_to_dataset.py. An example is available in data_processing/convert_paired_gens_json_to_dataset.sh.
Add the path of the paired data (converted to a Hugging Face dataset) as an entry in PAIRED_DATA_DICT in $PROJ_DIR/dpo/preference_datasets.py. Search the file for 'qt_gemma2-9b-it-inst_bo32_dq0.10_dl0.10-concise_sysp' to see an example.
Conduct DPO training! Use $PROJ_DIR/dpo/train_qt.sh. This part of the code is based on https://github.com/eric-mitchell/direct-preference-optimization. WANDB integration is supported. After training, convert the trained model .pt file back to Hugging Face format using checkpoint_pt_to_hf.py.
Run evaluation using run_eval_combined.sh.
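
To make the pairing step concrete, here is a minimal sketch of turning scored best-of-n samples into one preference pair per prompt. It illustrates the general idea only and does not describe best_of_n_to_paired_gen.py: the function name, the min_gap threshold, and the rule of pairing the highest-scoring response against the lowest-scoring one are all hypothetical. Consult the script and its .sh example for the actual selection criteria.

```python
def make_preference_pair(prompt, responses, scores, min_gap=0.1):
    """Pair the best- and worst-scoring responses for one prompt.

    Returns None when there are fewer than two responses or when the
    score gap is below min_gap (a hypothetical filter, not the repo's).
    """
    if len(responses) != len(scores) or len(responses) < 2:
        return None
    # Sort by score only, so ties never fall back to comparing strings.
    ranked = sorted(zip(scores, responses), key=lambda pair: pair[0])
    (low_score, rejected), (high_score, chosen) = ranked[0], ranked[-1]
    if high_score - low_score < min_gap:
        return None
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


# Made-up responses and scores, for illustration only.
pair = make_preference_pair(
    "example prompt",
    ["response A", "response B", "response C"],
    [0.20, 0.55, 0.35],
)
```

Here pair holds "response B" as the chosen response and "response A" as the rejected one, since they have the highest and lowest scores.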

Reference

If you find our work useful, we kindly invite you to cite it:

@misc{zhang2025verifiabledesignaligninglanguage,
      title={Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data}, 
      author={Jingyu Zhang and Marc Marone and Tianjian Li and Benjamin Van Durme and Daniel Khashabi},
      year={2025},
      eprint={2404.03862},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2404.03862}, 
}


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data

Setup

Steps for running the Quote-Tuning pipeline

Reference

About

Releases

Packages

Languages

JHU-CLSP/verifiable-by-design

Folders and files

Latest commit

History

Repository files navigation

Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data

Setup

Steps for running the Quote-Tuning pipeline

Reference

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages