GRPO with a lora model after SFT #588

Open
Pandasea opened this issue Apr 9, 2025 · 0 comments

Pandasea commented Apr 9, 2025

I want to use GRPO to train a LoRA model after SFT, but the reward does not look normal. Is the model being loaded correctly this way?

grpo.py:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel
from trl import GRPOTrainer, get_peft_config
# (script_args, model_args, training_args, torch_dtype, reward_funcs, dataset,
#  tokenizer, and get_callbacks come from the rest of the script)

#############################
# Initialize the GRPO trainer
#############################
if script_args.load_in_lora:
    base_model = AutoModelForCausalLM.from_pretrained(
        model_args.model_name_or_path,
        torch_dtype=torch_dtype,
        trust_remote_code=model_args.trust_remote_code,
        attn_implementation=model_args.attn_implementation,
        use_cache=False if training_args.gradient_checkpointing else True,
    )
    # Load the SFT LoRA adapter on top of the base model and keep it trainable
    model = PeftModel.from_pretrained(base_model, model_id=script_args.lora_dir, is_trainable=True)
    model_in = model
else:
    model_kwargs = dict(
        revision=model_args.model_revision,
        trust_remote_code=model_args.trust_remote_code,
        attn_implementation=model_args.attn_implementation,
        torch_dtype=torch_dtype,
        use_cache=False if training_args.gradient_checkpointing else True,
    )
    training_args.model_init_kwargs = model_kwargs
    model_in = model_args.model_name_or_path

trainer = GRPOTrainer(
    model=model_in,
    reward_funcs=reward_funcs,
    args=training_args,
    train_dataset=dataset[script_args.dataset_train_split],
    eval_dataset=dataset[script_args.dataset_test_split] if training_args.eval_strategy != "no" else None,
    peft_config=get_peft_config(model_args),
    callbacks=get_callbacks(training_args, model_args),
    processing_class=tokenizer,
)
```
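For comparison, here is a minimal sketch of the merge-then-retrain pattern I am aware of (only an illustration, not confirmed as the fix: `merge_and_unload` is from PEFT, and `"merged-sft-model"` is a placeholder path, not part of this script). The SFT adapter is merged into the base weights first, and `GRPOTrainer` then attaches a fresh LoRA adapter via `peft_config` instead of receiving an already-wrapped `PeftModel` together with a `peft_config`:

```python
# Sketch only: merge the SFT LoRA adapter into the base weights, then let
# GRPOTrainer create a new LoRA adapter from peft_config.
from transformers import AutoModelForCausalLM
from peft import PeftModel
from trl import GRPOTrainer, get_peft_config

base_model = AutoModelForCausalLM.from_pretrained(
    model_args.model_name_or_path,
    torch_dtype=torch_dtype,
    trust_remote_code=model_args.trust_remote_code,
)
merged = PeftModel.from_pretrained(base_model, script_args.lora_dir)
merged = merged.merge_and_unload()          # bake the SFT adapter into the base weights
merged.save_pretrained("merged-sft-model")  # placeholder path
tokenizer.save_pretrained("merged-sft-model")

trainer = GRPOTrainer(
    model="merged-sft-model",               # trainer adds a fresh LoRA adapter here
    reward_funcs=reward_funcs,
    args=training_args,
    train_dataset=dataset[script_args.dataset_train_split],
    peft_config=get_peft_config(model_args),
    processing_class=tokenizer,
)
```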
