I want to use GRPO to train a LoRA model after SFT, but the reward does not look normal. Is the model loaded correctly this way?
grpo.py

```python
#############################
# Initialize the GRPO trainer
#############################
if script_args.load_in_lora:
    base_model = AutoModelForCausalLM.from_pretrained(
        model_args.model_name_or_path,
        torch_dtype=torch_dtype,
        trust_remote_code=model_args.trust_remote_code,
        attn_implementation=model_args.attn_implementation,
        use_cache=False if training_args.gradient_checkpointing else True,
    )
    model = PeftModel.from_pretrained(base_model, model_id=script_args.lora_dir, is_trainable=True)
    model_in = model
else:
    model_kwargs = dict(
        revision=model_args.model_revision,
        trust_remote_code=model_args.trust_remote_code,
        attn_implementation=model_args.attn_implementation,
        torch_dtype=torch_dtype,
        use_cache=False if training_args.gradient_checkpointing else True,
    )
    training_args.model_init_kwargs = model_kwargs
    model_in = model_args.model_name_or_path

trainer = GRPOTrainer(
    model=model_in,
    reward_funcs=reward_funcs,
    args=training_args,
    train_dataset=dataset[script_args.dataset_train_split],
    eval_dataset=dataset[script_args.dataset_test_split] if training_args.eval_strategy != "no" else None,
    peft_config=get_peft_config(model_args),
    callbacks=get_callbacks(training_args, model_args),
    processing_class=tokenizer,
)
```
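For reference, below is a minimal sketch (not from my script above) of an alternative way to carry the SFT LoRA weights into GRPO: merge the SFT adapter into the base model first, then let `GRPOTrainer` attach a fresh adapter via `peft_config`. The model paths, the dummy reward function, and `train_dataset` are placeholders, not the actual values used here.

```python
# Sketch only (assumptions noted): merge the SFT LoRA adapter into the base
# model, then let GRPOTrainer create a new adapter for the RL phase.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, LoraConfig
from trl import GRPOTrainer, GRPOConfig

base_model = AutoModelForCausalLM.from_pretrained("BASE_MODEL_PATH", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("BASE_MODEL_PATH")

# Load the SFT adapter and fold its weights into the base model.
sft_model = PeftModel.from_pretrained(base_model, "SFT_LORA_DIR")
merged_model = sft_model.merge_and_unload()

def dummy_reward(completions, **kwargs):
    # Placeholder reward for illustration: longer completions score higher.
    return [float(len(c)) for c in completions]

trainer = GRPOTrainer(
    model=merged_model,
    reward_funcs=dummy_reward,
    args=GRPOConfig(output_dir="grpo-out"),
    train_dataset=train_dataset,  # assumed to be defined elsewhere, with a "prompt" column
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # fresh adapter trained during GRPO
    processing_class=tokenizer,
)
trainer.train()
```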