
missing projection weights for finetuning problem #3


Open
setayeshk opened this issue Jan 28, 2025 · 0 comments

Comments


setayeshk commented Jan 28, 2025

Hello,

I was trying to fine-tune this model on the FlickrFA dataset, but the projection heads start from random weights, which drastically reduces the model's performance (I used the CLIP trainer for my training process). I also have a problem loading the model back after pushing it.
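For context, this is roughly what I would like to be able to do so that the projection heads start from meaningful weights (just a sketch; pretrained_clip_repo is a placeholder, since I am not sure the released checkpoint is stored as a single CLIPModel):

from transformers import CLIPModel

# Sketch (assumption): copy the trained projection heads and logit scale
# from a full pretrained CLIPModel checkpoint into the model I assemble below.
pretrained = CLIPModel.from_pretrained(pretrained_clip_repo)  # hypothetical repo id
clip_model.visual_projection.load_state_dict(pretrained.visual_projection.state_dict())
clip_model.text_projection.load_state_dict(pretrained.text_projection.state_dict())
clip_model.logit_scale.data.copy_(pretrained.logit_scale.data)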

My plan was to fine-tune the projection heads first and then, after a few epochs, add LoRA adapters to the last two layers of the text and image encoders and fine-tune those as well; a sketch of that idea is below.
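A minimal sketch of that two-phase idea, assuming the standard transformers CLIPModel attributes and the peft library (the target_modules list is my guess; restricting it to only the last two layers would require listing the exact module names, which I have not verified):

from peft import LoraConfig, get_peft_model

# Phase 1 (sketch): freeze everything except the projection heads and the
# logit scale, and train only those for a few epochs.
for p in clip_model.parameters():
    p.requires_grad = False
for p in clip_model.visual_projection.parameters():
    p.requires_grad = True
for p in clip_model.text_projection.parameters():
    p.requires_grad = True
clip_model.logit_scale.requires_grad = True

# Phase 2 (sketch): attach LoRA adapters to the attention projections.
# "q_proj"/"v_proj" match the CLIP vision layers and "query"/"value" match
# the RoBERTa text layers; this targets every layer for simplicity.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "query", "value"],
)
clip_model = get_peft_model(clip_model, lora_config)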

I also didn't quite understand the purpose of the clip_wrapper in your training notebook. Doesn't setting hidden_size to 1 cause issues? Wouldn't that mean the training loss is computed on projection heads of size 1, which seems meaningless?

Could you please clarify this for me and provide guidance on the best way to fine-tune your model on FlickrFA while ensuring the projection heads start with meaningful weights and maintaining good performance?

This is my code for building and fine-tuning the model:

from transformers import (
    AutoModel,
    AutoTokenizer,
    CLIPConfig,
    CLIPFeatureExtractor,
    CLIPModel,
    CLIPVisionModel,
    EarlyStoppingCallback,
)

# Load the text and vision encoders plus their preprocessors.
text_encoder = AutoModel.from_pretrained(text_model_name)
tokenizer = AutoTokenizer.from_pretrained(text_model_name)

vision_encoder = CLIPVisionModel.from_pretrained(vision_model_name)
vision_processor = CLIPFeatureExtractor.from_pretrained(feature_extractor_name)

# Build a combined CLIP config from the two encoder configs.
config = CLIPConfig.from_text_vision_configs(
    text_config=text_encoder.config,
    vision_config=vision_encoder.config,
)

# Assemble the CLIP model. The projection heads and logit scale are not part
# of either encoder checkpoint, so they come out randomly initialized here.
clip_model = CLIPModel(config)
clip_model.text_model = text_encoder
clip_model.vision_model = vision_encoder

trainer = CLIPTrainer(
    model=clip_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    data_collator=data_collator,
    tokenizer=tokenizer,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)

trainer.train()
trainer.save_model(args.output_dir)

And this is the code for loading it back, which produces the warning below:

from transformers import CLIPVisionModel, RobertaModel

# Load the two encoders back from the pushed checkpoint.
vision_encoder = CLIPVisionModel.from_pretrained(args.model_repo, revision=args.revision)
text_encoder = RobertaModel.from_pretrained(args.model_repo, revision=args.revision)

Some weights of CLIPVisionModel were not initialized from the model checkpoint at Setayeshk/Clipfa_finetune and are newly initialized: ['vision_model.embeddings.class_embedding', 'vision_model.embeddings.patch_embedding.weight', 'vision_model.embeddings.position_embedding.weight', 'vision_model.encoder.layers.0.layer_norm1.bias', 'vision_model.encoder.layers.0.layer_norm1.weight', 'vision_model.encoder.layers.0.layer_norm2.bias', 'vision_model.encoder.layers.0.layer_norm2.weight', 'vision_model.encoder.layers.0.mlp.fc1.bias', 'vision_model.encoder.layers.0.mlp.fc1.weight', 'vision_model.encoder.layers.0.mlp.fc2.bias', 'vision_model.encoder.layers.0.mlp.fc2.weight', 'vision_model.encoder.layers.0.self_attn.k_proj.bias', 'vision_model.encoder.layers.0.self_attn.k_proj.weight', 'vision_model.encoder.layers.0.self_attn.out_proj.bias', 'vision_model.encoder.layers.0.self_attn.out_proj.weight', 'vision_model.encoder ...
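For reference, this is what I expected to be able to do instead (a sketch, assuming trainer.save_model pushed the full combined CLIPModel to the repo, which may not be the case):

from transformers import CLIPModel

# Sketch (assumption): load the combined checkpoint as a single CLIPModel
# and pull the two encoders out of it, instead of loading each encoder
# class directly against the combined state dict.
clip_model = CLIPModel.from_pretrained(args.model_repo, revision=args.revision)
vision_encoder = clip_model.vision_model
text_encoder = clip_model.text_model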
