Skip to content

Add configurable normalization schemes to SigLIP image processors #38444

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

rahulrshetty45
Copy link
Contributor

Summary

Addresses issue #38318 by adding configurable normalization schemes to SigLIP image processors, allowing users to choose between official SigLIP normalization and traditional ImageNet normalization while maintaining full backwards compatibility.

Problem

Users reported that SigLIP models may perform better for feature clustering when using traditional ImageNet normalization values instead of the official SigLIP values:

  • Official SigLIP: mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]
  • Traditional ImageNet: mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]

However, changing the default values would break backwards compatibility and contradict official SigLIP documentation.

Solution

Added a normalization_scheme parameter that provides user choice without breaking existing functionality:

Key Features:

  • Backwards Compatible: Default behavior unchanged - uses official SigLIP values
  • Configurable: Choose between "siglip" and "imagenet" schemes
  • Auto-Detection: Automatically detects scheme from existing configurations
  • Manual Override: Custom values still supported via image_mean/image_std
  • Consistent: Works across both fast and slow processors

Usage Examples

Default Usage (No Changes Required)

# Uses official SigLIP normalization [0.5, 0.5, 0.5] - unchanged behavior
processor = SiglipImageProcessor()

Better Clustering Performance

# Use ImageNet normalization for potentially better clustering
processor = SiglipImageProcessor(normalization_scheme="imagenet")

Auto-Detection from Configs

# Automatically detects ImageNet scheme from existing config values
config = {"image_mean": [0.485, 0.456, 0.406], "image_std": [0.229, 0.224, 0.225]}
processor = SiglipImageProcessor.from_dict(config)
print(processor.normalization_scheme)  # "imagenet"

Files Changed

  • src/transformers/models/siglip/image_processing_siglip.py
  • src/transformers/models/siglip/image_processing_siglip_fast.py

Testing

The implementation has been validated to ensure:

  • Default behavior remains unchanged
  • ImageNet scheme provides correct values
  • Auto-detection works properly
  • Manual overrides take precedence
  • Both fast and slow processors behave consistently

Related

Fixes #38318

This solution provides the best of both worlds: researchers can easily access ImageNet normalization for better clustering while maintaining official SigLIP compatibility by default.

@Rocketknight1
Copy link
Member

Hi @rahulrshetty45, please stop getting a code agent to write these PRs! They're extremely verbose and they add lots of new files that we don't want, and they change the codebase by adding new kwargs.

@rahulrshetty45
Copy link
Contributor Author

Hey! Yeah I totally get that, I actually submitted this PR before I saw your earlier comment about the coding agent stuff.

This is my first time contributing to the repo, so I kinda overdid it trying to cover all bases and make it “perfect.”
Appreciate the feedback though, I’ll keep things much simpler and cleaner going forward, sorry for the trouble!

@rahulrshetty45 rahulrshetty45 force-pushed the fix-siglip-normalization-38318 branch from 079ac7c to b732186 Compare May 28, 2025 17:11
@rahulrshetty45
Copy link
Contributor Author

hey @Rocketknight1
I’ve just pushed the simplified version and stripped out all the extra complexity. It’s now just that 8-line change across the two SigLIP files.

Please let me know if you spot anything else or have any concerns. Thanks again!

@Rocketknight1
Copy link
Member

Hi @rahulrshetty45, following the discussion in #38318 I don't know if we should merge this - it seems like the change isn't required, and it's mainly a problem with confusing constant naming (that we probably can't change for backward compatibility reasons)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Potential mix-up with IMAGENET_STANDARD and IMAGENET_DEFAULT values
2 participants