A quick start guide and additional materials are available on our project page. To learn more, refer to our arXiv preprint.
To set up the environment, navigate to the root directory containing `environment.yml` and run:

```bash
conda env create --name interpretation_env --file environment.yml
conda activate interpretation_env
```
Given a feature extractor E and an image i, we obtain its features as f = E(i). The reconstruction model is trained on pairs (i, f); the steps below generate a dataset of such pairs.
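For intuition, here is a minimal sketch of the f = E(i) step using the Hugging Face `transformers` API. Which exact tensor `dataset_generation/generation.py` stores (pooled output vs. patch-level tokens) is an assumption here, as is the example image path.

```python
# Minimal sketch of f = E(i) with Hugging Face transformers.
# Assumption: generation.py may store a different tensor (e.g. pooled output).
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model_name = "google/siglip2-base-patch16-512"
processor = AutoProcessor.from_pretrained(model_name)
extractor = AutoModel.from_pretrained(model_name).eval()

image = Image.open("coco_subsets/val2017/000000000139.jpg").convert("RGB")  # any COCO image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    # Patch-level features from the vision tower: one token per image patch.
    f = extractor.vision_model(pixel_values=inputs["pixel_values"]).last_hidden_state

print(f.shape)  # (1, num_patches, hidden_dim); the pair (image, f) is one training sample
```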
### Generate Validation Split
```bash
# Prepare Data
mkdir coco_subsets
cd coco_subsets
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/zips/train2017.zip
unzip train2017.zip  # Extract archive
unzip val2017.zip    # Extract archive
cd -  # Go back to the project root
```
### Generate Validation Split Features
Generated dataset size: ~300 MB.
```bash
# Run Validation Dataset Generation
VISION_MODEL="google/siglip2-base-patch16-512"
python dataset_generation/generation.py \
    --vision_model_name "$VISION_MODEL" \
    --coco_images_path "./coco_subsets/val2017" \
    --split val \
    --max_count 1000
```
### Generate Train Split Features
Generated dataset size: ~30 GB. You can limit the number of processed images with the `--max_count` parameter (e.g. `--max_count 1000`).
```bash
VISION_MODEL="google/siglip2-base-patch16-512"
python dataset_generation/generation.py \
    --vision_model_name "$VISION_MODEL" \
    --coco_images_path "./coco_subsets/train2017" \
    --split train
```
This script will:
- Create a `feature_extractor_weights` directory for storing pretrained weights
- Generate datasets in the `generated_datasets` directory
- Use images from `coco_subsets/val2017` by default (configurable via script flags)
### Run Reconstructor Training
Training can take from 6 to 24 hours, depending on the image resolution supported by the model.
```bash
python training/train.py --vision_model_name "$VISION_MODEL"
```
This will:
- Train a reconstructor for `google/siglip2-base-patch16-512` by default
- Use the generated dataset from the previous step
- Create `training/samples` for training logs
- Save weights in `training/checkpoint`
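Conceptually, the reconstructor R is trained so that R(f) ≈ i on the generated (i, f) pairs. Below is a minimal sketch of one training step under the assumption of a generic decoder and a plain pixel-space MSE loss; the actual architecture and objective used by `training/train.py` may differ.

```python
# Conceptual sketch of one reconstructor training step.
# Assumptions: `reconstructor` is a generic decoder mapping features to RGB images,
# and the loss is plain MSE; the repository's actual model and objective may differ.
import torch
import torch.nn.functional as F

def training_step(reconstructor, optimizer, features, images):
    """features: (B, num_patches, hidden_dim); images: (B, 3, H, W) in [0, 1]."""
    optimizer.zero_grad()
    reconstructed = reconstructor(features)   # R(f) -> image estimate
    loss = F.mse_loss(reconstructed, images)  # penalize pixel-wise reconstruction error
    loss.backward()
    optimizer.step()
    return loss.item()
```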
The following feature extractors are supported:
- `google/siglip-base-patch16-{224,256,384,512}`
- `google/siglip2-base-patch16-{224,256,384,512}`

Modify the script arguments to use a different extractor.
To compute CLIP similarity metrics (a minimal sketch of the metric itself follows these steps):
- Generate a dataset for your target feature extractor
- Train a reconstructor or use precomputed weights
- Place the weights in `metrics_calculation/precalculated_weights/`, following the pattern:
  - `models--google--siglip-base-patch16-512.pt`
  - `models--google--siglip2-base-patch16-512.pt`
- Run:
  ```bash
  bash metrics_calculation/siglip_vs_siglip2/calculate_similarity.sh
  ```
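The metric compares CLIP embeddings of the original image and its reconstruction via cosine similarity. Below is a minimal sketch; the specific CLIP checkpoint and preprocessing are assumptions, and `calculate_similarity.sh` may use different settings.

```python
# Minimal sketch of the CLIP-similarity idea: embed the original and the
# reconstructed image with a CLIP image encoder and take their cosine similarity.
# The CLIP checkpoint below is an assumption, not necessarily the one used by the script.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_similarity(original: Image.Image, reconstruction: Image.Image) -> float:
    inputs = clip_processor(images=[original, reconstruction], return_tensors="pt")
    with torch.no_grad():
        emb = clip.get_image_features(**inputs)          # (2, embed_dim)
    emb = torch.nn.functional.normalize(emb, dim=-1)
    return (emb[0] @ emb[1]).item()                      # cosine similarity in [-1, 1]
```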
For the SigLIP vs SigLIP2 comparison:
- Compute metrics for all 8 models
- Run the analysis notebook: `metrics_calculation/siglip_vs_siglip2/understanding_graphs_for_article.ipynb`
Example output:
To study orthogonal transformations in feature space (a sketch of one way to fit such a map follows these steps):
- Generate a dataset for `google/siglip2-base-patch16-512`
- Train a reconstructor or use precomputed weights
- Place the weights at `metrics_calculation/precalculated_weights/models--google--siglip2-base-patch16-512.pt`
- Run the analysis notebook: `metrics_calculation/rb_swap/understanding_rgb-to-bgr_rotation.ipynb`
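One standard way to obtain an orthogonal map in feature space is to solve the orthogonal Procrustes problem between features of the original images and features of their RGB-to-BGR swapped versions. The sketch below illustrates this general technique; whether the notebook fits the map this way, and the pooled (N, D) feature shape and file names, are assumptions.

```python
# Sketch of fitting an orthogonal map in feature space between two image
# conditions (original RGB vs. BGR-swapped) via orthogonal Procrustes.
# Illustrative only; the notebook's actual procedure may differ.
import numpy as np
from scipy.linalg import orthogonal_procrustes

# F_rgb, F_bgr: (N, D) pooled features for the same N images, extracted from
# the original and channel-swapped versions respectively (hypothetical files).
F_rgb = np.load("features_rgb.npy")
F_bgr = np.load("features_bgr.npy")

Q, _ = orthogonal_procrustes(F_rgb, F_bgr)  # Q minimizes ||F_rgb @ Q - F_bgr||_F with Q^T Q = I
print(np.allclose(Q.T @ Q, np.eye(Q.shape[0]), atol=1e-5))  # sanity check: Q is orthogonal

# Feeding F_rgb @ Q to the trained reconstructor should then produce
# (approximately) channel-swapped reconstructions.
```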
Example output:
To study linear transformations in feature space (blue-channel suppression; a least-squares sketch follows these steps):
- Generate a dataset for `google/siglip2-base-patch16-512`
- Train a reconstructor or use precomputed weights
- Place the weights at `metrics_calculation/precalculated_weights/models--google--siglip2-base-patch16-512.pt`
- Run the analysis notebook: `metrics_calculation/b_channel_suppression/understanding_b_suppression.ipynb`
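A general (unconstrained) linear map in feature space can be fitted by least squares between features of the original images and features of their transformed versions; the same recipe applies to the colorization study in the next section. The sketch below is illustrative only: the fitting procedure used in the notebook, the pooled (N, D) feature shape, and the file names are assumptions.

```python
# Sketch of fitting a general linear map in feature space via least squares,
# using features of original images and of the same images with the blue
# channel suppressed. Illustrative only; the notebook's procedure may differ.
import numpy as np

F_orig = np.load("features_original.npy")         # (N, D), hypothetical file names
F_nob  = np.load("features_blue_suppressed.npy")  # (N, D)

# Solve min_A ||F_orig @ A - F_nob||_F (no orthogonality constraint this time).
A, *_ = np.linalg.lstsq(F_orig, F_nob, rcond=None)

# Passing F_orig @ A through the trained reconstructor should then yield
# images with the blue channel (approximately) suppressed.
```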
Example output:
To study linear transformations in feature space (colorization):
- Generate a dataset for `google/siglip2-base-patch16-512`
- Train a reconstructor or use precomputed weights
- Place the weights at `metrics_calculation/precalculated_weights/models--google--siglip2-base-patch16-512.pt`
- Run the analysis notebook: `metrics_calculation/colorization/understanding_colorization.ipynb`
Example output:
If you find this work useful, please cite it as follows:
```bibtex
@misc{allakhverdov2025imagereconstructiontoolfeature,
  title={Image Reconstruction as a Tool for Feature Analysis},
  author={Eduard Allakhverdov and Dmitrii Tarasov and Elizaveta Goncharova and Andrey Kuznetsov},
  year={2025},
  eprint={2506.07803},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.07803},
}
```