<!DOCTYPE html>
<html>

<head>
  <meta charset="utf-8">
  <!-- Meta tags for social media banners; these should be filled in appropriately, as they are your "business card" -->
  <!-- Replace the content tag with appropriate information -->
  <meta name="description" content="Image Reconstruction as a Tool for Feature Analysis">
  <meta property="og:title" content="Image Reconstruction as a Tool for Feature Analysis" />
  <meta property="og:description"
    content="A novel approach for interpreting vision features via image reconstruction" />
  <meta property="og:url" content="https://fusionbrainlab.github.io/feature_analysis" />
  <!-- Path to banner image, should be in the path listed below. Optimal dimensions are 1200x630 -->
  <meta property="og:image" content="static/images/v1_vs_v2.png" />
  <meta property="og:image:width" content="1200" />
  <meta property="og:image:height" content="630" />


  <meta name="twitter:title" content="Image Reconstruction as a Tool for Feature Analysis">
  <meta name="twitter:description" content="A novel approach for interpreting vision features via image reconstruction">
  <!-- Path to banner image, should be in the path listed below. Optimal dimensions are 1200x600 -->
  <meta name="twitter:image" content="static/images/v1_vs_v2.png">
  <meta name="twitter:card" content="summary_large_image">
  <!-- Keywords for your paper to be indexed by -->
  <meta name="keywords" content="computer vision, feature analysis, image reconstruction, vision encoders">
  <meta name="viewport" content="width=device-width, initial-scale=1">


  <title>Image Reconstruction as a Tool for Feature Analysis</title>
  <link rel="icon" type="image/x-icon" href="static/images/favicon.ico">
  <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">

  <link rel="stylesheet" href="static/css/bulma.min.css">
  <link rel="stylesheet" href="static/css/bulma-carousel.min.css">
  <link rel="stylesheet" href="static/css/bulma-slider.min.css">
  <link rel="stylesheet" href="static/css/fontawesome.all.min.css">
  <link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
  <link rel="stylesheet" href="static/css/index.css">

  <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
  <script src="https://documentcloud.adobe.com/view-sdk/main.js"></script>
  <script defer src="static/js/fontawesome.all.min.js"></script>
  <script src="static/js/bulma-carousel.min.js"></script>
  <script src="static/js/bulma-slider.min.js"></script>
  <script src="static/js/index.js"></script>
</head>

<body>


  <section class="hero">
    <div class="hero-body">
      <div class="container is-max-desktop">
        <div class="columns is-centered">
          <div class="column has-text-centered">
            <h1 class="title is-1 publication-title">Image Reconstruction as a Tool for Feature Analysis</h1>
            <div class="is-size-5 publication-authors">
              <!-- Paper authors -->
              <span class="author-block">
                <a href="mailto:[email protected]" target="_blank">Eduard Allakhverdov</a>
              </span>
              <span class="author-block">
                <a href="mailto:[email protected]" target="_blank">Dmitrii Tarasov</a>
              </span>
              <span class="author-block">
                <a href="mailto:[email protected]" target="_blank">Elizaveta Goncharova</a>
              </span>
              <span class="author-block">
                <a href="mailto:[email protected]" target="_blank">Andrey Kuznetsov</a>
              </span>
            </div>
            <div class="is-size-5 publication-authors">
              <span class="author-block">
                AIRI, Moscow, Russia<br>
                MIPT, Dolgoprudny, Russia
              </span>
              <span class="author-block">
                AIRI, Moscow, Russia
              </span>
              <span class="author-block">
                AIRI, Moscow, Russia
              </span>
              <span class="author-block">
                AIRI, Moscow, Russia
              </span>
            </div>

            <div class="column has-text-centered">
              <div class="publication-links">

                <!-- Github link -->
                <span class="link-block">
                  <a href="https://github.com/FusionBrainLab/feature_analysis" target="_blank"
                    class="external-link button is-normal is-rounded is-dark">
                    <span class="icon">
                      <i class="fab fa-github"></i>
                    </span>
                    <span>Code</span>
                  </a>
                </span>

                <!-- ArXiv abstract Link -->
                <span class="link-block">
                  <a href="https://arxiv.org/abs/<ARXIV PAPER ID>" target="_blank"
                    class="external-link button is-normal is-rounded is-dark">
                    <span class="icon">
                      <i class="ai ai-arxiv"></i>
                    </span>
                    <span>arXiv</span>
                  </a>
                </span>
              </div>
            </div>
          </div>
        </div>
      </div>
    </div>
  </section>


  <!-- Paper abstract -->
  <section class="section hero is-light">
    <div class="container is-max-desktop">
      <div class="columns is-centered has-text-centered">
        <div class="column is-four-fifths">
          <h2 class="title is-3">Abstract</h2>
          <div class="content has-text-justified">
            <p>
              Vision encoders are increasingly used in modern applications, from vision-only models to multimodal
              systems such as vision-language models. Despite their remarkable success, it remains unclear how these
              architectures represent features internally. Here, we propose a novel approach for interpreting vision
              features via image reconstruction. We compare two related model families, SigLIP and SigLIP2, which differ
              only in their training objective, and show that encoders pre-trained on image-based tasks retain
              significantly more image information than those trained on non-image tasks such as contrastive learning.
              We further apply our method to a range of vision encoders, ranking them by the informativeness of their
              feature representations. Finally, we demonstrate that manipulating the feature space yields predictable
              changes in reconstructed images, revealing that orthogonal rotations, rather than spatial transformations,
              control color encoding. Our approach can be applied to any vision encoder, shedding light on the inner
              structure of its feature space.
            </p>
          </div>
        </div>
      </div>
    </div>
  </section>
  <!-- End paper abstract -->

  <!-- Spell out the contributions explicitly: -->

  <!-- (1) Interpretability metric -->
  <!-- Textual explanation -->
  <!-- Baseline results: SigLIP vs SigLIP2 -->
  <!-- Clearly emphasize the differences between the models -->
  <!-- and how they affect reconstruction -->
  <!-- -->
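
  <!-- Illustrative sketch (an assumption, not the paper's released code) of how such a
       reconstruction-based interpretability metric could be set up: a frozen encoder that
       returns (B, N, D) patch features, a small trainable decoder that maps those features
       back to pixels, and the final reconstruction error used as a proxy for how much image
       information the features retain. The `encoder` callable, the feature shape, and the
       grid/patch-size defaults (14 x 14 patches of 16 px, as in a ViT-B/16 at 224 px) are assumptions.

       import torch
       import torch.nn as nn

       class PatchDecoder(nn.Module):
           # Maps (B, N, D) patch features back to an image of shape (B, 3, H, W).
           def __init__(self, feat_dim, patch_size=16, grid=14):
               super().__init__()
               self.grid = grid
               self.patch_size = patch_size
               self.proj = nn.Linear(feat_dim, 3 * patch_size * patch_size)

           def forward(self, feats):                       # feats: (B, N, D), N = grid * grid
               b = feats.shape[0]
               p, g = self.patch_size, self.grid
               patches = self.proj(feats)                  # (B, N, 3 * p * p)
               patches = patches.view(b, g, g, 3, p, p)
               patches = patches.permute(0, 3, 1, 4, 2, 5).contiguous()
               return patches.view(b, 3, g * p, g * p)

       def reconstruction_score(encoder, decoder, images, steps=1000, lr=1e-3):
           # Train the decoder on frozen features; a lower final MSE means the
           # features retain more image information.
           opt = torch.optim.Adam(decoder.parameters(), lr=lr)
           with torch.no_grad():
               feats = encoder(images)                     # frozen features, (B, N, D)
           for _ in range(steps):
               loss = nn.functional.mse_loss(decoder(feats), images)
               opt.zero_grad()
               loss.backward()
               opt.step()
           return loss.item()
  -->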

  <!-- (2) Feature-space transformations -->
  <!-- Textual explanation -->
  <!-- Framework visualization: an operator defined in image space generalized to feature space -->
  <!-- Make a video visualizing the framework -->
  <!-- Examples of working with RGB -->
  <!-- Examples of disabling one channel (yellowing) -->
  <!-- -->

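  <!-- Illustrative sketch (an assumption, not the paper's released code) of a feature-space
       manipulation of the kind described above: an orthogonal map is fitted between features
       of original images and features of red/blue-swapped images (orthogonal Procrustes), and
       applying that map to the features of a new image should make its reconstruction appear
       channel-swapped. The `encoder` and `decoder` callables and the (B, N, D) feature shape
       are assumptions; `decoder` could be a reconstruction decoder like the sketch above.

       import torch

       def fit_orthogonal_map(feats_src, feats_tgt):
           # Orthogonal Procrustes: find R (D x D), with R^T R = I, minimizing ||X R - Y||_F.
           x = feats_src.reshape(-1, feats_src.shape[-1])   # (M, D)
           y = feats_tgt.reshape(-1, feats_tgt.shape[-1])   # (M, D)
           u, _, vt = torch.linalg.svd(x.T @ y)
           return u @ vt                                    # (D, D) rotation

       @torch.no_grad()
       def swap_colors_in_feature_space(encoder, decoder, calib_images, test_image):
           swapped = calib_images[:, [2, 1, 0], :, :]       # red <-> blue on a calibration set
           R = fit_orthogonal_map(encoder(calib_images), encoder(swapped))
           test_feats = encoder(test_image.unsqueeze(0))    # (1, N, D)
           return decoder(test_feats @ R)                   # reconstruction should look channel-swapped
  -->
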
  <!-- Image carousel -->
  <section class="hero is-small">
    <div class="hero-body">
      <div class="container">
        <div id="results-carousel" class="carousel results-carousel">
          <div class="item">
            <!-- Your image here -->
            <img src="static/images/v1_vs_v2.png" alt="Comparison of SigLIP and SigLIP2 reconstructions" />
            <h2 class="subtitle has-text-centered">
              Comparison of image reconstructions from SigLIP and SigLIP2 feature spaces.
            </h2>
          </div>
          <div class="item">
            <!-- Your image here -->
            <img src="static/images/rb_swap.png" alt="Red-Blue channel swap visualization" />
            <h2 class="subtitle has-text-centered">
              Visualization of feature space manipulation through red-blue channel swap.
            </h2>
          </div>
        </div>
      </div>
    </div>
  </section>
  <!-- End image carousel -->



  <!-- BibTeX citation -->
  <section class="section" id="BibTeX">
    <div class="container is-max-desktop content">
      <h2 class="title">BibTeX</h2>
      <pre><code>@article{feature_analysis,
  title={Image Reconstruction as a Tool for Feature Analysis},
  author={Allakhverdov, Eduard and Tarasov, Dmitrii and Goncharova, Elizaveta and Kuznetsov, Andrey},
  journal={arXiv preprint},
  year={2024}
}</code></pre>
    </div>
  </section>
  <!-- End BibTeX citation -->


  <footer class="footer">
    <div class="container">
      <div class="columns is-centered">
        <div class="column is-8">
          <div class="content">

            <p>
              This page was built using the <a href="https://github.com/eliahuhorwitz/Academic-project-page-template"
                target="_blank">Academic Project Page Template</a>, which was adopted from the <a
                href="https://nerfies.github.io" target="_blank">Nerfies</a> project page.
              You are free to borrow the source code of this website; we just ask that you link back to this page in the
              footer. <br> This website is licensed under a <a rel="license"
                href="http://creativecommons.org/licenses/by-sa/4.0/" target="_blank">Creative
                Commons Attribution-ShareAlike 4.0 International License</a>.
            </p>

          </div>
        </div>
      </div>
    </div>
  </footer>

  <!-- Statcounter tracking code -->

  <!-- You can add a tracker to track page visits by creating an account at statcounter.com -->

  <!-- End of Statcounter Code -->

</body>

</html>