
How to generate gt saliency map according to the fixdata? #8

Open
ProNoobLi opened this issue Nov 5, 2020 · 5 comments

@ProNoobLi

Hi Lai, nice work.
Based on my understanding, the ground truth should be a 2D map rather than the groups of fixation pixels stored in Data.fixdata. How do you convert such fixation data into a ground-truth saliency map?
Thank you

@remega
Owner

remega commented Nov 7, 2020

Hi Li,

Please refer to this function: https://github.com/remega/LEDOV-eye-tracking-database/blob/master/make_gauss_masks4.m
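
In case it helps, the general idea is to accumulate a 2D Gaussian at every fixation and then normalize the result. A minimal sketch of that idea (the function name and signature below are illustrative, not the actual make_gauss_masks4.m interface):

```matlab
% Minimal sketch: sum a 2D Gaussian at every fixation, then rescale to [0, 1].
% fixX, fixY: fixation coordinates in pixels; H, W: frame size; sigma: in pixels.
function salMap = fixations2salmap(fixX, fixY, H, W, sigma)
    [X, Y] = meshgrid(1:W, 1:H);                 % pixel coordinate grids
    salMap = zeros(H, W);
    for i = 1:numel(fixX)
        salMap = salMap + exp(-((X - fixX(i)).^2 + (Y - fixY(i)).^2) ...
                              / (2 * sigma^2));  % Gaussian blob at fixation i
    end
    salMap = salMap / max(salMap(:));            % soft mask in [0, 1]
end
```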

@ProNoobLi
Author

Thanks!
It seems the saliency map is a soft mask ranging from 0 to 1 that indicates the probability of saliency, while the sigma of the Gaussian controls how the mask weights spread around each fixation.
So what sigma do you suggest for the Gaussian mask?

@remega
Owner

remega commented Nov 9, 2020

Technically, it depends on the setup of the eye-tracking experiment, such as the distance between the subject and the display screen and the screen resolution. For our LEDOV, you can use the default setting in the function.
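
To make the dependence concrete: a common convention is to set sigma to roughly one degree of visual angle, which the viewing geometry converts into pixels. A hedged sketch, where all three constants are made-up examples rather than the actual LEDOV settings:

```matlab
% Hedged example: sigma as ~1 degree of visual angle, in pixels.
% The three constants are illustrative assumptions, not LEDOV's values.
viewDistCm = 60;      % subject-to-screen distance
screenWCm  = 52;      % physical screen width
screenWPx  = 1920;    % horizontal screen resolution
degPerPx   = 2 * atand((screenWCm / 2) / viewDistCm) / screenWPx;
sigmaPx    = 1 / degPerPx;    % pixels subtending one visual degree
```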

@ProNoobLi
Author

Hi Lai, I have one more question.
Should I divide the fixation map by its maximum value to normalize the mask, as you do in make_gauss_masks4.m, or simply accumulate the Gaussian values of all fixations from all subjects?
In my opinion, the normalized map is only suitable for visualization, like the heatmaps in your dataset. The ground-truth map should represent not only the mask but also the importance of each region, which is weighted by the intensity. Additionally, the formula used to generate the fixation map in another paper (Improving Video Compression With Deep Visual-Attention Models) looks like this:
[equation screenshot from the paper]
As we see from the paper, the SM is summed up without normalization.
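
For reference, the general form of such a summed fixation map (a sketch of the standard formulation; the paper's exact notation may differ):

$$\mathrm{SM}(x, y) = \sum_{s=1}^{S} \sum_{f=1}^{F_s} \exp\!\left(-\frac{(x - x_{s,f})^2 + (y - y_{s,f})^2}{2\sigma^2}\right),$$

where $S$ is the number of subjects, $F_s$ the number of fixations of subject $s$, and $(x_{s,f}, y_{s,f})$ the fixation coordinates.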

@remega
Owner

remega commented Nov 16, 2020

Hi Li,

The value of a saliency map indicates the probability that each pixel is salient, so it should be normalized to [0, 1].
In the eye-tracking experiment, the numbers of subjects and fixations differ from image to image. Without normalization, both the maximum value and the sparsity of the saliency mask would be inconsistent across images.
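
As a concrete illustration, using the hypothetical fixations2salmap sketched above:

```matlab
% Two images with very different numbers of fixations; because the helper
% divides each map by its own maximum, both peak at 1 and their per-pixel
% values are comparable probability-like scores across images.
mapA = fixations2salmap(xA, yA, H, W, sigma);   % e.g. many fixations
mapB = fixations2salmap(xB, yB, H, W, sigma);   % e.g. few fixations
```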
