Fix default momentum value of BatchNorm2d in MaxViT from 0.99 to 0.01 #8312

Merged 2 commits on Mar 13, 2024
7 changes: 4 additions & 3 deletions torchvision/models/maxvit.py
@@ -577,7 +577,7 @@ class MaxVit(nn.Module):
stochastic_depth_prob (float): Probability of stochastic depth. Expands to a list of probabilities for each layer that scales linearly to the specified value.
squeeze_ratio (float): Squeeze ratio in the SE Layer. Default: 0.25.
expansion_ratio (float): Expansion ratio in the MBConv bottleneck. Default: 4.
- norm_layer (Callable[..., nn.Module]): Normalization function. Default: None (setting to None will produce a `BatchNorm2d(eps=1e-3, momentum=0.99)`).
+ norm_layer (Callable[..., nn.Module]): Normalization function. Default: None (setting to None will produce a `BatchNorm2d(eps=1e-3, momentum=0.01)`).
activation_layer (Callable[..., nn.Module]): Activation function Default: nn.GELU.
head_dim (int): Dimension of the attention heads.
mlp_ratio (int): Expansion ratio of the MLP layer. Default: 4.
@@ -623,7 +623,7 @@ def __init__(
# https://github.com/google-research/maxvit/blob/da76cf0d8a6ec668cc31b399c4126186da7da944/maxvit/models/maxvit.py#L1029-L1030
# for the exact parameters used in batchnorm
if norm_layer is None:
- norm_layer = partial(nn.BatchNorm2d, eps=1e-3, momentum=0.99)
+ norm_layer = partial(nn.BatchNorm2d, eps=1e-3, momentum=0.01)

# Make sure input size will be divisible by the partition size in all blocks
# Undefined behavior if H or W are not divisible by p
@@ -788,7 +788,8 @@ class MaxVit_T_Weights(WeightsEnum):
},
"_ops": 5.558,
"_file_size": 118.769,
- "_docs": """These weights reproduce closely the results of the paper using a similar training recipe.""",
+ "_docs": """These weights reproduce closely the results of the paper using a similar training recipe.
+ They were trained with a BatchNorm2D momentum of 0.99 instead of the more correct 0.01.""",
},
)
DEFAULT = IMAGENET1K_V1
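
For context on why 0.01 is the intended value: PyTorch's `BatchNorm2d` applies `momentum` as the weight on the new batch statistic, while Keras/TensorFlow batch normalization applies it as the weight on the existing running value. The reference MaxViT implementation linked in the code comment is a TensorFlow codebase, which is presumably where the 0.99 came from. The sketch below is a plain-Python illustration of the two update rules, not torchvision code; the helper names and the sample batch means are hypothetical.

```python
def pytorch_update(running, batch_stat, momentum):
    """PyTorch convention: momentum weights the NEW batch statistic."""
    return (1.0 - momentum) * running + momentum * batch_stat


def keras_update(running, batch_stat, momentum):
    """Keras/TensorFlow convention: momentum weights the OLD running value."""
    return momentum * running + (1.0 - momentum) * batch_stat


def track(update, momentum, batch_stats, init=0.0):
    """Fold a sequence of per-batch statistics into one running estimate."""
    running = init
    for stat in batch_stats:
        running = update(running, stat, momentum)
    return running


# Hypothetical per-batch means: a stable stream followed by one outlier batch.
batch_means = [1.0] * 100 + [5.0]

old_default = track(pytorch_update, 0.99, batch_means)  # value before this PR
new_default = track(pytorch_update, 0.01, batch_means)  # value after this PR
keras_equiv = track(keras_update, 0.99, batch_means)    # TF-style momentum=0.99

print(f"PyTorch momentum=0.99: {old_default:.4f}")
print(f"PyTorch momentum=0.01: {new_default:.4f}")
print(f"Keras   momentum=0.99: {keras_equiv:.4f}")
```

Under these assumptions, PyTorch with `momentum=0.01` tracks the Keras `momentum=0.99` estimate up to floating-point error, whereas PyTorch with `momentum=0.99` lets the running statistic jump almost entirely to the most recent batch.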