
Commit 782db70

Merge pull request #19 from b-chu/patch-1: Enable FSDP sharding for bias

2 parents: 71278fa + e198673

File tree: 1 file changed (+1, -1)


megablocks/layers/moe.py (1 addition, 1 deletion):

@@ -119,7 +119,7 @@ def __init__(self, args : Arguments):
         # Note that the output bias is not parallelized with expert
         # model parallelism.
         self.bias = torch.nn.Parameter(torch.empty(
-            1, 1, args.hidden_size,
+            args.hidden_size,
             device=args.device,
             dtype=common.dtype(args)))
         torch.nn.init.zeros_(self.bias)
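The change drops the two leading singleton dimensions from the bias parameter, leaving a plain 1-D tensor of shape `(hidden_size,)` that FSDP can shard like any other flat parameter. The forward-pass math is unaffected because standard broadcasting aligns trailing dimensions, so adding a `(hidden_size,)` bias to a `(batch, seq, hidden_size)` activation gives the same result as the old `(1, 1, hidden_size)` shape. Below is a minimal sketch of that equivalence, using numpy as a stand-in for torch (the sizes `batch`, `seq`, and `hidden` are made up for illustration):

```python
import numpy as np

# Hypothetical sizes standing in for the real batch/sequence/hidden dims.
batch, seq, hidden = 2, 4, 8

# Fake activations of shape (batch, seq, hidden).
x = np.arange(batch * seq * hidden, dtype=np.float32).reshape(batch, seq, hidden)

# Old parameter shape: (1, 1, hidden) -- broadcast dims spelled out explicitly.
bias_old = np.full((1, 1, hidden), 0.5, dtype=np.float32)

# New parameter shape: (hidden,) -- 1-D; broadcasting still aligns it
# against the trailing dimension of x, so the addition is identical.
bias_new = np.full((hidden,), 0.5, dtype=np.float32)

out_old = x + bias_old
out_new = x + bias_new

assert out_old.shape == out_new.shape == (batch, seq, hidden)
assert np.array_equal(out_old, out_new)
```

The same broadcasting rules apply in torch, which is why the diff can simplify the shape without touching any code that consumes `self.bias`.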

0 commit comments
