Remove scattering for multi-GPU training. #2200
Changes from 80 commits
```diff
@@ -21,7 +21,7 @@ local BASE_READER = {
       "type": "elmo_characters"
     }
   },
-  "max_sequence_length": 500,
+  "max_sequence_length": 400,
   "start_tokens": ["<S>"],
   "end_tokens": ["</S>"]
};
```
```diff
@@ -34,7 +34,7 @@ local BASE_ITERATOR = {
   // samples in every batch.
   "batch_size": 512 * NUM_GPUS,
   "sorting_keys": [["source", "num_tokens"]],
-  "maximum_samples_per_batch": ["num_tokens", NUM_GPUS * 1000]
+  "maximum_samples_per_batch": ["num_tokens", 2000]
};
```

> There's a minor backwards-compatibility issue here. We're effectively multiplying the batch size (for multi-GPU users) by the number of GPUs. In practice this will result in some OOMs for users who were running close to their memory limits. Given that we had an experimental warning for that use case, I think this is okay, but I'm curious if you have other thoughts.

> This seems fine to me, too.

> Thanks.
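The backwards-compatibility concern in the thread above can be made concrete with some quick arithmetic. Under the old scattered setup, one batch capped at `NUM_GPUS * 1000` tokens was split across the GPUs, so each GPU saw roughly 1000 tokens; after this change each GPU consumes its own batch capped at 2000 tokens. A minimal sketch, assuming `NUM_GPUS = 2` and treating the cap as the approximate per-batch token count (in reality `maximum_samples_per_batch` is only an upper bound):

```python
# Rough arithmetic behind the backwards-compatibility comment above.
# NUM_GPUS = 2 is an assumed example value; the caps come from the diff.
NUM_GPUS = 2

# Before: one batch of up to NUM_GPUS * 1000 tokens, scattered across GPUs.
old_cap = NUM_GPUS * 1000
old_per_gpu = old_cap // NUM_GPUS   # ~1000 tokens land on each GPU

# After: scattering removed, so each GPU processes its own full batch.
new_cap = 2000
new_per_gpu = new_cap               # ~2000 tokens land on each GPU

print(old_per_gpu, new_per_gpu)     # 1000 2000
```

So for two GPUs the per-device token load roughly doubles, which is why users who were already near their memory limits may hit OOMs.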
```diff
@@ -117,7 +117,7 @@ local BASE_ITERATOR = {
     // The multiprocess dataset reader and iterator use many file descriptors,
     // so we need to increase the ulimit depending on the size of this queue.
     // See https://pytorch.org/docs/stable/multiprocessing.html#file-descriptor-file-descriptor
-    // for a description of the underlying issue. `ulimit -n 4096` has sufficed,
+    // for a description of the underlying issue. `ulimit -n 8192` has sufficed,
     // but that number could use tuning.
     "output_queue_size": 500
  },
```
```diff
@@ -139,6 +139,7 @@ local BASE_ITERATOR = {
       // See https://github.com/allenai/calypso/blob/master/bin/train_transformer_lm1b.py#L51.
       // Adjusted based on our sample size relative to Calypso's.
       "warmup_steps": 6000
+    }
  },
  "should_log_learning_rate": true
}
}
```
> `inputs` is supposed to be a list of empty tuples? This never gets updated before getting passed to `parallel_apply`.

> Added a comment to clarify. You can see that `()` was passed to the old `scatter_kwargs` as well.
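The pattern under discussion — passing one empty positional-args tuple per replica and carrying each replica's actual batch in its kwargs dict — can be sketched in pure Python. This is an illustrative stand-in for `torch.nn.parallel.parallel_apply` (the toy `parallel_apply` and `model` below are hypothetical, not AllenNLP's or PyTorch's code), kept CPU-only so it runs anywhere:

```python
# Minimal sketch of the parallel_apply calling convention discussed above:
# `inputs` is a list of empty tuples (one per replica) and each replica's
# data travels in its kwargs dict instead. Threads stand in for GPU workers.
from threading import Thread

def parallel_apply(modules, inputs, kwargs_tup):
    """Run each modules[i](*inputs[i], **kwargs_tup[i]) in its own thread."""
    results = [None] * len(modules)

    def worker(i, module, args, kwargs):
        results[i] = module(*args, **kwargs)

    threads = [
        Thread(target=worker, args=(i, m, a, k))
        for i, (m, a, k) in enumerate(zip(modules, inputs, kwargs_tup))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

def model(**batch):
    # Toy "module": just sums the token ids in its batch.
    return sum(batch["tokens"])

num_gpus = 2
inputs = [() for _ in range(num_gpus)]                 # empty tuples, as in the PR
kwargs_tup = [{"tokens": [1, 2]}, {"tokens": [3, 4]}]  # one batch per replica
print(parallel_apply([model] * num_gpus, inputs, kwargs_tup))  # [3, 7]
```

This shows why the empty tuples are never "updated": they are deliberately empty, and all per-replica data rides in `kwargs_tup`, mirroring how `()` was passed to the old `scatter_kwargs`.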