
Commit 14f3605

epwalsh authored and dirkgr committed
fix cuda_device docs (#5188)
1 parent 29932ab · commit 14f3605

Showing 2 changed files with 15 additions and 5 deletions.


CHANGELOG.md

+1-1
@@ -22,10 +22,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   See [PR #5172](https://github.com/allenai/allennlp/pull/5172) for more details.
 - Added `SpanExtractorWithSpanWidthEmbedding`, putting specific span embedding computations into the `_embed_spans` method and leaving the common code in `SpanExtractorWithSpanWidthEmbedding` to unify the arguments, and modified `BidirectionalEndpointSpanExtractor`, `EndpointSpanExtractor` and `SelfAttentiveSpanExtractor` accordingly. Now, `SelfAttentiveSpanExtractor` can also embed span widths.
 
-
 ### Fixed
 
 - When `PretrainedTransformerIndexer` folds long sequences, it no longer loses the information from token type ids.
+- Fixed documentation for `GradientDescentTrainer.cuda_device`.
 
 
 ## [v2.4.0](https://github.com/allenai/allennlp/releases/tag/v2.4.0) - 2021-04-22
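To make the `SpanExtractorWithSpanWidthEmbedding` entry concrete, here is a minimal sketch of the template-method pattern it describes, in plain PyTorch. The class and helper names below (other than `_embed_spans`) are illustrative stand-ins, not AllenNLP's exact API: the base class owns the shared span-width embedding, and each subclass supplies only its own `_embed_spans` computation.

```python
import torch
from torch import nn


class SpanExtractorWithWidthEmbedding(nn.Module):
    """Illustrative base class: the shared width-embedding logic lives here,
    and subclasses implement only `_embed_spans` (names are hypothetical)."""

    def __init__(self, max_span_width: int, width_dim: int):
        super().__init__()
        # One embedding per possible span width, 0..max_span_width-1.
        self._width_embedding = nn.Embedding(max_span_width, width_dim)

    def forward(self, sequence: torch.Tensor, spans: torch.Tensor) -> torch.Tensor:
        # sequence: (batch, seq_len, dim); spans: (batch, num_spans, 2)
        # holding inclusive (start, end) token indices.
        embedded = self._embed_spans(sequence, spans)
        widths = (spans[..., 1] - spans[..., 0]).clamp(min=0)
        return torch.cat([embedded, self._width_embedding(widths)], dim=-1)

    def _embed_spans(self, sequence: torch.Tensor, spans: torch.Tensor) -> torch.Tensor:
        raise NotImplementedError


class EndpointSpanEmbedder(SpanExtractorWithWidthEmbedding):
    """One concrete strategy: concatenate start- and end-token vectors."""

    def _embed_spans(self, sequence, spans):
        batch = torch.arange(sequence.size(0)).unsqueeze(-1)  # (batch, 1)
        starts = sequence[batch, spans[..., 0]]  # (batch, num_spans, dim)
        ends = sequence[batch, spans[..., 1]]
        return torch.cat([starts, ends], dim=-1)


# Tiny usage check with random data.
extractor = EndpointSpanEmbedder(max_span_width=10, width_dim=8)
out = extractor(torch.randn(2, 12, 16), torch.tensor([[[0, 3], [5, 5]]] * 2))
print(out.shape)  # torch.Size([2, 2, 40]) -> 16 + 16 + 8
```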

allennlp/training/trainer.py

+14-4
@@ -182,10 +182,20 @@ class GradientDescentTrainer(Trainer):
         A `Checkpointer` is responsible for periodically saving model weights. If none is given
         here, we will construct one with default parameters.
 
-    cuda_device : `int`, optional (default = `-1`)
-        An integer specifying the CUDA device(s) to use for this process. If -1, the CPU is used.
-        Data parallelism is controlled at the allennlp train level, so each trainer will have a single
-        GPU.
+    cuda_device : `Optional[Union[int, torch.device]]`, optional (default = `None`)
+        An integer or `torch.device` specifying the CUDA device to use for this process.
+        If -1, the CPU is used. If `None` and you have a GPU available, that GPU will be used.
+
+        !!! Note
+            If you *don't* intend to use a GPU, but you have one available, you'll need
+            to explicitly set `cuda_device=-1`.
+
+        !!! Note
+            If you intend to use a GPU, your model already needs to be on the correct device,
+            which you can do with `model = model.cuda()`.
+
+        !!! Note
+            Data parallelism is controlled at the allennlp train level, so each trainer will have a single GPU.
 
     grad_norm : `float`, optional, (default = `None`).
         If provided, gradient norms will be rescaled to have a maximum of this value.
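The updated docstring above describes three behaviors for `cuda_device` (`None`, `-1`, and an explicit int or `torch.device`). The following is a minimal sketch of that documented rule; `resolve_cuda_device` is a hypothetical helper written for illustration, not AllenNLP's internal implementation.

```python
from typing import Optional, Union

import torch


def resolve_cuda_device(cuda_device: Optional[Union[int, torch.device]] = None) -> torch.device:
    """Hypothetical helper mirroring the documented semantics of
    `GradientDescentTrainer.cuda_device` (not AllenNLP's actual code)."""
    if cuda_device is None:
        # Default: pick a GPU if one is available, otherwise fall back to CPU.
        return torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
    if isinstance(cuda_device, int):
        # -1 explicitly forces the CPU; any other int names a CUDA device.
        return torch.device("cpu") if cuda_device == -1 else torch.device("cuda", cuda_device)
    # Already a `torch.device`: use it as given.
    return cuda_device


# Per the notes above: pass -1 to stay on CPU even when a GPU is present,
# and move the model yourself (e.g. `model = model.cuda()`) before training.
print(resolve_cuda_device(-1))  # cpu
```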
