This repository was archived by the owner on Dec 16, 2022. It is now read-only.

fix cuda_device param docs in trainer #5188

Merged
merged 1 commit into from May 7, 2021
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -22,10 +22,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
See [PR #5172](https://github.com/allenai/allennlp/pull/5172) for more details.
- Added `SpanExtractorWithSpanWidthEmbedding`, putting specific span embedding computations into the `_embed_spans` method and leaving the common code in `SpanExtractorWithSpanWidthEmbedding` to unify the arguments, and modified `BidirectionalEndpointSpanExtractor`, `EndpointSpanExtractor` and `SelfAttentiveSpanExtractor` accordingly. Now, `SelfAttentiveSpanExtractor` can also embed span widths.


### Fixed

- When `PretrainedTransformerIndexer` folds long sequences, it no longer loses the information from token type ids.
- Fixed documentation for `GradientDescentTrainer.cuda_device`.


## [v2.4.0](https://github.com/allenai/allennlp/releases/tag/v2.4.0) - 2021-04-22
18 changes: 14 additions & 4 deletions allennlp/training/trainer.py
@@ -182,10 +182,20 @@ class GradientDescentTrainer(Trainer):
A `Checkpointer` is responsible for periodically saving model weights. If none is given
here, we will construct one with default parameters.

cuda_device : `int`, optional (default = `-1`)
An integer specifying the CUDA device(s) to use for this process. If -1, the CPU is used.
Data parallelism is controlled at the allennlp train level, so each trainer will have a single
GPU.
cuda_device : `Optional[Union[int, torch.device]]`, optional (default = `None`)
An integer or `torch.device` specifying the CUDA device to use for this process.
If -1, the CPU is used. If `None` and you have a GPU available, that GPU will be used.

!!! Note
If you *don't* intend to use a GPU, but you have one available, you'll need
to explicitly set `cuda_device=-1`.

!!! Note
If you intend to use a GPU, your model already needs to be on the correct device,
which you can do with `model = model.cuda()`.

!!! Note
Data parallelism is controlled at the allennlp train level, so each trainer will have a single GPU.

grad_norm : `float`, optional, (default = `None`).
If provided, gradient norms will be rescaled to have a maximum of this value.
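A minimal sketch of the device-resolution behavior the new docstring describes, assuming only `torch`: `None` falls back to the first GPU when one is available, `-1` forces the CPU, and the model must be moved to the resolved device before training. This is not AllenNLP's actual implementation, and the helper name `resolve_cuda_device` is hypothetical.

```python
from typing import Optional, Union

import torch


def resolve_cuda_device(cuda_device: Optional[Union[int, torch.device]] = None) -> torch.device:
    """Hypothetical helper mirroring the documented defaults for `cuda_device`."""
    if cuda_device is None:
        # No device given: use the first GPU if one is available, otherwise the CPU.
        return torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
    if isinstance(cuda_device, int):
        # -1 explicitly selects the CPU; any other integer selects that GPU index.
        return torch.device("cpu") if cuda_device == -1 else torch.device(f"cuda:{cuda_device}")
    return cuda_device


# Usage sketch: resolve the device, then move the model onto it before constructing
# the trainer, e.g. `model = model.cuda()` or `model = model.to(device)`, as the
# docstring notes.
device = resolve_cuda_device()       # auto-selects GPU 0 if one is available
cpu_only = resolve_cuda_device(-1)   # forces the CPU even when a GPU is present
```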