This repository was archived by the owner on Dec 16, 2022. It is now read-only.

Generalize T5 modules #5166

Merged
merged 22 commits on Jun 2, 2021
CHANGELOG.md (1 change: 1 addition & 0 deletions)
@@ -37,6 +37,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Added a `min_steps` parameter to `BeamSearch` to set a minimum length for the predicted sequences.
- Added the `FinalSequenceScorer` abstraction to calculate the final scores of the generated sequences in `BeamSearch`.
- Added `shuffle` argument to `BucketBatchSampler` which allows for disabling shuffling.
+ - Added `allennlp.modules.transformer.attention_module` which contains a generalized `AttentionModule`. `SelfAttention` and `T5Attention` both inherit from this.
- Added a `Constraint` abstract class to `BeamSearch`, which allows for incorporating constraints on the predictions found by `BeamSearch`,
along with a `RepeatedNGramBlockingConstraint` constraint implementation, which allows for preventing repeated n-grams in the output from `BeamSearch`.
- Added `DataCollator` for dynamic operations for each batch.
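To illustrate the `attention_module` entry above, here is a minimal sketch (not part of the diff) of the advertised class relationship; it uses only the names the changelog introduces:

```python
# Both concrete attention classes now share the generalized base class
# introduced by this PR.
from allennlp.modules.transformer.attention_module import (
    AttentionModule,
    SelfAttention,
    T5Attention,
)

assert issubclass(SelfAttention, AttentionModule)
assert issubclass(T5Attention, AttentionModule)
```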
allennlp/modules/transformer/__init__.py (2 changes: 1 addition & 1 deletion)
@@ -131,7 +131,7 @@ def forward(self, token_ids: torch.LongTensor, mask: torch.BoolTensor):
TransformerEmbeddings,
ImageFeatureEmbeddings,
)
- from allennlp.modules.transformer.self_attention import SelfAttention
+ from allennlp.modules.transformer.attention_module import SelfAttention, T5Attention
from allennlp.modules.transformer.activation_layer import ActivationLayer
from allennlp.modules.transformer.transformer_layer import AttentionLayer, TransformerLayer
from allennlp.modules.transformer.transformer_stack import TransformerStack
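
Since `__init__.py` now re-exports `SelfAttention` from its new home and adds `T5Attention` alongside it, the package-level import path is unchanged. A minimal sketch of what downstream code can keep doing, assuming it imports from the package rather than the moved submodule:

```python
# The old package-level import keeps working after the move;
# T5Attention is newly exported next to it.
from allennlp.modules.transformer import SelfAttention, T5Attention
```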