
Commit 4c8ec66

Fix LayoutLMv3 documentation (#17932)
* fix typos
* fix sequence_length docs of LayoutLMv3Model
* delete trailing white spaces
* fix layoutlmv3 docs more
* apply make fixup & quality
* change to two versions of input docstring
* apply make fixup & quality
1 parent: f762f37

File tree: 1 file changed (+99 −13 lines)


src/transformers/models/layoutlmv3/modeling_layoutlmv3.py

Lines changed: 99 additions & 13 deletions
@@ -54,17 +54,93 @@
     behavior.
 
     Parameters:
-        config ([`LayoutLMv2Config`]): Model configuration class with all the parameters of the model.
+        config ([`LayoutLMv3Config`]): Model configuration class with all the parameters of the model.
             Initializing with a config file does not load the weights associated with the model, only the
             configuration. Check out the [`~PreTrainedModel.from_pretrained`] method to load the model weights.
 """
 
-LAYOUTLMV3_INPUTS_DOCSTRING = r"""
+LAYOUTLMV3_MODEL_INPUTS_DOCSTRING = r"""
     Args:
-        input_ids (`torch.LongTensor` of shape `{0}`):
+        input_ids (`torch.LongTensor` of shape `({0})`):
             Indices of input sequence tokens in the vocabulary.
 
-            Indices can be obtained using [`LayoutLMv2Tokenizer`]. See [`PreTrainedTokenizer.encode`] and
+            Note that `sequence_length = token_sequence_length + patch_sequence_length + 1` where `1` is for [CLS]
+            token. See `pixel_values` for `patch_sequence_length`.
+
+            Indices can be obtained using [`LayoutLMv3Tokenizer`]. See [`PreTrainedTokenizer.encode`] and
+            [`PreTrainedTokenizer.__call__`] for details.
+
+            [What are input IDs?](../glossary#input-ids)
+
+        bbox (`torch.LongTensor` of shape `({0}, 4)`, *optional*):
+            Bounding boxes of each input sequence tokens. Selected in the range `[0,
+            config.max_2d_position_embeddings-1]`. Each bounding box should be a normalized version in (x0, y0, x1, y1)
+            format, where (x0, y0) corresponds to the position of the upper left corner in the bounding box, and (x1,
+            y1) represents the position of the lower right corner.
+
+            Note that `sequence_length = token_sequence_length + patch_sequence_length + 1` where `1` is for [CLS]
+            token. See `pixel_values` for `patch_sequence_length`.
+
+        pixel_values (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`):
+            Batch of document images. Each image is divided into patches of shape `(num_channels, config.patch_size,
+            config.patch_size)` and the total number of patches (=`patch_sequence_length`) equals to `((height /
+            config.patch_size) * (width / config.patch_size))`.
+
+        attention_mask (`torch.FloatTensor` of shape `({0})`, *optional*):
+            Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
+
+            - 1 for tokens that are **not masked**,
+            - 0 for tokens that are **masked**.
+
+            Note that `sequence_length = token_sequence_length + patch_sequence_length + 1` where `1` is for [CLS]
+            token. See `pixel_values` for `patch_sequence_length`.
+
+            [What are attention masks?](../glossary#attention-mask)
+        token_type_ids (`torch.LongTensor` of shape `({0})`, *optional*):
+            Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0,
+            1]`:
+
+            - 0 corresponds to a *sentence A* token,
+            - 1 corresponds to a *sentence B* token.
+
+            Note that `sequence_length = token_sequence_length + patch_sequence_length + 1` where `1` is for [CLS]
+            token. See `pixel_values` for `patch_sequence_length`.
+
+            [What are token type IDs?](../glossary#token-type-ids)
+        position_ids (`torch.LongTensor` of shape `({0})`, *optional*):
+            Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0,
+            config.max_position_embeddings - 1]`.
+
+            Note that `sequence_length = token_sequence_length + patch_sequence_length + 1` where `1` is for [CLS]
+            token. See `pixel_values` for `patch_sequence_length`.
+
+            [What are position IDs?](../glossary#position-ids)
+        head_mask (`torch.FloatTensor` of shape `(num_heads,)` or `(num_layers, num_heads)`, *optional*):
+            Mask to nullify selected heads of the self-attention modules. Mask values selected in `[0, 1]`:
+
+            - 1 indicates the head is **not masked**,
+            - 0 indicates the head is **masked**.
+
+        inputs_embeds (`torch.FloatTensor` of shape `({0}, hidden_size)`, *optional*):
+            Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
+            is useful if you want more control over how to convert *input_ids* indices into associated vectors than the
+            model's internal embedding lookup matrix.
+        output_attentions (`bool`, *optional*):
+            Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
+            tensors for more detail.
+        output_hidden_states (`bool`, *optional*):
+            Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
+            more detail.
+        return_dict (`bool`, *optional*):
+            Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
+"""
+
+LAYOUTLMV3_DOWNSTREAM_INPUTS_DOCSTRING = r"""
+    Args:
+        input_ids (`torch.LongTensor` of shape `({0})`):
+            Indices of input sequence tokens in the vocabulary.
+
+            Indices can be obtained using [`LayoutLMv3Tokenizer`]. See [`PreTrainedTokenizer.encode`] and
             [`PreTrainedTokenizer.__call__`] for details.
 
             [What are input IDs?](../glossary#input-ids)
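The `sequence_length = token_sequence_length + patch_sequence_length + 1` formula introduced above can be sanity-checked with a standalone arithmetic sketch (plain Python, not transformers code; the 224×224 image size and 16-pixel patch size used below are illustrative values matching LayoutLMv3-base defaults):

```python
def patch_sequence_length(height: int, width: int, patch_size: int) -> int:
    # Each image is split into non-overlapping patch_size x patch_size patches.
    return (height // patch_size) * (width // patch_size)


def total_sequence_length(token_sequence_length: int, height: int, width: int, patch_size: int) -> int:
    # sequence_length = token_sequence_length + patch_sequence_length + 1,
    # where the +1 accounts for the [CLS] token.
    return token_sequence_length + patch_sequence_length(height, width, patch_size) + 1


print(patch_sequence_length(224, 224, 16))       # 196
print(total_sequence_length(512, 224, 224, 16))  # 709
```

This is why the `attention_mask`, `token_type_ids`, and `position_ids` passed to the base model must cover more than just the text tokens.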
@@ -76,24 +152,26 @@
             y1) represents the position of the lower right corner.
 
         pixel_values (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`):
-            Batch of document images.
+            Batch of document images. Each image is divided into patches of shape `(num_channels, config.patch_size,
+            config.patch_size)` and the total number of patches (=`patch_sequence_length`) equals to `((height /
+            config.patch_size) * (width / config.patch_size))`.
 
-        attention_mask (`torch.FloatTensor` of shape `{0}`, *optional*):
+        attention_mask (`torch.FloatTensor` of shape `({0})`, *optional*):
             Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
 
             - 1 for tokens that are **not masked**,
             - 0 for tokens that are **masked**.
 
             [What are attention masks?](../glossary#attention-mask)
-        token_type_ids (`torch.LongTensor` of shape `{0}`, *optional*):
+        token_type_ids (`torch.LongTensor` of shape `({0})`, *optional*):
             Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0,
             1]`:
 
             - 0 corresponds to a *sentence A* token,
             - 1 corresponds to a *sentence B* token.
 
             [What are token type IDs?](../glossary#token-type-ids)
-        position_ids (`torch.LongTensor` of shape `{0}`, *optional*):
+        position_ids (`torch.LongTensor` of shape `({0})`, *optional*):
             Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0,
             config.max_position_embeddings - 1]`.
 
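The `bbox` documentation above requires coordinates already normalized into `[0, config.max_2d_position_embeddings-1]`; the LayoutLM model family conventionally rescales pixel coordinates onto a 0–1000 grid. A minimal sketch of that preprocessing step (the helper name and the 1000 scale are illustrative conventions, not part of this diff):

```python
def normalize_bbox(bbox, page_width, page_height, scale=1000):
    # bbox is (x0, y0, x1, y1): upper-left and lower-right corners in pixels.
    # Rescale so every coordinate lies in [0, scale], as the embedding
    # tables expect integer positions in a bounded range.
    x0, y0, x1, y1 = bbox
    return (
        int(scale * x0 / page_width),
        int(scale * y0 / page_height),
        int(scale * x1 / page_width),
        int(scale * y1 / page_height),
    )


print(normalize_bbox((85, 170, 425, 340), page_width=850, page_height=1100))
# (100, 154, 500, 309)
```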
@@ -104,7 +182,7 @@
             - 1 indicates the head is **not masked**,
             - 0 indicates the head is **masked**.
 
-        inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+        inputs_embeds (`torch.FloatTensor` of shape `({0}, hidden_size)`, *optional*):
             Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
             is useful if you want more control over how to convert *input_ids* indices into associated vectors than the
             model's internal embedding lookup matrix.
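The `inputs_embeds` alternative described above amounts to performing the embedding lookup yourself and handing the model the resulting vectors. A dependency-free sketch with a toy lookup matrix (shapes and values are purely illustrative):

```python
# Toy embedding matrix: vocab_size=4, hidden_size=3.
embedding_matrix = [
    [0.0, 0.0, 0.0],  # id 0
    [0.1, 0.2, 0.3],  # id 1
    [0.4, 0.5, 0.6],  # id 2
    [0.7, 0.8, 0.9],  # id 3
]

input_ids = [2, 1, 3]

# Passing inputs_embeds instead of input_ids means supplying these
# looked-up vectors directly (shape: sequence_length x hidden_size),
# e.g. after modifying or mixing them yourself.
inputs_embeds = [embedding_matrix[i] for i in input_ids]
print(inputs_embeds)  # [[0.4, 0.5, 0.6], [0.1, 0.2, 0.3], [0.7, 0.8, 0.9]]
```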
@@ -763,7 +841,9 @@ def forward_image(self, pixel_values):
 
         return embeddings
 
-    @add_start_docstrings_to_model_forward(LAYOUTLMV3_INPUTS_DOCSTRING.format("(batch_size, sequence_length)"))
+    @add_start_docstrings_to_model_forward(
+        LAYOUTLMV3_MODEL_INPUTS_DOCSTRING.format("batch_size, token_sequence_length")
+    )
     @replace_return_docstrings(output_type=BaseModelOutput, config_class=_CONFIG_FOR_DOC)
     def forward(
         self,
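The decorator change above works because the `({0})` shape placeholder in each docstring template is filled via `str.format` before being attached to `forward`. A simplified sketch of that templating (the `add_docstring` decorator here is a stand-in for `add_start_docstrings_to_model_forward`, not its real implementation):

```python
INPUTS_DOCSTRING = r"""
    Args:
        input_ids (`torch.LongTensor` of shape `({0})`):
            Indices of input sequence tokens in the vocabulary.
"""


def add_docstring(template):
    # Prepend the (already formatted) template to the function's docstring.
    def decorator(fn):
        fn.__doc__ = template + (fn.__doc__ or "")
        return fn
    return decorator


@add_docstring(INPUTS_DOCSTRING.format("batch_size, token_sequence_length"))
def forward(input_ids=None):
    """Returns: BaseModelOutput"""


print("(batch_size, token_sequence_length)" in forward.__doc__)  # True
```

Splitting the template into a model version and a downstream version lets each decorator fill in the shape string appropriate to that class's inputs.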
@@ -975,7 +1055,9 @@ def __init__(self, config):
 
         self.init_weights()
 
-    @add_start_docstrings_to_model_forward(LAYOUTLMV3_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
+    @add_start_docstrings_to_model_forward(
+        LAYOUTLMV3_DOWNSTREAM_INPUTS_DOCSTRING.format("batch_size, sequence_length")
+    )
     @replace_return_docstrings(output_type=TokenClassifierOutput, config_class=_CONFIG_FOR_DOC)
     def forward(
         self,
@@ -1084,7 +1166,9 @@ def __init__(self, config):
 
         self.init_weights()
 
-    @add_start_docstrings_to_model_forward(LAYOUTLMV3_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
+    @add_start_docstrings_to_model_forward(
+        LAYOUTLMV3_DOWNSTREAM_INPUTS_DOCSTRING.format("batch_size, sequence_length")
+    )
     @replace_return_docstrings(output_type=QuestionAnsweringModelOutput, config_class=_CONFIG_FOR_DOC)
     def forward(
         self,
@@ -1214,7 +1298,9 @@ def __init__(self, config):
 
         self.init_weights()
 
-    @add_start_docstrings_to_model_forward(LAYOUTLMV3_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
+    @add_start_docstrings_to_model_forward(
+        LAYOUTLMV3_DOWNSTREAM_INPUTS_DOCSTRING.format("batch_size, sequence_length")
+    )
     @replace_return_docstrings(output_type=SequenceClassifierOutput, config_class=_CONFIG_FOR_DOC)
     def forward(
         self,
