@@ -54,57 +54,135 @@
 behavior.
 
     Parameters:
-        config ([`LayoutLMv2Config`]): Model configuration class with all the parameters of the model.
+        config ([`LayoutLMv3Config`]): Model configuration class with all the parameters of the model.
             Initializing with a config file does not load the weights associated with the model, only the
             configuration. Check out the [`~PreTrainedModel.from_pretrained`] method to load the model weights.
 """
 
-LAYOUTLMV3_INPUTS_DOCSTRING = r"""
+LAYOUTLMV3_MODEL_INPUTS_DOCSTRING = r"""
     Args:
-        input_ids (`torch.LongTensor` of shape `{0}`):
+        input_ids (`torch.LongTensor` of shape `({0})`):
             Indices of input sequence tokens in the vocabulary.
 
-            Indices can be obtained using [`LayoutLMv2Tokenizer`]. See [`PreTrainedTokenizer.encode`] and
+            Note that `sequence_length = token_sequence_length + patch_sequence_length + 1` where `1` is for the [CLS]
+            token. See `pixel_values` for `patch_sequence_length`.
+
+            Indices can be obtained using [`LayoutLMv3Tokenizer`]. See [`PreTrainedTokenizer.encode`] and
+            [`PreTrainedTokenizer.__call__`] for details.
+
+            [What are input IDs?](../glossary#input-ids)
+
+        bbox (`torch.LongTensor` of shape `({0}, 4)`, *optional*):
+            Bounding boxes of each input sequence token. Selected in the range `[0,
+            config.max_2d_position_embeddings-1]`. Each bounding box should be a normalized version in (x0, y0, x1, y1)
+            format, where (x0, y0) corresponds to the position of the upper left corner in the bounding box, and (x1,
+            y1) represents the position of the lower right corner.
+
+            Note that `sequence_length = token_sequence_length + patch_sequence_length + 1` where `1` is for the [CLS]
+            token. See `pixel_values` for `patch_sequence_length`.
+
+        pixel_values (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`):
+            Batch of document images. Each image is divided into patches of shape `(num_channels, config.patch_size,
+            config.patch_size)` and the total number of patches (=`patch_sequence_length`) equals `((height /
+            config.patch_size) * (width / config.patch_size))`.
+
+        attention_mask (`torch.FloatTensor` of shape `({0})`, *optional*):
+            Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
+
+            - 1 for tokens that are **not masked**,
+            - 0 for tokens that are **masked**.
+
+            Note that `sequence_length = token_sequence_length + patch_sequence_length + 1` where `1` is for the [CLS]
+            token. See `pixel_values` for `patch_sequence_length`.
+
+            [What are attention masks?](../glossary#attention-mask)
+        token_type_ids (`torch.LongTensor` of shape `({0})`, *optional*):
+            Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0,
+            1]`:
+
+            - 0 corresponds to a *sentence A* token,
+            - 1 corresponds to a *sentence B* token.
+
+            Note that `sequence_length = token_sequence_length + patch_sequence_length + 1` where `1` is for the [CLS]
+            token. See `pixel_values` for `patch_sequence_length`.
+
+            [What are token type IDs?](../glossary#token-type-ids)
+        position_ids (`torch.LongTensor` of shape `({0})`, *optional*):
+            Indices of positions of each input sequence token in the position embeddings. Selected in the range `[0,
+            config.max_position_embeddings - 1]`.
+
+            Note that `sequence_length = token_sequence_length + patch_sequence_length + 1` where `1` is for the [CLS]
+            token. See `pixel_values` for `patch_sequence_length`.
+
+            [What are position IDs?](../glossary#position-ids)
+        head_mask (`torch.FloatTensor` of shape `(num_heads,)` or `(num_layers, num_heads)`, *optional*):
+            Mask to nullify selected heads of the self-attention modules. Mask values selected in `[0, 1]`:
+
+            - 1 indicates the head is **not masked**,
+            - 0 indicates the head is **masked**.
+
+        inputs_embeds (`torch.FloatTensor` of shape `({0}, hidden_size)`, *optional*):
+            Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
+            is useful if you want more control over how to convert *input_ids* indices into associated vectors than the
+            model's internal embedding lookup matrix.
+        output_attentions (`bool`, *optional*):
+            Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
+            tensors for more detail.
+        output_hidden_states (`bool`, *optional*):
+            Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
+            more detail.
+        return_dict (`bool`, *optional*):
+            Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
+"""
+
+LAYOUTLMV3_DOWNSTREAM_INPUTS_DOCSTRING = r"""
+    Args:
+        input_ids (`torch.LongTensor` of shape `({0})`):
+            Indices of input sequence tokens in the vocabulary.
+
+            Indices can be obtained using [`LayoutLMv3Tokenizer`]. See [`PreTrainedTokenizer.encode`] and
             [`PreTrainedTokenizer.__call__`] for details.
 
             [What are input IDs?](../glossary#input-ids)
 
         bbox (`torch.LongTensor` of shape `({0}, 4)`, *optional*):
             Bounding boxes of each input sequence token. Selected in the range `[0,
             config.max_2d_position_embeddings-1]`. Each bounding box should be a normalized version in (x0, y0, x1, y1)
             format, where (x0, y0) corresponds to the position of the upper left corner in the bounding box, and (x1,
             y1) represents the position of the lower right corner.
 
         pixel_values (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`):
-            Batch of document images.
+            Batch of document images. Each image is divided into patches of shape `(num_channels, config.patch_size,
+            config.patch_size)` and the total number of patches (=`patch_sequence_length`) equals `((height /
+            config.patch_size) * (width / config.patch_size))`.
 
-        attention_mask (`torch.FloatTensor` of shape `{0}`, *optional*):
+        attention_mask (`torch.FloatTensor` of shape `({0})`, *optional*):
             Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
 
             - 1 for tokens that are **not masked**,
             - 0 for tokens that are **masked**.
 
             [What are attention masks?](../glossary#attention-mask)
-        token_type_ids (`torch.LongTensor` of shape `{0}`, *optional*):
+        token_type_ids (`torch.LongTensor` of shape `({0})`, *optional*):
             Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0,
             1]`:
 
             - 0 corresponds to a *sentence A* token,
             - 1 corresponds to a *sentence B* token.
 
             [What are token type IDs?](../glossary#token-type-ids)
-        position_ids (`torch.LongTensor` of shape `{0}`, *optional*):
+        position_ids (`torch.LongTensor` of shape `({0})`, *optional*):
             Indices of positions of each input sequence token in the position embeddings. Selected in the range `[0,
             config.max_position_embeddings - 1]`.
 
             [What are position IDs?](../glossary#position-ids)
         head_mask (`torch.FloatTensor` of shape `(num_heads,)` or `(num_layers, num_heads)`, *optional*):
             Mask to nullify selected heads of the self-attention modules. Mask values selected in `[0, 1]`:
 
             - 1 indicates the head is **not masked**,
             - 0 indicates the head is **masked**.
 
-        inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+        inputs_embeds (`torch.FloatTensor` of shape `({0}, hidden_size)`, *optional*):
             Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
             is useful if you want more control over how to convert *input_ids* indices into associated vectors than the
             model's internal embedding lookup matrix.
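The note `sequence_length = token_sequence_length + patch_sequence_length + 1` now appears under every sequence-shaped input above. A quick sketch of the arithmetic; the concrete sizes below are assumptions (typical LayoutLMv3-base values), not taken from this diff:

```python
# Hedged sketch of the sequence-length bookkeeping the new docstrings describe.
height = width = 224          # assumed input image size
patch_size = 16               # assumed config.patch_size
patch_sequence_length = (height // patch_size) * (width // patch_size)  # 14 * 14 = 196
token_sequence_length = 512   # assumed length of the tokenized text
# +1 accounts for the [CLS] token, as the docstrings note:
sequence_length = token_sequence_length + patch_sequence_length + 1
print(patch_sequence_length, sequence_length)  # 196 709
```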
@@ -763,7 +841,9 @@ def forward_image(self, pixel_values):
 
         return embeddings
 
-    @add_start_docstrings_to_model_forward(LAYOUTLMV3_INPUTS_DOCSTRING.format("(batch_size, sequence_length)"))
+    @add_start_docstrings_to_model_forward(
+        LAYOUTLMV3_MODEL_INPUTS_DOCSTRING.format("batch_size, token_sequence_length")
+    )
     @replace_return_docstrings(output_type=BaseModelOutput, config_class=_CONFIG_FOR_DOC)
     def forward(
         self,
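The hunk above also shows how the `{0}` placeholder in the templates gets filled: the decorator receives the template already `.format()`-ed with a shape string, so the base model renders shapes as `(batch_size, token_sequence_length)`. A simplified stand-in for the decorator (the real `add_start_docstrings_to_model_forward` in transformers does more than this sketch):

```python
# Minimal illustration of the `{0}` substitution used by the docstring decorators.
TEMPLATE = r"""
    input_ids (`torch.LongTensor` of shape `({0})`):
        Indices of input sequence tokens in the vocabulary.
"""

def add_docstring(doc):
    def decorator(fn):
        fn.__doc__ = doc + (fn.__doc__ or "")  # prepend the rendered docstring
        return fn
    return decorator

@add_docstring(TEMPLATE.format("batch_size, token_sequence_length"))
def forward(input_ids):
    ...

print(forward.__doc__)  # shape renders as `(batch_size, token_sequence_length)`
```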
@@ -975,7 +1055,9 @@ def __init__(self, config):
 
         self.init_weights()
 
-    @add_start_docstrings_to_model_forward(LAYOUTLMV3_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
+    @add_start_docstrings_to_model_forward(
+        LAYOUTLMV3_DOWNSTREAM_INPUTS_DOCSTRING.format("batch_size, sequence_length")
+    )
     @replace_return_docstrings(output_type=TokenClassifierOutput, config_class=_CONFIG_FOR_DOC)
     def forward(
         self,
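For reference, a hedged usage sketch of the token-classification head this hunk documents; the checkpoint name, words, boxes, and label count are assumptions, not part of the diff:

```python
# Hedged usage sketch: run LayoutLMv3ForTokenClassification on one document page.
from PIL import Image
from transformers import AutoProcessor, LayoutLMv3ForTokenClassification

processor = AutoProcessor.from_pretrained("microsoft/layoutlmv3-base", apply_ocr=False)
model = LayoutLMv3ForTokenClassification.from_pretrained("microsoft/layoutlmv3-base", num_labels=7)

image = Image.open("document.png").convert("RGB")  # assumed local scan
words = ["Invoice", "Total:", "$42.00"]            # assumed OCR output
boxes = [[48, 84, 156, 108], [48, 120, 110, 144], [118, 120, 190, 144]]  # 0-1000 normalized

encoding = processor(image, words, boxes=boxes, return_tensors="pt")
outputs = model(**encoding)
print(outputs.logits.shape)  # per-token label logits
```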
@@ -1084,7 +1166,9 @@ def __init__(self, config):
 
         self.init_weights()
 
-    @add_start_docstrings_to_model_forward(LAYOUTLMV3_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
+    @add_start_docstrings_to_model_forward(
+        LAYOUTLMV3_DOWNSTREAM_INPUTS_DOCSTRING.format("batch_size, sequence_length")
+    )
     @replace_return_docstrings(output_type=QuestionAnsweringModelOutput, config_class=_CONFIG_FOR_DOC)
     def forward(
         self,
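All of the heads above consume the normalized `(x0, y0, x1, y1)` boxes described in both docstrings. Rescaling pixel coordinates onto a 0-1000 grid is the usual LayoutLM-family convention; that constant is an assumption here, since the docstring itself only requires values within `[0, config.max_2d_position_embeddings - 1]`:

```python
# Hedged helper: rescale pixel-space (x0, y0, x1, y1) boxes to the 0-1000 grid.
def normalize_bbox(bbox, page_width, page_height):
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]

print(normalize_bbox((120, 500, 480, 560), page_width=1224, page_height=1584))
# -> [98, 315, 392, 353]
```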
@@ -1214,7 +1298,9 @@ def __init__(self, config):
 
         self.init_weights()
 
-    @add_start_docstrings_to_model_forward(LAYOUTLMV3_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
+    @add_start_docstrings_to_model_forward(
+        LAYOUTLMV3_DOWNSTREAM_INPUTS_DOCSTRING.format("batch_size, sequence_length")
+    )
     @replace_return_docstrings(output_type=SequenceClassifierOutput, config_class=_CONFIG_FOR_DOC)
     def forward(
         self,