Description
System Info
- `transformers` version: 4.20.1
- Platform: Linux-4.4.0-62-generic-x86_64-with-glibc2.10
- Python version: 3.8.8
- Huggingface_hub version: 0.2.1
- PyTorch version (GPU?): 1.10.0+cu102 (True)
- Tensorflow version (GPU?): 2.7.0 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
from transformers import AutoProcessor, AutoModelForTokenClassification, AutoModel
from datasets import load_dataset
processor = AutoProcessor.from_pretrained("microsoft/layoutlmv3-base", apply_ocr=False)
model = AutoModel.from_pretrained("microsoft/layoutlmv3-base", num_labels=7)
dataset = load_dataset("nielsr/funsd-layoutlmv3", split="train")
example = dataset[0]
image = example["image"]
words = example["tokens"]
boxes = example["bboxes"]
word_labels = example["ner_tags"]
encoding = processor(image, words, boxes=boxes, return_tensors="pt")
outputs = model(**encoding)
encoding.input_ids.shape, outputs.last_hidden_state.shape
outputs
(torch.Size([1, 208]), torch.Size([1, 405, 768]))
Expected behavior
(torch.Size([1, 208]), torch.Size([1, 208, 768]))
Hi! Thank you very much for contributing the LayoutLMv3 model to Hugging Face Transformers.
While using the model, I think I found that parts of its behavior differ from the original specification.
https://github.com/huggingface/transformers/blob/v4.20.1/src/transformers/models/layoutlmv3/modeling_layoutlmv3.py#L1043
https://github.com/microsoft/unilm/blob/master/layoutlmv3/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py#L1070
The Hugging Face implementation has a different output shape than the original implementation.
The documentation says `last_hidden_state` is a torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), but the returned tensor does not match that: the original implementation does, while the Hugging Face implementation does not.
The returned sequence length includes the visual (image patch) tokens in addition to the text tokens.
Presumably because of this, training on the FUNSD dataset gives different results.
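For reference, the gap between the two lengths matches the number of visual tokens that would come from a 224 × 224 image split into 16 × 16 patches plus one extra token; a minimal sketch of that arithmetic, reusing `encoding` and `outputs` from the reproduction above and assuming the extra positions are indeed the visual tokens:

```python
text_len = encoding.input_ids.shape[1]            # 208 text tokens
total_len = outputs.last_hidden_state.shape[1]    # 405 positions returned

# Assumption: the extra positions are visual tokens
# (1 extra visual token + (224 // 16) ** 2 = 196 image patches).
num_visual_tokens = 1 + (224 // 16) ** 2          # 197
assert total_len == text_len + num_visual_tokens  # 405 == 208 + 197
```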
In summary, LayoutLMv3Model outputs a different shape (sequence length) than what is written in the documentation, and that shape also differs from the original implementation.
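As a possible workaround until this is resolved, the text-only hidden states can be sliced out of the returned tensor; a minimal sketch, assuming the visual tokens are appended after the text tokens (which is how the current code appears to order them):

```python
seq_len = encoding.input_ids.shape[1]
# Keep only the hidden states of the text tokens and drop the
# visual (patch) positions at the end of the sequence.
text_hidden_states = outputs.last_hidden_state[:, :seq_len, :]
print(text_hidden_states.shape)  # torch.Size([1, 208, 768])
```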
Thank you.