LayoutLMv3Model output shape is different #17833

Closed
@pocca2048

Description

System Info

- `transformers` version: 4.20.1
- Platform: Linux-4.4.0-62-generic-x86_64-with-glibc2.10
- Python version: 3.8.8
- Huggingface_hub version: 0.2.1
- PyTorch version (GPU?): 1.10.0+cu102 (True)
- Tensorflow version (GPU?): 2.7.0 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>

Who can help?

@NielsRogge

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import AutoProcessor, AutoModelForTokenClassification, AutoModel
from datasets import load_dataset

processor = AutoProcessor.from_pretrained("microsoft/layoutlmv3-base", apply_ocr=False)
model = AutoModel.from_pretrained("microsoft/layoutlmv3-base", num_labels=7)

dataset = load_dataset("nielsr/funsd-layoutlmv3", split="train")
example = dataset[0]
image = example["image"]
words = example["tokens"]
boxes = example["bboxes"]
word_labels = example["ner_tags"]

encoding = processor(image, words, boxes=boxes, return_tensors="pt")

outputs = model(**encoding)
encoding.input_ids.shape, outputs.last_hidden_state.shape

outputs

(torch.Size([1, 208]), torch.Size([1, 405, 768]))

Expected behavior

(torch.Size([1, 208]), torch.Size([1, 208, 768]))

Hi! Thank you very much for contributing the LayoutLMv3 model to Hugging Face.

While using the model, I noticed that its behavior differs from the documented specification.

https://github.com/huggingface/transformers/blob/v4.20.1/src/transformers/models/layoutlmv3/modeling_layoutlmv3.py#L1043
https://github.com/microsoft/unilm/blob/master/layoutlmv3/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py#L1070

The Hugging Face implementation produces a different output shape than the original implementation.
The documentation says last_hidden_state has shape (batch_size, sequence_length, hidden_size), but it does not (the original implementation matches this; the Hugging Face implementation does not): the returned sequence length also includes the visual (image patch) tokens, not just the text tokens.
Presumably because of that, training on the FUNSD dataset produces different results.
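For reference, the size difference is consistent with LayoutLMv3's visual stream: assuming the model's default 224x224 input image and 16x16 patches (stated here as assumptions, not taken from this issue), the vision branch contributes 196 patch tokens plus one CLS-style token. A quick sanity check:

```python
# Sanity check: the 405 - 208 = 197 extra positions match the number of
# visual tokens LayoutLMv3 appends (assuming a 224x224 input image and
# 16x16 patches, plus one CLS-style visual token).
text_len = 208                    # encoding.input_ids.shape[1] from the repro above
total_len = 405                   # outputs.last_hidden_state.shape[1]
patch_tokens = (224 // 16) ** 2   # 196 image patches
visual_tokens = patch_tokens + 1  # +1 for the visual CLS token
print(total_len - text_len == visual_tokens)  # True
```

If only the text positions are needed, slicing the first `encoding.input_ids.shape[1]` positions of `last_hidden_state` should recover them, assuming the text tokens precede the visual tokens in the concatenated sequence.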

In summary,

  • LayoutLMv3Model outputs a different shape (sequence length) than the one stated in the documentation,
  • and that shape differs from the original implementation.

Thank you.
