LayoutLMv3Model output shape is different #17833

Closed
@pocca2048

Description

System Info

- `transformers` version: 4.20.1
- Platform: Linux-4.4.0-62-generic-x86_64-with-glibc2.10
- Python version: 3.8.8
- Huggingface_hub version: 0.2.1
- PyTorch version (GPU?): 1.10.0+cu102 (True)
- Tensorflow version (GPU?): 2.7.0 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>

Who can help?

@NielsRogge

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import AutoProcessor, AutoModelForTokenClassification, AutoModel
from datasets import load_dataset

processor = AutoProcessor.from_pretrained("microsoft/layoutlmv3-base", apply_ocr=False)
model = AutoModel.from_pretrained("microsoft/layoutlmv3-base", num_labels=7)

dataset = load_dataset("nielsr/funsd-layoutlmv3", split="train")
example = dataset[0]
image = example["image"]
words = example["tokens"]
boxes = example["bboxes"]
word_labels = example["ner_tags"]

encoding = processor(image, words, boxes=boxes, return_tensors="pt")

outputs = model(**encoding)
encoding.input_ids.shape, outputs.last_hidden_state.shape

outputs

(torch.Size([1, 208]), torch.Size([1, 405, 768]))

Expected behavior

(torch.Size([1, 208]), torch.Size([1, 208, 768]))

Hi! Thank you very much for contributing the LayoutLMv3 model to Hugging Face.

While using the model, I noticed that its behavior differs from the documented specification.

https://github.com/huggingface/transformers/blob/v4.20.1/src/transformers/models/layoutlmv3/modeling_layoutlmv3.py#L1043
https://github.com/microsoft/unilm/blob/master/layoutlmv3/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py#L1070

The Hugging Face implementation produces a different output shape than the original implementation.
The documentation says last_hidden_state has shape (batch_size, sequence_length, hidden_size), but it does not (the original implementation matches this; the Hugging Face implementation does not): the returned sequence length also includes the visual (image patch) tokens, not just the text tokens.
Presumably because of that, training on the FUNSD dataset produces different results.
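For reference, the size difference is consistent with LayoutLMv3's visual stream: assuming the model's default 224x224 input image and 16x16 patches (stated here as assumptions, not taken from this issue), the vision branch contributes 196 patch tokens plus one CLS-style token. A quick sanity check:

```python
# Sanity check: the 405 - 208 = 197 extra positions match the number of
# visual tokens LayoutLMv3 appends (assuming a 224x224 input image and
# 16x16 patches, plus one CLS-style visual token).
text_len = 208                    # encoding.input_ids.shape[1] from the repro above
total_len = 405                   # outputs.last_hidden_state.shape[1]
patch_tokens = (224 // 16) ** 2   # 196 image patches
visual_tokens = patch_tokens + 1  # +1 for the visual CLS token
print(total_len - text_len == visual_tokens)  # True
```

If only the text positions are needed, slicing the first `encoding.input_ids.shape[1]` positions of `last_hidden_state` should recover them, assuming the text tokens precede the visual tokens in the concatenated sequence.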

In summary,

  • LayoutLMv3Model outputs a different shape (sequence length) than the one stated in the documentation,
  • and that shape differs from the original implementation.

Thank you.
