Fine tuning TensorFlow DeBERTa fails on TPU

### System Info

Latest version of transformers, Colab TPU, tensorflow 2.

- Colab TPU
- transformers: 4.21.0
- tensorflow: 2.8.2 / 2.6.2
- Python 3.7

### Who can help?

@LysandreJik, @Rocketknight1, @san

### Information

- [ ] The official example scripts
- [X] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)

### Reproduction

I am running into several errors while trying to fine-tune the TensorFlow DeBERTa model ``microsoft/deberta-v3-base`` on a TPU.

I have created three Colab notebooks, one per error. Note that the second and third notebooks already include workarounds for the errors shown in the earlier notebooks.

- ValueError caused by a partially known TensorShape after the latest ``take_along_axis`` change: [FineTuning_TF_DeBERTa_TPU_1](https://colab.research.google.com/drive/1TN4Ro-U6a-7MypDN3AUoHFfEPnFErPBt?usp=sharing)
- Output shape mismatch of branches with custom dropout: [FineTuning_TF_DeBERTa_TPU_2](https://colab.research.google.com/drive/1gubIwNKNFwexKcra37w9-CSzFJUDGm07?usp=sharing)
- XLA compilation error because of dynamic/computed tensor shapes: [FineTuning_TF_DeBERTa_TPU_3](https://colab.research.google.com/drive/1L6cCdYCf3R5l90TK-Hs5dv85O6qL5vrR?usp=sharing)

I have seen similar issues when using ``microsoft/deberta-base``.

I believe the following issues are related:
- [TF2 DeBERTaV2 runs super slow on TPUs #18239](https://github.com/huggingface/transformers/issues/18239)
- [Debertav2 debertav3 TPU : socket closed #18276](https://github.com/huggingface/transformers/issues/18276). I applied the ``take_along_axis`` fix from this issue.
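
For readers unfamiliar with ``take_along_axis``, here is a minimal NumPy sketch of what the operation computes and of one way to express it without a gather, using a one-hot mask and an ``einsum`` whose shapes are all known up front. This is an illustration only: the function names are hypothetical, the code is plain NumPy rather than TensorFlow, and I am not claiming it matches the implementation in transformers or the fix discussed in the linked issue.

```python
import numpy as np


def take_along_last_axis_gather(x, indices):
    """Reference behaviour: pick x[b, s, indices[b, s, p]] for every b, s, p."""
    return np.take_along_axis(x, indices, axis=-1)


def take_along_last_axis_one_hot(x, indices):
    """Same result via one-hot + einsum (hypothetical helper, illustration only)."""
    depth = x.shape[-1]
    # [B, S, P] -> [B, S, P, D]: 1.0 where the last-axis position equals the index
    one_hot = (indices[..., None] == np.arange(depth)).astype(x.dtype)
    # [B, S, P, D] contracted with [B, S, D] over D -> [B, S, P]
    return np.einsum("bspd,bsd->bsp", one_hot, x)


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 5))                # [B, S, D]
indices = rng.integers(0, 5, size=(2, 3, 4))  # [B, S, P], values in [0, D)

# Both formulations should agree on the same inputs.
assert np.allclose(
    take_along_last_axis_gather(x, indices),
    take_along_last_axis_one_hot(x, indices),
)
```

The one-hot variant trades extra memory (an additional axis of size D) for avoiding index-dependent gathers, which is the kind of trade-off that may matter on accelerators that compile with static shapes.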

Thanks!


### Expected behavior

Fine-tuning should work on a TPU just as it does on a GPU.

Fine tuning TensorFlow DeBERTa fails on TPU #18476

System Info

Who can help?

Information

Tasks

Reproduction

Expected behavior

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Fine tuning TensorFlow DeBERTa fails on TPU #18476

Description

System Info

Who can help?

Information

Tasks

Reproduction

Expected behavior

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions