System Info
transformers==4.20.1, Python 3.7, tensorflow-gpu==2.9.1
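For reference, the versions above can be confirmed against the active environment with a minimal sketch like this (assumes both tensorflow and transformers are importable):

import sys
import transformers
import tensorflow as tf

print(sys.version.split()[0])    # expect 3.7.x
print(transformers.__version__)  # expect 4.20.1
print(tf.__version__)            # expect 2.9.1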
Who can help?
@Rocketknight1 @sgugger @patil-suraj
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
- Download the bert-large TF2 pretrained model (.h5 file) from Hugging Face.
- Run the PyTorch question-answering example for reference. My script is:
python run_qa.py \
--model_name_or_path $BERT_DIR \
--dataset_name $SQUAD_DIR \
--do_train \
--do_eval \
--per_device_train_batch_size 12 \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 128 \
--doc_stride 48 \
--output_dir $OUTPUT \
--save_steps 10000 \
--overwrite_cache
I got an F1 score of 90.3953%. Note that the pretrained weights are loaded from the TF2 .h5 checkpoint (see the loading sketch below).
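For context, the cross-framework load in the PyTorch run looks roughly like this (a minimal sketch; "path/to/bert-large" stands in for $BERT_DIR and assumes the directory contains tf_model.h5 plus the config and tokenizer files):

from transformers import BertForQuestionAnswering

# from_tf=True converts the TF2 .h5 weights to PyTorch at load time;
# the QA head is newly initialized if the checkpoint is a base BERT model.
model = BertForQuestionAnswering.from_pretrained("path/to/bert-large", from_tf=True)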
- Run the TF2 question-answering example with the same settings as the PyTorch example. My script is:
python run_qa.py \
--model_name_or_path $BERT_DIR \
--dataset_name $SQUAD_DIR \
--do_train \
--do_eval \
--per_device_train_batch_size 12 \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 128 \
--doc_stride 48 \
--output_dir $OUTPUT \
--save_steps 10000 \
--overwrite_cache
I only got an F1 score of 88.5672%, which is much lower than expected and than the PyTorch result (90.3953%).
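To help narrow this down, here is a hedged diagnostic sketch that compares the PyTorch and TF encoders on the same input before any fine-tuning; if the weight conversion is faithful, the gap must come from the training loop instead ("path/to/bert-large" is again a placeholder for $BERT_DIR):

import numpy as np
import torch
from transformers import AutoTokenizer, BertModel, TFBertModel

path = "path/to/bert-large"  # placeholder for $BERT_DIR
tokenizer = AutoTokenizer.from_pretrained(path)
pt_model = BertModel.from_pretrained(path, from_tf=True)  # converted weights
tf_model = TFBertModel.from_pretrained(path)              # native .h5 weights

question, context = "Who wrote BERT?", "BERT was published by Google AI."
pt_inputs = tokenizer(question, context, return_tensors="pt")
tf_inputs = tokenizer(question, context, return_tensors="tf")

with torch.no_grad():
    pt_hidden = pt_model(**pt_inputs).last_hidden_state.numpy()
tf_hidden = tf_model(**tf_inputs, training=False).last_hidden_state.numpy()

# If the conversion is faithful, the max absolute difference should be tiny (~1e-5).
print(np.abs(pt_hidden - tf_hidden).max())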
Expected behavior
The TF2 question-answering example should achieve an F1 score similar to that of the corresponding PyTorch example. Alternatively, could you provide an example script that achieves the target F1 score? Thanks.