This project won first prize at the kt-dev-challenge held in September 2022.
We used the T5 model provided by KT throughout. Some models and datasets cannot be disclosed due to confidentiality.
git clone https://github.com/kkjsw17/KU-NLP-kt-dev-challenge-2022.git
cd KU-NLP-kt-dev-challenge-2022
pip install -r ./assets/requirements.txt
preprocessor.py
- To use the T5 encoder-only model, each sentence must first be converted to a tensor.
- This script converts sentences to tensors.
python preprocessor.py
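The sentence-to-tensor step can be illustrated with a minimal sketch. It does not reproduce preprocessor.py: the whitespace tokenizer, the `build_vocab` and `encode_batch` helpers, and the `PAD_ID`/`UNK_ID` values are all hypothetical stand-ins for the KT T5 tokenizer, which is not shown here. The output is a pair of fixed-size nested lists that a tensor library could wrap directly.

```python
from typing import Dict, List, Tuple

PAD_ID = 0  # hypothetical padding id (assumption, not the KT tokenizer's value)
UNK_ID = 1  # hypothetical id for out-of-vocabulary tokens


def build_vocab(sentences: List[str]) -> Dict[str, int]:
    """Assign an integer id to every whitespace-separated token."""
    vocab = {"<pad>": PAD_ID, "<unk>": UNK_ID}
    for sentence in sentences:
        for token in sentence.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode_batch(
    sentences: List[str], vocab: Dict[str, int], max_length: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (input_ids, attention_mask), each shaped [batch, max_length]."""
    input_ids, attention_mask = [], []
    for sentence in sentences:
        # Map tokens to ids and truncate to the fixed length.
        ids = [vocab.get(tok, UNK_ID) for tok in sentence.split()][:max_length]
        n_pad = max_length - len(ids)
        # Pad on the right; the mask marks real tokens with 1 and padding with 0.
        input_ids.append(ids + [PAD_ID] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return input_ids, attention_mask


sentences = ["turn on the light", "play music"]
vocab = build_vocab(sentences)
input_ids, attention_mask = encode_batch(sentences, vocab, max_length=5)
```

Padding every sentence to the same length is what makes the batch rectangular, so it can be stored as a single tensor and fed to an encoder together with its attention mask.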
train.py
- This script trains the T5 encoder-decoder model.
- The training code for the encoder-only model cannot be disclosed due to confidentiality.
sh train.sh
search_hyperparams.py
- This script searches for the hyperparameters that optimize the model.
- The hyperparameter search is based on the grid search method.
search_ensemble_f1.py
- This script searches for the best-performing ensemble combination.
- Candidate combinations are evaluated using hard voting.
python search_hyperparams.py
python search_ensemble_f1.py
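The two search ideas above, grid search and hard voting, can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code in search_hyperparams.py or search_ensemble_f1.py: the function names, the toy objective, the model names `a`, `b`, `c`, and the choice of macro-averaged F1 as the selection metric are all hypothetical.

```python
from collections import Counter
from itertools import combinations, product
from typing import Callable, Dict, List, Sequence, Tuple


def grid_search(
    grid: Dict[str, Sequence], score_fn: Callable[[dict], float]
) -> Tuple[dict, float]:
    """Score every combination in the grid and keep the best one."""
    keys = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def hard_vote(predictions: List[List[int]]) -> List[int]:
    """Majority label per example; ties go to the label seen first."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*predictions)]


def macro_f1(y_true: List[int], y_pred: List[int]) -> float:
    """Unweighted mean of per-label F1 (assumed metric)."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def best_ensemble(
    model_preds: Dict[str, List[int]], y_true: List[int]
) -> Tuple[Tuple[str, ...], float]:
    """Try every non-empty subset of models and keep the best hard-voted F1."""
    names = sorted(model_preds)
    best_names, best_score = (), -1.0
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            voted = hard_vote([model_preds[n] for n in subset])
            score = macro_f1(y_true, voted)
            if score > best_score:
                best_names, best_score = subset, score
    return best_names, best_score


# Toy objective standing in for a validation score (hypothetical).
grid = {"lr": [1e-5, 3e-5, 5e-5], "batch_size": [8, 16, 32]}
best_params, _ = grid_search(
    grid, lambda p: -abs(p["lr"] - 3e-5) - abs(p["batch_size"] - 16)
)

# Toy predictions from three hypothetical models on five examples.
y_true = [0, 1, 1, 0, 1]
model_preds = {"a": [0, 1, 0, 0, 1], "b": [0, 1, 1, 1, 1], "c": [1, 1, 1, 0, 0]}
ensemble, ensemble_f1 = best_ensemble(model_preds, y_true)
```

In the toy data each model makes its errors on different examples, so the three-model vote corrects all of them even though no single model is perfect. Exhaustive subset search grows exponentially with the number of models, which is only practical for a small candidate pool.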
infer.py and infer_encoder.py
- These scripts run inference on the test dataset.
- infer.py runs inference with the T5 encoder-decoder model.
- infer_encoder.py runs inference with the T5 encoder-only model. Due to confidentiality, this code is incomplete.
sh infer.sh
sh infer_encoder.sh
We won first prize with this project. The T5 encoder-only model achieved better results than the T5 encoder-decoder model.