Converting and Quantizing The Models
Start by downloading either the 2B (https://huggingface.co/moyix/codegen-2B-multi-gptj) or 6B (https://huggingface.co/moyix/codegen-6B-multi-gptj) GPT-J version of CodeGen.
You could also experiment with other model sizes such as 16B, or try the mono models (2B, 6B, 16B), which are fine-tuned on Python only but outperform the multi models in some cases (see the original paper for details).
You will also need to place `vocab.json` and `added_tokens.json` in the directory along with the model to make the conversion script work. This is a temporary limitation that I'll remove at some point.
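If your checkout doesn't already include those two files, here is a minimal sketch for fetching them with `huggingface_hub`. It assumes the files are published in the same Hugging Face repo you are converting; adjust `repo_id` and `model_dir` to match your setup.

```python
# Sketch: copy vocab.json and added_tokens.json into the model directory.
# Assumption: the files exist in the repo you are converting.
import shutil
from huggingface_hub import hf_hub_download

repo_id = "moyix/codegen-6B-multi-gptj"   # assumption: the repo you downloaded
model_dir = "./codegen-6B-multi-gptj"

for filename in ("vocab.json", "added_tokens.json"):
    cached_path = hf_hub_download(repo_id=repo_id, filename=filename)
    shutil.copy(cached_path, f"{model_dir}/{filename}")
```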
You can `git clone` directly from the Hugging Face URLs above. To save time, you can disable LFS on the first checkout and selectively pull only the files you need (only the `.bin` files are needed for conversion; the large `.zst` files are not). Here is an example:
```sh
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/moyix/codegen-16B-multi-gptj
cd codegen-16B-multi-gptj
git config lfs.fetchexclude "*.zst"
git lfs fetch
git lfs checkout *.bin
```
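Because the smudge step was skipped on clone, it's easy to end up with LFS pointer stubs instead of real weight files. A quick sanity check (the directory name here is an assumption; adjust it to your checkout):

```python
# Sketch: verify the *.bin files were materialized by git-lfs rather than left
# as pointer stubs (LFS pointers are tiny text files beginning with "version").
from pathlib import Path

model_dir = Path("./codegen-16B-multi-gptj")  # assumption: your checkout
for bin_file in sorted(model_dir.glob("*.bin")):
    with open(bin_file, "rb") as f:
        is_pointer = f.read(7) == b"version"
    note = " (still an LFS pointer!)" if is_pointer else ""
    print(f"{bin_file.name}: {bin_file.stat().st_size} bytes{note}")
```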
The `convert-codegen-to-ggml.py` script requires Python 3 (I used 3.10). Install the dependencies with `pip install -r requirements.txt`, then run:
```sh
python convert-codegen-to-ggml.py ./codegen-6B-multi-gptj 0
```
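If you want to confirm the conversion produced something sensible before moving on, a small check like the following works, assuming the converter writes the usual ggml magic number (`0x67676d6c`) as the first little-endian 32-bit integer, as the early ggml example converters do:

```python
# Sketch: sanity-check the converted file by reading the ggml magic number.
import struct

path = "./codegen-6B-multi-gptj/ggml-model-f32.bin"
with open(path, "rb") as f:
    (magic,) = struct.unpack("<I", f.read(4))
print("looks like a ggml file" if magic == 0x67676D6C else f"unexpected magic: {magic:#x}")
```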
Note: in some cases the conversion script does not automatically load sharded models (the ones split across multiple `pytorch_model-x-of-y.bin` files). You can use the script described here to pre-combine the model into a single `.bin` file before running the conversion script.
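For reference, the pre-combining step amounts to merging the shard state dicts and saving them as one file. This is only a sketch, not the script linked above, and the directory path is an assumption:

```python
# Sketch: merge sharded checkpoint files (pytorch_model-x-of-y.bin) into a
# single pytorch_model.bin so the conversion script can load it in one pass.
from pathlib import Path
import torch

model_dir = Path("./codegen-6B-multi-gptj")  # assumption: your model directory
shards = sorted(model_dir.glob("pytorch_model-*-of-*.bin"))

state_dict = {}
for shard in shards:
    # each shard contains a disjoint subset of the model's tensors
    state_dict.update(torch.load(shard, map_location="cpu"))

torch.save(state_dict, model_dir / "pytorch_model.bin")
print(f"merged {len(shards)} shards containing {len(state_dict)} tensors")
```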
You will need to build the C++ project and `make codegen-quantize`; then you can run the following:
```sh
./ggml/build/bin/codegen-quantize ./codegen-6B-multi-gptj/ggml-model-f32.bin ./codegen-6B-multi-gptj/ggml-model-quant.bin 2
```
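The trailing `2` selects the quantization type (in the ggml example quantizers this typically corresponds to 4-bit q4_0). To confirm the quantized output is substantially smaller than the f32 model, a quick size check (paths matching the command above):

```python
# Sketch: compare file sizes before and after quantization.
import os

for path in ("./codegen-6B-multi-gptj/ggml-model-f32.bin",
             "./codegen-6B-multi-gptj/ggml-model-quant.bin"):
    print(f"{path}: {os.path.getsize(path) / 1e9:.2f} GB")
```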