Converting and Quantizing The Models
Start by downloading either the 2B (https://huggingface.co/moyix/codegen-2B-multi-gptj) or 6B (https://huggingface.co/moyix/codegen-6B-multi-gptj) GPT-J version of CodeGen.
You could also experiment with other model sizes such as 16B, or try the mono models (2B, 6B, 16B), which are fine-tuned on Python only but outperform the multi models in some cases (see the original paper for details).
You will also need to place `vocab.json` and `added_tokens.json` in the directory along with the model to make the conversion script work. This is a temporary limitation that I'll remove at some point.
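If your checkout doesn't already include those two files, here is a minimal sketch for fetching them with `huggingface_hub`. It assumes the files are published in the same Hugging Face repo you are converting; adjust `repo_id` and `model_dir` to match your setup.

```python
# Sketch: copy vocab.json and added_tokens.json into the model directory.
# Assumption: the files exist in the repo you are converting.
import shutil
from huggingface_hub import hf_hub_download

repo_id = "moyix/codegen-6B-multi-gptj"   # assumption: the repo you downloaded
model_dir = "./codegen-6B-multi-gptj"

for filename in ("vocab.json", "added_tokens.json"):
    cached_path = hf_hub_download(repo_id=repo_id, filename=filename)
    shutil.copy(cached_path, f"{model_dir}/{filename}")
```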
You can `git clone` directly from the Hugging Face URLs above. To save time, you can disable LFS on the first checkout and selectively pull only the files you need (only the `.bin` files are needed for conversion; the large `.zst` files are not). Here is an example:
```sh
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/moyix/codegen-16B-multi-gptj
cd codegen-16B-multi-gptj
git config lfs.fetchexclude "*.zst"
git lfs fetch
git lfs checkout *.bin
```
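Because the smudge step was skipped on clone, it's easy to end up with LFS pointer stubs instead of real weight files. A quick sanity check (the directory name here is an assumption; adjust it to your checkout):

```python
# Sketch: verify the *.bin files were materialized by git-lfs rather than left
# as pointer stubs (LFS pointers are tiny text files beginning with "version").
from pathlib import Path

model_dir = Path("./codegen-16B-multi-gptj")  # assumption: your checkout
for bin_file in sorted(model_dir.glob("*.bin")):
    with open(bin_file, "rb") as f:
        is_pointer = f.read(7) == b"version"
    note = " (still an LFS pointer!)" if is_pointer else ""
    print(f"{bin_file.name}: {bin_file.stat().st_size} bytes{note}")
```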
The `convert-codegen-to-ggml.py` script requires Python 3 (I used 3.10). Install the dependencies with `pip install -r requirements.txt`, then run:
```sh
python convert-codegen-to-ggml.py ./codegen-6B-multi-gptj 0
```
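If you want to confirm the conversion produced something sensible before moving on, a small check like the following works, assuming the converter writes the usual ggml magic number (`0x67676d6c`) as the first little-endian 32-bit integer, as the early ggml example converters do:

```python
# Sketch: sanity-check the converted file by reading the ggml magic number.
import struct

path = "./codegen-6B-multi-gptj/ggml-model-f32.bin"
with open(path, "rb") as f:
    (magic,) = struct.unpack("<I", f.read(4))
print("looks like a ggml file" if magic == 0x67676D6C else f"unexpected magic: {magic:#x}")
```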
Note: in some cases the conversion script does not automatically load sharded models (the ones split across multiple `pytorch_model-x-of-y.bin` files). You can use the script described here to pre-combine the model into a single `.bin` file before running the conversion script.
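For reference, the pre-combining step amounts to merging the shard state dicts and saving them as one file. This is only a sketch, not the script linked above, and the directory path is an assumption:

```python
# Sketch: merge sharded checkpoint files (pytorch_model-x-of-y.bin) into a
# single pytorch_model.bin so the conversion script can load it in one pass.
from pathlib import Path
import torch

model_dir = Path("./codegen-6B-multi-gptj")  # assumption: your model directory
shards = sorted(model_dir.glob("pytorch_model-*-of-*.bin"))

state_dict = {}
for shard in shards:
    # each shard contains a disjoint subset of the model's tensors
    state_dict.update(torch.load(shard, map_location="cpu"))

torch.save(state_dict, model_dir / "pytorch_model.bin")
print(f"merged {len(shards)} shards containing {len(state_dict)} tensors")
```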
You will need to build the C++ project and `make codegen-quantize`; then you can run the following:
```sh
./ggml/build/bin/codegen-quantize ./codegen-6B-multi-gptj/ggml-model-f32.bin ./codegen-6B-multi-gptj/ggml-model-quant.bin 2
```
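The trailing `2` selects the quantization type (in the ggml example quantizers this typically corresponds to 4-bit q4_0). To confirm the quantized output is substantially smaller than the f32 model, a quick size check (paths matching the command above):

```python
# Sketch: compare file sizes before and after quantization.
import os

for path in ("./codegen-6B-multi-gptj/ggml-model-f32.bin",
             "./codegen-6B-multi-gptj/ggml-model-quant.bin"):
    print(f"{path}: {os.path.getsize(path) / 1e9:.2f} GB")
```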