Add the Hugging Face token parameter and fix the file paths from the llama.cpp repo #761
I wanted to use the mistralai/Mistral-7B-Instruct-v0.2 model and found there are no GGUF files for it on Hugging Face, so I decided to use the ./convert_models scripts to convert the model myself. In doing so I found a few issues:
So I added the optional HF_TOKEN=<YOUR_HF_TOKEN_ID> parameter to the code. Downloading a public model requires no token; downloading a private model requires a Hugging Face token.
Impacted files: README.md, download_huggingface.py, run.sh
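For context, a minimal sketch of the intended usage. The image name `converter` comes from the ui.py fix described further down; the `HF_MODEL_URL` variable and the remaining flags are illustrative assumptions, not taken verbatim from this PR:

```bash
# Public model: no token needed
podman run -it --rm -e HF_MODEL_URL=mistralai/Mistral-7B-Instruct-v0.2 converter

# Private model: pass the optional Hugging Face token into the container
podman run -it --rm \
  -e HF_MODEL_URL=<private_model> \
  -e HF_TOKEN=<YOUR_HF_TOKEN_ID> \
  converter
```

Note that recent huggingface_hub releases read HF_TOKEN from the environment on their own, so download_huggingface.py may only need the variable to be present rather than pass it explicitly.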
If we go to https://github.com/ggerganov/llama.cpp.git, we can see that convert.py has been deprecated and moved to examples/convert_legacy_llama.py. I am not sure whether I should keep the line `python llama.cpp/convert-hf-to-gguf.py /opt/app-root/src/converter/converted_models/$hf_model_url` unchanged; for now I just replaced convert.py with the correct path, and did the same for llama.cpp/quantize.
Impacted file: run.sh
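For illustration, a sketch of the resulting lines in run.sh; the old and new script locations reflect the upstream llama.cpp move mentioned above, and the arguments are copied from the quoted line:

```bash
# Before: convert.py at the llama.cpp repo root (now deprecated upstream)
# python llama.cpp/convert.py /opt/app-root/src/converter/converted_models/$hf_model_url

# After: the script moved to examples/convert_legacy_llama.py
python llama.cpp/examples/convert_legacy_llama.py /opt/app-root/src/converter/converted_models/$hf_model_url

# The quantize binary was renamed upstream as well (quantize -> llama-quantize),
# so the quantize step needs the matching path.
```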
So I added "converter" in the "podman run" command.
Impacted file: ui.py
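A sketch of the podman command that ui.py assembles after this change, assuming the bug was a missing image name at the end of the command; the other flags are illustrative:

```bash
# Before: the command ended without an image name, so podman had nothing to run
# podman run -it --rm -e HF_MODEL_URL=$hf_model_url

# After: "converter" (the image built by this recipe) is appended as the image name
podman run -it --rm -e HF_MODEL_URL=$hf_model_url converter
```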
Here is my testing after the modifications.
Here is the web UI test with a public model (no token is needed):
