Motivation
Fix #13128
My residence currently has no internet (temporarily), and I'm using a slow 4G connection to upload this PR.
This PR allows using `-hf` and `-mu` without internet access, given you already downloaded the model. If the model is not yet downloaded, or the manifest file is not yet generated (manifest files did not exist before this PR), you will see an error instead.
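For illustration only, here is a minimal sketch of what the offline path could look like. The `manifest=` naming, the manifest content, and the helper names are assumptions for this sketch, not the actual implementation:

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical sketch of resolving a model passed via -hf / -mu while offline.
// Assumption: a previous (online) run wrote a small manifest file named
// "manifest=<id>" into the cache dir, whose first line is the local path of
// the downloaded GGUF.
static std::string resolve_model_offline(const std::string & cache_dir,
                                         const std::string & model_id) {
    const std::string manifest_path = cache_dir + "/manifest=" + model_id;

    std::ifstream manifest(manifest_path);
    if (!manifest) {
        // no manifest -> the model was never downloaded (or was fetched before
        // manifests existed), so offline usage is not possible
        throw std::runtime_error("model not cached; connect once to download it");
    }

    std::string local_gguf;
    std::getline(manifest, local_gguf);

    if (!std::ifstream(local_gguf, std::ios::binary).good()) {
        throw std::runtime_error("manifest exists but the cached model file is missing");
    }
    return local_gguf; // load this file as if it had been passed via -m
}
```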
Behavior change
Two noticeable changes:
- The `HEAD` request no longer retries. If we forced the user to sit through 3 retries, it would be a bad UX for offline usage. Not sure if this will impact anyone, but I hope it won't be a big problem (see next point).
- If the `HEAD` request fails but the file already exists on disk, we won't re-download it. The argument is that if the server does not support `ETag` on `HEAD` requests, there is no point in forcing the user to re-download the file every time (see the sketch below).
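A rough sketch of that decision, assuming the probe is done with libcurl (function and variable names are illustrative, not the actual download code):

```cpp
#include <curl/curl.h>
#include <fstream>
#include <stdexcept>
#include <string>

// Sketch only: probe the server with a single HEAD request (no retry loop).
static bool head_request_ok(const std::string & url) {
    CURL * curl = curl_easy_init();
    if (!curl) {
        return false;
    }
    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_NOBODY, 1L);         // HEAD instead of GET
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
    curl_easy_setopt(curl, CURLOPT_TIMEOUT, 10L);       // fail fast, no retries
    const CURLcode res = curl_easy_perform(curl);       // single attempt only
    curl_easy_cleanup(curl);
    return res == CURLE_OK;
}

// Returns true when the file should be (re-)downloaded.
static bool should_download(const std::string & url, const std::string & local_path) {
    const bool have_local_file = std::ifstream(local_path).good();
    if (!head_request_ok(url)) {
        // HEAD failed (offline, or a server without proper HEAD/ETag support):
        // keep using the cached file instead of forcing a re-download
        if (have_local_file) {
            return false;
        }
        throw std::runtime_error("file is not cached and the server is unreachable");
    }
    // HEAD succeeded: the real code would compare the returned ETag against the
    // stored one; for this sketch we only download when the file is missing
    return !have_local_file;
}
```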
Idea for the future
While making this PR, I intentionally added a `manifest=` prefix to the cached manifest file. In the future, we can have a flag like `--list-cached-models` to show the list of cached models that the user can pick from.
Further in the future, we could also allow `llama-server` to swap models (not necessarily running 2 or more in parallel). Think of the LM Studio use case, where you load one model at a time. The manifest file introduced by this PR makes it possible to list the models available in the cache, ready to be loaded.
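As a rough illustration, `--list-cached-models` could be as simple as scanning the cache directory for the `manifest=` prefix; the directory layout and the way the cache path is obtained are assumptions here, not taken from the actual code:

```cpp
#include <cstdio>
#include <filesystem>
#include <string>

// Sketch: enumerate cached models by looking for the "manifest=" prefix that
// this PR adds to cached manifest files. The cache directory is passed in;
// the real code would resolve it from llama.cpp's own cache path handling.
static void list_cached_models(const std::string & cache_dir) {
    namespace fs = std::filesystem;
    const std::string prefix = "manifest=";
    for (const auto & entry : fs::directory_iterator(cache_dir)) {
        const std::string name = entry.path().filename().string();
        if (name.rfind(prefix, 0) == 0) {
            // strip the prefix to print a human-readable model identifier
            printf("%s\n", name.substr(prefix.size()).c_str());
        }
    }
}
```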