arg : allow using -hf offline #13202

Merged
merged 2 commits into from
Apr 30, 2025
ngxson
Collaborator

@ngxson ngxson commented Apr 29, 2025

Motivation

Fix #13128

My residence currently has no internet access (temporarily), and I'm using a slow 4G connection to upload this PR.

This PR allows using -hf and -mu without internet access, provided you have already downloaded the model.

If the model has not been downloaded yet, or the manifest file has not been generated yet (it did not exist before this PR), you will see this error:

error: failed to get manifest: error: cannot make GET request: Couldn't resolve host name
try reading from cache
error: failed to get manifest (check your internet connection)
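For illustration, the fallback behind that output could be sketched like this in Python. llama.cpp itself is C++, and `load_manifest`, `fetch`, `cache_path`, and the manifest contents are hypothetical names chosen for this sketch, not the actual implementation:

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def load_manifest(fetch: Callable[[], dict], cache_path: Path) -> dict:
    """Return the manifest from the network, falling back to a cached copy."""
    try:
        manifest = fetch()
    except OSError as err:  # network failures such as DNS errors are OSError subclasses
        print(f"error: failed to get manifest: {err}")
        print("try reading from cache")
        if not cache_path.exists():
            # No cached manifest either: nothing left to fall back on.
            raise RuntimeError(
                "failed to get manifest (check your internet connection)"
            ) from err
        return json.loads(cache_path.read_text())
    # Online path: refresh the cached copy so a later offline run can use it.
    cache_path.write_text(json.dumps(manifest))
    return manifest


def offline_fetch() -> dict:
    """Stand-in for a GET request made without internet access."""
    raise OSError("Couldn't resolve host name")


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "manifest.json"  # hypothetical cache file name
    cache.write_text(json.dumps({"file": "model.gguf"}))  # hypothetical content
    offline_manifest = load_manifest(offline_fetch, cache)
```

With a cached manifest present, the offline call still succeeds; without one, the sketch raises the final error shown above.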

Behavior change

Two notable changes:

  • HEAD requests are no longer retried. Forcing the user to wait through 3 retries would be bad UX for offline usage. I'm not sure whether this will impact anyone, but I hope it won't be a big problem (see next point)
  • If the HEAD request fails but the file already exists, we won't re-download it. The rationale is that if the server does not support ETag on HEAD requests, there is no point in forcing the user to re-download the file every time.
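The resulting download decision can be sketched roughly as follows, in Python for brevity. `HeadResult` and `should_download` are hypothetical names, and the handling of a successful HEAD response that carries no ETag is an assumption of this sketch, not something this PR specifies:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeadResult:
    """Outcome of a single HEAD request (no retries in this sketch)."""
    ok: bool
    etag: Optional[str] = None


def should_download(head: HeadResult, file_exists: bool,
                    cached_etag: Optional[str]) -> bool:
    """Decide whether the model file needs to be (re-)downloaded."""
    if not file_exists:
        # Nothing cached, so a download is the only option.
        return True
    if not head.ok:
        # HEAD failed (offline, or unsupported by the server): reuse the file.
        return False
    # Assumption: only a differing ETag counts as evidence that the
    # remote file changed; a missing ETag keeps the cached file.
    return head.etag is not None and head.etag != cached_etag


offline = should_download(HeadResult(ok=False), file_exists=True, cached_etag="abc")
changed = should_download(HeadResult(ok=True, etag="new"), file_exists=True, cached_etag="old")
print(offline, changed)
```

The key point is the second branch: a failed HEAD request no longer forces a re-download when the file is already on disk.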

Idea for the future

While making this PR, I intentionally added a manifest= prefix to the cached manifest file.

In the future, we can add a flag like --list-cached-models to show the list of cached models that the user can use.
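As a rough sketch of how such a listing could work, the Python snippet below scans a cache directory for files carrying the manifest= prefix. Only the prefix comes from this PR; the `.json` extension, the rest of the file naming, and the `list_cached_models` helper are assumptions made for illustration:

```python
import tempfile
from pathlib import Path

MANIFEST_PREFIX = "manifest="  # prefix introduced by this PR


def list_cached_models(cache_dir: Path) -> list[str]:
    """Return the names encoded in cached manifest file names, sorted."""
    names = []
    for path in sorted(cache_dir.iterdir()):
        if path.is_file() and path.name.startswith(MANIFEST_PREFIX):
            # Assumption: the rest of the name identifies the model and the
            # file ends in ".json"; strip both the prefix and the suffix.
            name = path.name.removeprefix(MANIFEST_PREFIX).removesuffix(".json")
            names.append(name)
    return names


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    # Hypothetical cache contents: two manifests and one model file.
    (cache_dir / "manifest=model-a.json").write_text("{}")
    (cache_dir / "manifest=model-b.json").write_text("{}")
    (cache_dir / "model-a.gguf").write_text("")
    cached_models = list_cached_models(cache_dir)

print(cached_models)
```

Because the prefix separates manifests from model files, the listing needs no network access and no parsing of the model files themselves.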

In the more distant future, we can also allow llama-server to swap models (not necessarily running 2 or more in parallel). Think of the LM Studio use case, where you load 1 model at a time. The manifest file introduced by this PR makes it possible to list the models available in the cache, ready to be loaded.

@ngxson ngxson requested a review from ggerganov April 29, 2025 22:40
@ngxson ngxson merged commit 5933e6f into ggml-org:master Apr 30, 2025
1 check passed
Development

Successfully merging this pull request may close these issues.

Feature Request: Allow -hf to be used offline