
Could I get safe tensor without lazy loading? #577

voidxb opened this issue Feb 24, 2025 · 1 comment

Comments


voidxb commented Feb 24, 2025

System Info

I see safe_open and deserialize; both appear to use lazy loading.
If I want to load a safetensors file without lazy loading,
how can I do that? Thanks.

Information

  • The official example scripts
  • My own modified scripts

Reproduction

I use sglang, and in sglang's model_loader/weight_utils.py
it loads safetensors like this:

```python
if not is_all_weights_sharded:
    with safe_open(st_file, framework="pt") as f:
        for name in f.keys():  # noqa: SIM118
            param = f.get_tensor(name)
            yield name, param
else:
    result = load_file(st_file, device="cpu")
    for name, param in result.items():
        yield name, param
```
I found that it loads the safetensors files very slowly (20+ minutes), regardless of whether is_all_weights_sharded is True.
If I prefetch the safetensors files before load_model (e.g. `cat * > /dev/null`), it only takes about 5 minutes.
I tried using a ThreadPoolExecutor to parallelize this code; although each get_tensor call becomes quick, loading the weights still takes 20+ minutes, so I suspect lazy loading is the cause. Thanks.
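The `cat * > /dev/null` prefetch trick can be sketched in Python. Note that `warm_page_cache` below is a hypothetical helper for illustration, not part of sglang or safetensors:

```python
import glob

def warm_page_cache(pattern, chunk_size=64 << 20):
    """Sequentially read every matching file once so its pages land in
    the OS page cache; subsequent mmap-backed access (safe_open) then
    hits RAM instead of the slow network mount. Same effect as
    `cat *.safetensors > /dev/null`."""
    total = 0
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as f:
            while True:
                buf = f.read(chunk_size)
                if not buf:
                    break
                total += len(buf)
    return total  # number of bytes touched
```

Sequential reads let the kernel's readahead work at full bandwidth, which is why a dumb prefetch beats random page faults through mmap on a network mount.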

Expected behavior

Load the safetensors files eagerly, without lazy loading.

Collaborator

Narsil commented Mar 13, 2025

`load(open(filename, "rb").read())`

This has been discussed extensively in many places, so I won't repeat everything.
You're accessing the files through some kind of network disk exposed as a mount point. The OS uses memory mapping on the assumption that the underlying data is on an actual local disk, and therefore makes suboptimal decisions. Just force-load everything in one go.

Lazy loading via shared memory is what makes safetensors fast in the first place when using real disks.
