Replies: 4 comments 2 replies
-
I have seen that this is not currently possible, so I have changed the category to Ideas. I have also changed the title from "Is it possible to offload layers to an SSD".
-
I also wonder what the effects would be if the model were split vertically across layers instead of one layer at a time, or split diagonally across several layers in various ways. Treating the nodes of the net as elements of a 2D matrix, the partition line could run fully along the main diagonal, or be only partially diagonal with the rest of the line horizontal or vertical, or diagonal at a different angle, and so on for different partition angles. A better approach, though, might be something like PowerInfer, modified so that part of the model is kept in RAM and the rest on an SSD; a rough sketch of that split follows below.
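A minimal sketch of that RAM/SSD split, assuming a hypothetical layout where a weight matrix is stored row-major as float32 and some rows have already been classified as "hot": hot rows are copied into RAM once, cold rows stay in a read-only memory-mapped file on the SSD and are only read when needed. The struct and function names are made up for illustration; PowerInfer's actual predictor and file formats are not reproduced here.

```cpp
// Sketch: keep frequently used ("hot") weight rows in RAM, leave the rest
// memory-mapped on the SSD. All names, sizes and the hot/cold classification
// are hypothetical.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>
#include <vector>

struct HybridMatrix {
    const float *mapped   = nullptr;  // whole matrix, mmap'd read-only from the SSD
    size_t       n_rows   = 0, n_cols = 0;
    std::vector<float>   hot_rows;    // RAM copies of hot rows
    std::vector<int32_t> hot_index;   // row -> slot in hot_rows, -1 if cold
};

// Map the file read-only; clean pages are dropped (never written back) under memory pressure.
static bool hybrid_open(HybridMatrix &m, const char *path,
                        size_t n_rows, size_t n_cols,
                        const std::vector<size_t> &hot) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;
    size_t bytes = n_rows * n_cols * sizeof(float);
    void *p = mmap(nullptr, bytes, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (p == MAP_FAILED) return false;
    m.mapped = (const float *) p;
    m.n_rows = n_rows; m.n_cols = n_cols;
    m.hot_index.assign(n_rows, -1);
    m.hot_rows.resize(hot.size() * n_cols);
    for (size_t i = 0; i < hot.size(); ++i) {
        std::memcpy(&m.hot_rows[i * n_cols], m.mapped + hot[i] * n_cols, n_cols * sizeof(float));
        m.hot_index[hot[i]] = (int32_t) i;
    }
    return true;
}

// Fetch a row: from RAM if hot, otherwise straight from the mapped SSD pages.
static const float *hybrid_row(const HybridMatrix &m, size_t row) {
    int32_t slot = m.hot_index[row];
    return slot >= 0 ? &m.hot_rows[slot * m.n_cols] : m.mapped + row * m.n_cols;
}
```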
-
I also wonder if it would be possible to do something inspired by RAID, in which a single GGUF file would be split into several files, one per SSD, to increase read speed without actually having to set up RAID; a rough sketch of the parallel-read side of that idea is below.
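A rough sketch of the read path under that idea, assuming the GGUF has already been split RAID-0 style into equally sized stripe files, one per SSD. The paths, stripe size, and the splitting step itself are hypothetical; this only shows reassembling the stripes with one reader thread per file.

```cpp
// Sketch: read N stripe files (one per SSD) concurrently and reassemble them
// into one buffer, RAID-0 style. Paths and stripe size are hypothetical.
#include <algorithm>
#include <cstdint>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

// stripes[i] holds every i-th chunk of `stripe_size` bytes of the original file.
static std::vector<uint8_t> read_striped(const std::vector<std::string> &stripes,
                                         size_t stripe_size, size_t total_size) {
    std::vector<uint8_t> out(total_size);
    std::vector<std::thread> workers;
    for (size_t i = 0; i < stripes.size(); ++i) {
        workers.emplace_back([&, i]() {
            std::ifstream f(stripes[i], std::ios::binary);
            size_t src_off = 0;               // offset inside this stripe file
            size_t dst_off = i * stripe_size; // offset inside the reassembled buffer
            while (dst_off < total_size) {
                size_t n = std::min(stripe_size, total_size - dst_off);
                f.seekg(static_cast<std::streamoff>(src_off));
                f.read((char *) &out[dst_off], (std::streamsize) n);
                src_off += n;
                dst_off += stripe_size * stripes.size(); // this stripe's next chunk
            }
        });
    }
    for (auto &t : workers) t.join();
    return out;
}
```

Whether this actually helps depends on where the bottleneck is: if the stripe files sit on the same device or behind the same PCIe lanes, the parallel reads mostly contend with each other.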
-
Something bypassing the CPU/RAM, like NVIDIA's GPUDirect Storage?
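For reference, a direct NVMe-to-GPU read with NVIDIA's cuFile API (GPUDirect Storage) looks roughly like the sketch below. It needs a GDS-capable driver and filesystem, error handling is omitted, and the file path and read size are placeholders.

```cpp
// Sketch: read a chunk of a weights file directly into GPU memory with
// GPUDirect Storage (cuFile), bypassing a bounce buffer in host RAM.
// Requires a GDS-enabled setup; compile with -lcufile -lcudart.
#define _GNU_SOURCE 1          // for O_DIRECT on glibc
#include <cufile.h>
#include <cuda_runtime.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main() {
    const char  *path = "model.gguf";          // placeholder path
    const size_t sz   = 64ull * 1024 * 1024;   // placeholder read size (64 MiB)

    cuFileDriverOpen();

    int fd = open(path, O_RDONLY | O_DIRECT);  // O_DIRECT is required for GDS
    if (fd < 0) { perror("open"); return 1; }

    CUfileDescr_t descr = {};
    descr.handle.fd = fd;
    descr.type      = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t fh;
    cuFileHandleRegister(&fh, &descr);

    void *dev = nullptr;
    cudaMalloc(&dev, sz);
    cuFileBufRegister(dev, sz, 0);

    // file offset 0 -> device buffer offset 0, no staging through CPU RAM
    ssize_t n = cuFileRead(fh, dev, sz, 0 /*file off*/, 0 /*dev off*/);
    printf("read %zd bytes into GPU memory\n", n);

    cuFileBufDeregister(dev);
    cudaFree(dev);
    cuFileHandleDeregister(fh);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```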
-
It would be slow, yes. But would it be possible to offload layers to an SSD or disk using llama.cpp without using swap? Ideally by implementing the LLM in a Flash paper: https://arxiv.org/abs/2312.11514. Even without implementing that paper, an algorithm that only reads sequentially from the SSD would allow offloading to the SSD without swap, which could reduce the number of writes to the SSD compared to using swap.
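For context, a read-only mmap of the weights file already behaves this way on the write side: pages backed by a read-only file mapping are dropped and later re-read from the SSD under memory pressure, never written to swap. A minimal sketch with a placeholder path follows; llama.cpp can memory-map model files, but the code below only illustrates the mechanism, not its loader.

```cpp
// Sketch: map a weights file read-only so evicted pages are re-read from the
// SSD instead of being written to swap; hint the kernel about sequential access.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdio>

int main() {
    const char *path = "model.gguf";   // placeholder path
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);

    // Read-only, private mapping: pages stay clean, so eviction = drop + later re-read.
    void *p = mmap(nullptr, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    // Ask for aggressive readahead, matching a mostly sequential layer-by-layer pass.
    madvise(p, (size_t) st.st_size, MADV_SEQUENTIAL);

    // ... run inference reading weights through `p`; the kernel pages data in
    // from the SSD on demand and never writes these pages to swap.

    munmap(p, (size_t) st.st_size);
    close(fd);
    return 0;
}
```

MADV_SEQUENTIAL is only a hint; what actually makes the reads sequential is laying the weights out on disk in the order the layers are evaluated.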