Description
Research Stage
- Background Research (Let's try to avoid reinventing the wheel)
- Hypothesis Formed (How do you think this will work and what is its effect?)
- Strategy / Implementation Forming
- Analysis of results
- Debrief / Documentation (So people in the future can learn from us)
Previous existing literature and research
No response
Hypothesis
I'm loading a large model into GPU memory with some CPU offload. The model size exceeds system (CPU) memory.
GPU Memory: 196 GB
CPU Memory: 148 GB
Model Size: 220 GB
I've noticed that when the model size exceeds system memory, mmap seemingly has no effect on load times, whereas when the model fits within system memory, subsequent loads are nearly immediate.
I suspect that since the model is being read deterministically/sequentially, the kernel is also evicting the mapped pages deterministically, just before they are needed for the upload to the GPU.
I suspect that loading the large weights in reverse inference order would significantly alleviate this, sidestepping the deterministic page-cache eviction in the kernel.
I'm looking for some confirmation from a maintainer that my hypothesis may be correct.
Implementation
No response
Analysis
No response