Karpenter OutOfMemory frequency spikes when many NodeClaims are in Unknown status #2358

@kunhwiko

Description

Observed Behavior:
We noticed that during burst scheduling, Karpenter's memory usage spikes abnormally at first but settles down as instances get provisioned. In our case, we are bursting from 20 --> 1500 nodes and 800 --> 18000 pods. Our Karpenter controller is provisioned with 12Gi of memory requests/limits.
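
For reference, the 12Gi corresponds to Helm chart values along these lines (a sketch; this assumes the standard Karpenter chart's controller.resources block):

controller:
  resources:
    requests:
      memory: 12Gi   # bumped up specifically to survive burst scheduling
    limits:
      memory: 12Gi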

With memory profiling turned on, the spike traced back to the following allocation flow:

[memory profile image]

During the initial phase of scheduling, hundreds of NodeClaim objects remain in an Unknown state. From my understanding, Karpenter continuously looks for NodeClaims that aren't fully initialized and caches the full NodeClaim object for each of them for up to 1 minute (recently bumped further to 1 hour).

Since the full object is cached for each NodeClaim, the cache can grow quite quickly and cause Karpenter to crash unless it is given more memory than usual. When Karpenter recovers, the controller's cache is rebuilt: it will likely loop through NodeClaims that were already processed, re-cache them, and the cycle repeats once Karpenter crashes again. Some NodeClaims may not get processed for quite some time, although eventually they will as other NodeClaims start to initialize.

As entries expire out of the cache, we then see Karpenter's memory start to stabilize.


Expected Behavior:
I might be misunderstanding the code, and will be happy to be corrected! Some observations:

  • I am wondering if it is feasible to cache only the parts of the NodeClaim that are needed rather than the full object, since the full objects fill up memory quite quickly.
  • I am also wondering if it is feasible to evict fully initialized NodeClaims from the cache, now that upcoming Karpenter versions will store cached objects for up to 1 hour.

I understand this might not be feasible if it brings unnecessary complexity. Alternatively, I am wondering:

  • For us, burst scheduling is awkward: memory needs to be bumped up at scheduling time because of the cache size, but it can be bumped back down almost immediately after all of the nodes have initialized. Have others experienced this issue, and how did they get around it? (e.g. VPA; a sketch follows this list)
  • Would the Karpenter team happen to have benchmarks for scheduling many instances at once?
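
Regarding the VPA idea in the first bullet above, a minimal VerticalPodAutoscaler sketch targeting the Karpenter deployment might look like the following. This assumes the VPA components are installed and that Karpenter runs as a Deployment named karpenter in kube-system with a container named controller (adjust for your install); the bounds are illustrative:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: karpenter            # hypothetical name
  namespace: kube-system     # assumes Karpenter runs here
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: karpenter
  updatePolicy:
    updateMode: "Auto"       # VPA evicts and recreates pods to apply new requests
  resourcePolicy:
    containerPolicies:
      - containerName: controller
        minAllowed:
          memory: 2Gi        # illustrative bounds
        maxAllowed:
          memory: 16Gi

One caveat worth noting: in "Auto" mode VPA resizes by evicting the pod, so a restart could land right in the middle of the burst it is reacting to.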

Reproduction Steps (Please include YAML):
This can really be any NodePool/EC2NodeClass. We opted for a relatively small instance-size requirement on the NodePool.
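
For illustration, a minimal NodePool sketch along those lines; the instance-size values and the EC2NodeClass name are illustrative, not our exact config:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        # keep instances small so 1500 replicas fan out across ~1500 nodes
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["medium", "large"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default   # hypothetical EC2NodeClass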

We can use simple pause pods that schedule to different nodes (via the pod anti-affinity below) and just scale a bunch of them (e.g. 1500) at the same time:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      securityContext:
        runAsUser: 1000
        runAsGroup: 3000
        fsGroup: 2000
      affinity:
        # one inflate pod per node, so each replica forces a new node
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: inflate
              topologyKey: "kubernetes.io/hostname"
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            limits:
              memory: 512Mi
            requests:
              cpu: 100m
              memory: 512Mi
          securityContext:
            allowPrivilegeEscalation: false
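
To reproduce the burst, scale the deployment up in one shot, e.g. kubectl scale deployment/inflate --replicas=1500, and watch Karpenter's memory while the resulting NodeClaims sit in Unknown status.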

Versions:

  • Chart Version: v1.3.3
  • Kubernetes Version (kubectl version): v1.32

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Metadata

    Labels

    kind/bug: Categorizes issue or PR as related to a bug.
    needs-priority
    triage/accepted: Indicates an issue or PR is ready to be actively worked on.
    triage/needs-information: Indicates an issue needs more information in order to work on it.
