
Missing allocatable memory explanation #1216


Closed
cubed-it opened this issue Sep 18, 2019 · 12 comments

@cubed-it

What happened:
kubectl describe node prints a capacity of 4017088Ki, but allocatable is only 2200512Ki.
So about 45% is reserved.

What you expected to happen:
To find an answer to how allocatable memory is determined, like the explanation here: https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-architecture?hl=de#memory_cpu

Anything else we need to know?:
I would also like to know why AKS provides nearly 25% less memory on my 4 GB VM than GKE does.

Environment:

  • Kubernetes version (use kubectl version): 1.14.6
  • Size of cluster (how many worker nodes are in the cluster?): 3x B2s
@ghost ghost added the triage label Sep 18, 2019
@jpoizat

jpoizat commented Sep 27, 2019

I don't know where this is coming from, but EKS is much less hungry on memory:
Capacity:
...
memory: 16038616Ki
...
Allocatable:
...
memory: 15219416Ki

versus AKS:
Capacity:
...
memory: 16403296Ki
...
Allocatable:
...
memory: 12909408Ki

@onybo

onybo commented Oct 24, 2019

We see similar behaviour to @cubed-it.

From my very limited numbers, it looks like using small nodes in AKS results in a huge waste of memory resources (45%).
It also looks like it gets better with node size.
All my numbers are from kubectl describe node <node name>.

Cluster 1:
capacity/allocatable: 4016988Ki/2200412Ki
Reserved memory?: 1816576Ki or 45%
version: 1.15.3

Cluster 2:
capacity/allocatable: 8145760Ki/5490528Ki
Reserved memory?: 2655232Ki or 33%
version: 1.15.3

Including @jpoizat's numbers for comparison:
capacity/allocatable: 16403296Ki/12909408Ki
Reserved memory?: 3493888Ki or 21%
version: ?

@jpoizat

jpoizat commented Oct 24, 2019

There is a document explaining how it is done:
https://docs.microsoft.com/en-us/azure/aks/concepts-clusters-workloads#resource-reservations

but agreed, the memory % reserved on smaller nodes is high...

@mimckitt

Adding @jluk @sauryadas @palma21

@MarcosMMartinez

@palma21 - any input here? I have an internal inquiry coming your way soon - it relates to this.

@jluk
Contributor

jluk commented Dec 2, 2019

There has since been a v2 addition to the linked document describing the memory reservations.
I believe this should close out this open issue, but I defer to @MicahMcKittrick-MSFT.

Memory - reserved memory is the sum of two values.

The kubelet daemon is installed on all Kubernetes agent nodes to manage container creation and termination. By default on AKS, this daemon has the eviction rule memory.available<750Mi, which means a node must always keep at least 750 Mi of memory available. When a host drops below that threshold of available memory, the kubelet terminates one of the running pods to free memory on the host machine and protect it.

The second value is a progressive rate of memory reserved for the kubelet daemon to function properly (kube-reserved):

25% of the first 4 GB of memory
20% of the next 4 GB of memory (up to 8 GB)
10% of the next 8 GB of memory (up to 16 GB)
6% of the next 112 GB of memory (up to 128 GB)
2% of any memory above 128 GB
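
Putting the two pieces together: allocatable = capacity - kube-reserved - the 750 Mi eviction threshold. Below is a minimal sketch (mine, not an official AKS implementation) that reproduces the numbers quoted earlier in this thread; the details that the progressive rate is applied to the VM's nominal memory in GiB and that the result is rounded down to whole MiB are assumptions inferred from those numbers, not something the doc states.

EVICTION_THRESHOLD_MIB = 750  # AKS hard eviction rule: memory.available<750Mi

# (upper bound of each tier in GiB, fraction of that tier that is kube-reserved)
KUBE_RESERVED_TIERS = [(4, 0.25), (8, 0.20), (16, 0.10), (128, 0.06), (float("inf"), 0.02)]

def kube_reserved_mib(nominal_gib):
    """Progressive kube-reserved memory in MiB, rounded down (assumed rounding)."""
    reserved_gib, prev_bound = 0.0, 0.0
    for bound, rate in KUBE_RESERVED_TIERS:
        tier_gib = min(nominal_gib, bound) - prev_bound
        if tier_gib <= 0:
            break
        reserved_gib += tier_gib * rate
        prev_bound = bound
    return int(reserved_gib * 1024)

def allocatable_kib(capacity_kib, nominal_gib):
    """Allocatable = capacity - kube-reserved - hard eviction threshold."""
    return capacity_kib - (kube_reserved_mib(nominal_gib) + EVICTION_THRESHOLD_MIB) * 1024

# Numbers reported in this thread:
print(allocatable_kib(4017088, 4))    # 2200512 Ki -- matches the original B2s node
print(allocatable_kib(8145760, 8))    # 5490528 Ki -- matches Cluster 2 above
print(allocatable_kib(16403296, 16))  # 12909408 Ki -- matches jpoizat's AKS node

On the 4 GiB node that is 1024 Mi of kube-reserved plus 750 Mi of eviction threshold, which is where the ~45% figure at the top of this issue comes from.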

@MarcosMMartinez

@jluk, it seems we're asserting the 750Mi isn't allocatable, though? My calculations also show the same.

[image]

In other words, it seems like we're specifying this value against --kube-reserved:

[image]

Ref: https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#node-allocatable
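
For reference, the Kubernetes doc linked above defines Node Allocatable as capacity minus kube-reserved, minus system-reserved, minus the hard eviction thresholds, so whether the 750Mi shows up in --eviction-hard or gets folded into --kube-reserved, it is subtracted from what pods can use either way.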

@Timvissers

Where does the big difference with other cloud providers come from?

@Pamir

Pamir commented Apr 12, 2020

Hi @Timvissers

If you can shell into the Kubernetes worker nodes, you will see the actual reservation in MiB.

You can use the krew node-shell plugin to get a shell on a worker VM:

k node-shell xxxx
ps -ef | grep kubelet

and look for:

--kube-reserved=cpu=100m,memory=1638Mi

The value depends on the SKU; every cloud vendor should document it.
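
For what it's worth, 1638Mi is exactly what the progressive rule above yields for a node with 7 GiB of nominal memory (0.25 × 4 GiB + 0.20 × 3 GiB = 1.6 GiB ≈ 1638 Mi), assuming that's the VM size in this example, so the flag on any given node should be predictable from its SKU.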

@neoGeneva

It looks like the ~25% difference on 4Gi nodes between AKS and GKE is the eviction threshold: it's 750Mi on AKS and 100Mi on GKE (according to the docs, anyhow).

I couldn't find the exact numbers for EKS, but it looks like they allow customization of the kubelet config, so it's effectively whatever you like. I see there's a feature request for that here too (#323), and I think that'd be a good way to let people tune the settings to what's appropriate for their low-memory setups.
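
As a rough sanity check with numbers already in this thread: 750Mi vs 100Mi is a 650Mi gap, and adding 650Mi back onto the 2200512Ki (~2149Mi) allocatable reported for the original 4 GiB node gives roughly 2800Mi, so the AKS figure is indeed about 23% lower than that, assuming kube-reserved is otherwise comparable between the two providers.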

@github-actions

Action required from @Azure/aks-pm

@ghost ghost removed the triage label Jul 21, 2020
@palma21
Member

palma21 commented Jul 21, 2020

I'm closing this issue as the clarification was added to the doc.

As for explaining the difference: I can't speak for the other cloud providers, as I don't have visibility into the workloads and customers there. AKS is fairly conservative when it comes to protecting the cluster against "rogue" or misbehaved workloads, which have caused a lot of issues in the past; workloads can race for resources faster than even cgroups and slices can account for, so we needed a larger buffer.

This, we acknowledge, can penalize well-behaved workloads and users that would otherwise benefit from more lenient default reservations. We didn't take this decision lightly, but on account of hundreds of cases where we saw these issues; so far this has been a trade-off of running in this managed service scenario.

Nonetheless, we're working on providing:

  1. The ability to have lower reservations on User pools vs. System pools: https://docs.microsoft.com/en-us/azure/aks/use-system-pools
  2. The possibility of previewing kubelet customization, as asked in the item above. There will always be a support trade-off in these cases.

Until then, and if you're using multiple node pools, you can already work around this by applying a DaemonSet similar to the one below to your User pools.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    component: ds-reserve
  name: ds-reserve
  namespace: kube-system
spec:
  selector:
    matchLabels:
      component: ds-reserve
      tier: node
  template:
    metadata:
      labels:
        component: ds-reserve
        tier: node
    spec:
      containers:
      - command:
        - nsenter
        - --target
        - "1"
        - --mount
        - --uts
        - --ipc
        - --net
        - --pid
        - --
        - sh
        - -c
        - |
          sed -i 's/--kube-reserved=\S*/--kube-reserved=cpu=100m,memory=897Mi/' /etc/default/kubelet
          sed -i 's/--eviction-hard=\S*/--eviction-hard=memory.available<100Mi/' /etc/default/kubelet
          systemctl daemon-reload
          systemctl restart kubelet
          while true; do sleep 100000; done
        image: alpine
        imagePullPolicy: IfNotPresent
        name: ds-reserve
        resources:
          requests:
            cpu: 10m
        securityContext:
          privileged: true
      dnsPolicy: ClusterFirst
      hostPID: true
      tolerations:
      - effect: NoSchedule
        operator: Exists
      restartPolicy: Always
      nodeSelector:
        kubernetes.azure.com/mode: user
  updateStrategy:
    type: RollingUpdate
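
In short, the pod above uses nsenter (hostPID plus privileged) to run on the host, rewrites --kube-reserved and --eviction-hard in /etc/default/kubelet, and restarts the kubelet; the nodeSelector restricts it to nodes labelled kubernetes.azure.com/mode: user and the toleration lets it land on tainted pools. After applying it with kubectl apply, the lower reservation should show up under Allocatable in kubectl describe node for those nodes. Note that 897Mi and memory.available<100Mi are just the example values from this comment; adjust them to what is appropriate for your workloads, keeping in mind the protection trade-off described above.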

@palma21 palma21 closed this as completed Jul 21, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Aug 20, 2020