Missing allocatable memory explanation #1216
I don't know where this is coming from, but EKS is much less hungry on memory than AKS (screenshots attached).
We see similar behaviour as @cubed-it. From my very limited numbers, it looks like using small nodes in AKS results in a huge waste of memory resources (around 45%). Numbers for Cluster 1 and Cluster 2 are attached, including @jpoizat's numbers for comparison.
There is a document explaining how it is done; but agreed, the memory percentage reserved on smaller nodes is high...
Adding @jluk @sauryadas @palma21
@palma21 - any input here? I have an internal inquiry coming your way soon - it relates to this.
There has since been a v2 addition to the linked document describing the memory reservations.
@jluk, it seems we're asserting the 750Mi isn't allocatable, though? My calculations also show the same. In other words, it seems like we're specifying this value against --kube-reserved. Ref: https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#node-allocatable
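For what it's worth, a rough back-of-the-envelope check lines up with the numbers reported in this issue for a ~4 GiB node. This is only a sketch: it assumes the regressive kube-reserved rate of 25% of the first 4 GiB that the linked document describes, plus the 750Mi hard eviction threshold mentioned above; the exact per-SKU values may differ.

capacity_kib=4017088                     # from kubectl describe node
capacity_mib=$((capacity_kib / 1024))    # ~3922 MiB
kube_reserved_mib=$((4096 * 25 / 100))   # 1024 MiB, assumed 25% of the first 4 GiB
eviction_mib=750                         # hard eviction threshold per the AKS doc
echo "$((capacity_mib - kube_reserved_mib - eviction_mib)) MiB allocatable"   # ~2148 MiB, close to the 2200512Ki reported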
Where does the big difference with the other cloud providers come from?
Hi @Timvissers, if you can shell into the Kubernetes worker nodes you will see the actual reservation in MB. Use the krew node-shell plugin to shell into a worker VM:
k node-shell xxxx
ps -ef | grep kubelet
The reservation depends on the SKU, and every cloud vendor should document it.
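If you would rather not shell into the node, the same capacity/allocatable split can be read with stock kubectl commands; the node name below is a placeholder:

kubectl get node <node-name> -o jsonpath='{.status.capacity.memory}{"\n"}{.status.allocatable.memory}{"\n"}'
kubectl describe node <node-name> | grep -A 7 -i allocatable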
It looks like the ~25% difference on 4Gi nodes between AKS and GKE is the eviction threshold: it's 750Mi for AKS and 100Mi on GKE (according to the docs, anyhow). I couldn't find the exact numbers for EKS, but it looks like they allow customization of the kubelet config, so it's effectively whatever you like. I see there's a feature request for that here too (#323), and I think that would be a good way to let people tune the settings for their low-memory setups.
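As a rough illustration only, assuming both providers reserve about 25% of the first 4 GiB for kube-reserved and differ mainly in the eviction threshold, the 650Mi eviction gap alone comes close to the ~25% difference reported here:

capacity_mib=3922                                        # ~4017088Ki
kube_reserved_mib=1024                                   # assumed 25% of the first 4 GiB on both providers
aks_alloc=$((capacity_mib - kube_reserved_mib - 750))    # ~2148 MiB with AKS's 750Mi threshold
gke_alloc=$((capacity_mib - kube_reserved_mib - 100))    # ~2798 MiB with GKE's 100Mi threshold
echo "$(( (gke_alloc - aks_alloc) * 100 / gke_alloc ))% less allocatable on AKS"   # ~23%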
Action required from @Azure/aks-pm
I'm closing this issue, as the clarification was added to the doc.

As for why the difference exists, I can't speak for the other cloud providers since I don't have visibility into their workloads and customers. AKS is fairly conservative when it comes to protecting the cluster against "rogue" or misbehaved workloads; these have caused a lot of issues in the past, because workloads can race for resources faster than even cgroups and slices can account for, and we needed a larger buffer. This, we acknowledge, can penalize well-behaved workloads and users who would otherwise benefit from quite lenient default reservations. We didn't take this decision lightly, but based it on hundreds of cases where we saw these issues, and so far this has been a trade-off of running in this managed-service scenario. Nonetheless, we're working on providing improvements in this area.
Until then, and if you're using multiple node pools, you can already work around this by applying a daemon set similar to the one below to your user pools.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    component: ds-reserve
  name: ds-reserve
  namespace: kube-system
spec:
  selector:
    matchLabels:
      component: ds-reserve
      tier: node
  template:
    metadata:
      labels:
        component: ds-reserve
        tier: node
    spec:
      containers:
      - command:
        - nsenter
        - --target
        - "1"
        - --mount
        - --uts
        - --ipc
        - --net
        - --pid
        - --
        - sh
        - -c
        - |
          sed -i 's/--kube-reserved=\S*/--kube-reserved=cpu=100m,memory=897Mi/' /etc/default/kubelet
          sed -i 's/--eviction-hard=\S*/--eviction-hard=memory.available<100Mi/' /etc/default/kubelet
          systemctl daemon-reload
          systemctl restart kubelet
          while true; do sleep 100000; done
        image: alpine
        imagePullPolicy: IfNotPresent
        name: ds-reserve
        resources:
          requests:
            cpu: 10m
        securityContext:
          privileged: true
      dnsPolicy: ClusterFirst
      hostPID: true
      tolerations:
      - effect: NoSchedule
        operator: Exists
      restartPolicy: Always
      nodeSelector:
        kubernetes.azure.com/mode: user
  updateStrategy:
    type: RollingUpdate
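For completeness, one possible way to roll this out and verify it; the manifest filename and node name are placeholders. The daemon set rewrites the kubelet flags in /etc/default/kubelet on the host and restarts the kubelet, so the new values should show up in the node's allocatable shortly after the pods are running:

kubectl apply -f ds-reserve.yaml
kubectl -n kube-system rollout status daemonset/ds-reserve
kubectl describe node <node-name> | grep -A 7 -i allocatable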
What happened:
kubectl describe node prints 4017088Ki capacity, while allocatable states 2200512Ki.
So roughly 45% is reserved.
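A quick shell-arithmetic check of those two values confirms the roughly 45% figure:

echo "$(( (4017088 - 2200512) * 100 / 4017088 ))% of capacity reserved"   # ~45%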
What you expected to happen:
Find an answer to how allocatable memory is determined, like here: https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-architecture?hl=de#memory_cpu
Anything else we need to know?:
I would also like to know why AKS provides nearly 25% less memory on my 4GB VM than GKE.
Environment:
Kubernetes version (use kubectl version): 1.14.6