Provide more flexible resource reservations for User Node Pools #1339
Comments
Looks like this is an AKS cluster. Transferring this issue there in case other users have similar behaviors.
@ritazh Oh, I'm sorry, thanks!
Possibly related to #1216, though it probably did not happen during the period I indicated.
Hey, I'm also having problems due to reduced memory on nodes. I have two single-node clusters on different versions of Kubernetes; both have nodes with 4017088Ki capacity, but the v1.10.3 cluster's node has 3092416Ki allocatable while the v1.14.8 cluster's node only has 2200480Ki. It looks like both the kube-reserved and the eviction hard limits have increased compared to the older cluster. As mentioned in #1216, having ~45% of memory reserved is pretty restrictive; is there any chance of having these values tweaked for low-memory nodes?
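For anyone comparing nodes the same way, a quick way to see capacity versus allocatable side by side is kubectl's custom-columns output; a small convenience sketch, nothing AKS-specific:

```sh
# Print memory capacity vs. allocatable for every node in the cluster.
kubectl get nodes -o custom-columns=NAME:.metadata.name,MEM_CAPACITY:.status.capacity.memory,MEM_ALLOCATABLE:.status.allocatable.memory
```

The difference between the two columns is what kube-reserved, system-reserved, and the eviction threshold consume on each node.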
It's outlined here, I believe: https://docs.microsoft.com/en-us/azure/aks/concepts-clusters-workloads#resource-reservations
Seems a bit obscene really. Certainly something to factor in when weighing the real comparative costs between node sizes.
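As a rough back-of-the-envelope check against that doc, here is a sketch of how the reserved share on a ~4 GiB node adds up, assuming the rules documented at the time (a 750Mi hard eviction threshold plus a regressive kube-reserved rate of 25% on the first 4 GB of memory); the exact rates come from the doc, so treat these numbers as illustrative only:

```sh
# Approximate the memory reserved on a ~4 GiB node (e.g. Standard B2s),
# assuming a 750Mi eviction threshold and 25% kube-reserved on the first 4 GB.
capacity_mib=3923                                   # ~4017088Ki as reported above
eviction_mib=750
kube_reserved_mib=$(( capacity_mib * 25 / 100 ))    # ~980Mi
reserved_mib=$(( eviction_mib + kube_reserved_mib ))
echo "reserved ~= ${reserved_mib}Mi of ${capacity_mib}Mi (~$(( reserved_mib * 100 / capacity_mib ))%)"
```

That lands around 44%, close to the ~45% mentioned above; the remaining gap versus the observed allocatable is presumably OS-level overhead on the node itself.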
Action required from @Azure/aks-pm |
Dropping the same info here from the linked issue. I will leave this issue open as the feature request for less aggressive reservations on User node pools.
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    component: ds-reserve
  name: ds-reserve
  namespace: kube-system
spec:
  selector:
    matchLabels:
      component: ds-reserve
      tier: node
  template:
    metadata:
      labels:
        component: ds-reserve
        tier: node
    spec:
      containers:
      # nsenter into the host's namespaces (PID 1) so the script edits the
      # node's own /etc/default/kubelet rather than the container filesystem.
      - command:
        - nsenter
        - --target
        - "1"
        - --mount
        - --uts
        - --ipc
        - --net
        - --pid
        - --
        - sh
        - -c
        - |
          # Relax kube-reserved and the hard eviction threshold, restart kubelet,
          # then sleep forever so the DaemonSet pod stays Running.
          sed -i 's/--kube-reserved=\S*/--kube-reserved=cpu=100m,memory=897Mi/' /etc/default/kubelet
          sed -i 's/--eviction-hard=\S*/--eviction-hard=memory.available<100Mi/' /etc/default/kubelet
          systemctl daemon-reload
          systemctl restart kubelet
          while true; do sleep 100000; done
        image: alpine
        imagePullPolicy: IfNotPresent
        name: ds-reserve
        resources:
          requests:
            cpu: 10m
        securityContext:
          privileged: true
      dnsPolicy: ClusterFirst
      hostPID: true
      tolerations:
      - effect: NoSchedule
        operator: Exists
      restartPolicy: Always
      nodeSelector:
        kubernetes.azure.com/mode: user  # only patch User (not System) node pools
  updateStrategy:
    type: RollingUpdate
```
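For completeness, applying and sanity-checking the workaround could look roughly like this (the ds-reserve.yaml filename is just a placeholder for wherever you save the manifest above):

```sh
# Apply the workaround DaemonSet, wait for it to roll out, then confirm that
# allocatable memory has gone up once kubelet has restarted on each user node.
kubectl apply -f ds-reserve.yaml
kubectl -n kube-system rollout status daemonset/ds-reserve
kubectl get nodes -o custom-columns=NAME:.metadata.name,MEM_ALLOCATABLE:.status.allocatable.memory
```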
@palma21 will the DaemonSet change result in an unsupported cluster as per the shared responsibilities doc? The limits really need to be revised or priced into the VMs, as a 34% loss of RAM is severe.
@stl327 to comment on this and own it. |
AKS has released updated logic for our memory reservations for kube-reserved and the eviction threshold. These optimizations will increase the allocatable space for application workloads by up to 20%. Currently this applies to AKS 1.28. For more information, please see: https://learn.microsoft.com/en-us/azure/aks/concepts-clusters-workloads#resource-reservations
Running a staging AKS cluster with 3 Standard B2s nodes.
Between 22/11/19 and 26/11/19, deployments stopped working. The new pods are stuck in Pending state saying `0/3 nodes are available: 3 Insufficient memory.` I'd swear nothing else changed on our side, but I have no factual proof apart from a successful deploy pipeline from 22/11/19. The number of pods is still the same; that hasn't changed. I was able to run 2-3 times more pods on the same cluster previously. I vaguely remember that the nodes were waiting for a restart after a security/kernel update.
The current values of Capacity and Allocatable:
Is there any chance the allocatable memory dropped after the kernel update (which may have triggered, e.g., an aks-engine update)?
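One way to check whether the kubelet reservations themselves changed (rather than something on the workload side) is to read the running kubelet configuration through the node proxy endpoint; a rough sketch, assuming jq is installed and you have access to nodes/proxy:

```sh
# Dump the running kubelet config for one node and pull out the reservation
# and eviction settings; adjust the node selection as needed.
NODE=$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')
kubectl get --raw "/api/v1/nodes/${NODE}/proxy/configz" \
  | jq '.kubeletconfig | {kubeReserved, systemReserved, evictionHard}'
```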
Thanks!