Node re-registration required to update the node labels #29019

Merged
merged 1 commit into from
Oct 29, 2021

Conversation

SergeyKanzhelev
Member

Added a note about the node labels.

/sig node
/kind cleanup

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. language/en Issues or PRs related to English language labels Jul 19, 2021
@k8s-ci-robot k8s-ci-robot added sig/docs Categorizes an issue or PR as relevant to SIG Docs. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 19, 2021

netlify bot commented Jul 19, 2021

✔️ Deploy Preview for kubernetes-io-main-staging ready!

🔨 Explore the source changes: 40b013b

🔍 Inspect the deploy log: https://app.netlify.com/sites/kubernetes-io-main-staging/deploys/617b85ee16e7ab000740c845

😎 Browse the preview: https://deploy-preview-29019--kubernetes-io-main-staging.netlify.app

Comment on lines +103 to +107
As mentioned in the [Node name uniqueness](#node-name-uniqueness) section,
when Node configuration needs to be updated, it is a good practice to re-register
the node with the API server. For example, if the kubelet being restarted with
the new set of `--node-labels`, but the same Node name is used, the change will
not take an effect, as labels are being set on the Node registration.
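
The behavior described in this note can be illustrated with a toy model. This is only a sketch under the assumption stated in the text, namely that labels are applied when the Node object is registered and are not refreshed when a kubelet restarts under the same Node name. `ToyApiServer` and its methods are hypothetical names made up for illustration; they do not mirror the real kubelet or API server code.

```python
# Toy model only: ToyApiServer, register() and delete() are hypothetical
# names and do not correspond to real Kubernetes APIs.

class ToyApiServer:
    def __init__(self):
        # Maps Node name -> labels recorded when the Node was registered.
        self.nodes = {}

    def register(self, name, labels):
        """Create the Node object only if no Node with this name exists."""
        if name in self.nodes:
            # The existing Node object is kept; the new labels are ignored.
            return False
        self.nodes[name] = dict(labels)
        return True

    def delete(self, name):
        """Remove the Node object so that the name can be registered again."""
        self.nodes.pop(name, None)


api = ToyApiServer()
api.register("node-1", {"tier": "general"})

# The kubelet restarts with a new label set but the same Node name.
api.register("node-1", {"tier": "gpu"})
labels_after_restart = dict(api.nodes["node-1"])

# Re-registration: remove the Node object, then register it again.
api.delete("node-1")
api.register("node-1", {"tier": "gpu"})
labels_after_reregistration = dict(api.nodes["node-1"])

print(labels_after_restart)
print(labels_after_reregistration)
```

In this sketch the second `register` call changes nothing, and the new labels only appear after the Node object is deleted and registered again. In a real cluster the Node would normally be drained before it is removed.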
Contributor

Is there a reason for this limitation (eg information security)? I'm wondering how come a running kubelet isn't able to relabel the Node that represents it.

Member Author

I don't know the history. I think it is to avoid a situation where Pods end up incorrectly scheduled, due to taints, after a kubelet restart. Draining and re-adding the node works around this issue.

Member Author

Do you think adding the reasoning to the text would be helpful for readers?

@SergeyKanzhelev
Member Author

@sftim I added a reason why re-registration is needed. PTAL if you have time

the node with the API server. For example, if the kubelet being restarted with
the new set of `--node-labels`, but the same Node name is used, the change will
not take an effect, as labels are being set on the Node registration. One of
the reasons for this behavior is that changing labels on Kublet restart may
Contributor

nit: “Kublet” should be “kubelet”.

This'll need a tech review I think.

Member Author

done

the node with the API server. For example, if the kubelet being restarted with
the new set of `--node-labels`, but the same Node name is used, the change will
not take an effect, as labels are being set on the Node registration. One of
the reasons for this behavior is that changing labels on Kublet restart may
Member

Suggested change
- the reasons for this behavior is that changing labels on Kublet restart may
+ the reasons for this behavior is that changing labels on kubelet restart may

@tengqm
Contributor

tengqm commented Oct 25, 2021

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tengqm

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 25, 2021
Contributor

@sftim sftim left a comment

Broadly LGTM, with one caveat.

Comment on lines 107 to 115
not take an effect, as labels are being set on the Node registration. One of
the reasons for this behavior is that changing labels on kubelet restart may
lead to situation when Pods are already scheduled on the Node, while they
have taints to the modified labels set preventing them from being on that node.
{{< /note >}}
Contributor

> One of the reasons for this behavior is that changing labels on kubelet restart may lead to situation when Pods are already scheduled on the Node, while they have taints to the modified labels set preventing them from being on that node.

I'd omit this. A full discussion probably wants to mention things like the descheduler. Better (IMO) not to touch on the topic at all.

There's probably an opportunity for anyone with the appetite for it, to improve how we manage label changes for kubelets that do stay in service.

Contributor

Is the good practice here to delete the Node and then register it again? Or to change the name that the kubelet would register as? (I wasn't sure).

Member Author

> I'd omit this.

I started without it, and that raised questions. It may be helpful to explain some of the reasoning behind the behavior; this is where the wording "one of the reasons" comes from. Do you feel strongly about removing it?

> Is the good practice here to delete the Node and then register it again? Or to change the name that the kubelet would register as? (I wasn't sure).

If you want the Node name to match the machine name for easier discoverability, it makes sense to reuse the name. A different name makes the change easier to see and avoids confusion in things like logs or metrics. So I don't think there is a clear-cut best practice.

Contributor

OK, let's not remove that text. But I would want to reword it - rather than framing this as explaining the motivation, explain the problem that a cluster operator could encounter if they were unaware.

(do we have an example of how it could go wrong?)

If it's really tricky to explain, we could track a new issue about making a Task page, eg “Change the Labels That Apply to a Node”. I'm leaning towards that but not convinced it's needed.

Member Author

rephrased. Good idea!

@sftim
Contributor

sftim commented Oct 29, 2021

This looks right to me. Thanks @SergeyKanzhelev and @tengqm
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 29, 2021
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 118fb33d26f880ab991d174fdef163dac5bea753

@k8s-ci-robot k8s-ci-robot merged commit 3dd978e into kubernetes:main Oct 29, 2021
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. language/en Issues or PRs related to English language lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/docs Categorizes an issue or PR as relevant to SIG Docs. sig/node Categorizes an issue or PR as relevant to SIG Node. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
5 participants