🐛 Fix validation of worker topology names in Cluster resource #12069

Open: wants to merge 8 commits into base: main

dlipovetsky
Contributor

What this PR does / why we need it:
The worker topology name is used to generate the name of a Kubernetes resource (MachineDeployment or MachinePool), and must therefore be a valid Kubernetes resource name. The existing validation does not ensure this.

The first commit adds tests; without a fix, they fail, as expected.
The second commit fixes the validation.

Which issue(s) this PR fixes:
Fixes #12068

/area clusterclass
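A minimal sketch of the gap, assuming stdlib `regexp` stand-ins for `validation.IsValidLabelValue` and `validation.IsDNS1123Subdomain` from `k8s.io/apimachinery/pkg/util/validation` (the patterns and length limits mirror that package; the helper names `passesOldCheck` and `isValidResourceName` are hypothetical). Names with uppercase letters or underscores pass the old label-value check but are not valid MachineDeployment names:

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical stand-ins for the apimachinery validation functions.
// Real code should keep calling validation.IsValidLabelValue and
// validation.IsDNS1123Subdomain instead of these regexes.
var (
	labelValueRE = regexp.MustCompile(`^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`)
	subdomainRE  = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)
)

// passesOldCheck approximates the pre-fix validation: a label value of at most 63 characters.
func passesOldCheck(name string) bool {
	return len(name) <= 63 && labelValueRE.MatchString(name)
}

// isValidResourceName approximates the API server's metadata.name check:
// a DNS-1123 subdomain of at most 253 characters.
func isValidResourceName(name string) bool {
	return len(name) <= 253 && subdomainRE.MatchString(name)
}

func main() {
	for _, name := range []string{"md-0", "MD-0", "md_0"} {
		fmt.Printf("%s: old check=%t valid resource name=%t\n",
			name, passesOldCheck(name), isValidResourceName(name))
	}
}
```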

…etes resource name

The worker topology name is used to generate the name of a Kubernetes
resource (MachineDeployment or MachinePool), and must therefore be a
valid Kubernetes resource name.
@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. area/clusterclass Issues or PRs related to clusterclass cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 7, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign vincepri for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 7, 2025
@dlipovetsky
Contributor Author

When I authored the fix, I noticed a disagreement between the validation and the API types. I will keep this as a draft PR until we resolve the disagreement.

The API types say that a worker topology Name may be up to 255 characters:

However, both the MachineDeployment and MachinePool validation limit the name to 63 characters, because they check that the name is a valid label value:

My draft PR replaces this value check with a stricter one, but the length is not changed.
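To make the disagreement concrete, a small sketch with the two limits as constants (255 from the API type's MaxLength, 63 from the label-value check). `lengthVerdict` and `nameOfLength` are hypothetical helpers. Any name between 64 and 255 characters is accepted by the schema but rejected by the webhook:

```go
package main

import (
	"fmt"
	"strings"
)

const (
	// apiMaxLength is the MaxLength the API type declared for the worker topology name.
	apiMaxLength = 255
	// labelValueMaxLength is the limit that validation.IsValidLabelValue enforces.
	labelValueMaxLength = 63
)

// nameOfLength (hypothetical) builds a lowercase name of n characters,
// so only its length can make it invalid.
func nameOfLength(n int) string {
	return strings.Repeat("a", n)
}

// lengthVerdict (hypothetical) reports whether a name's length is accepted
// by the CRD schema limit and by the webhook's label-value limit.
func lengthVerdict(name string) (schemaOK, webhookOK bool) {
	return len(name) <= apiMaxLength, len(name) <= labelValueMaxLength
}

func main() {
	for _, n := range []int{63, 64, 255, 256} {
		schemaOK, webhookOK := lengthVerdict(nameOfLength(n))
		fmt.Printf("len=%d schema=%t webhook=%t\n", n, schemaOK, webhookOK)
	}
	// Output:
	// len=63 schema=true webhook=true
	// len=64 schema=true webhook=false
	// len=255 schema=true webhook=false
	// len=256 schema=false webhook=false
}
```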

As an aside, I noticed that our validation imposes this 63-character limit on the Cluster and MachineDeployment names:

@sbueringer
Member

@dlipovetsky

When I authored the fix, I noticed a disagreement between the validation and the API types. I will keep this as a draft PR until we resolve the disagreement.

When I introduced MaxLength on the field, I missed that we already had that validation in the webhook, so I assumed that we had to keep supporting longer strings.

The MD webhook was not a factor there because we limit the name actually used for the MD here:

name, err := topologynames.MachineDeploymentNameGenerator(nameTemplate, s.Current.Cluster.Name, machineDeploymentTopology.Name).GenerateName()

As we already had that validation in the webhook, let's reduce the MaxLengths accordingly.
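The name-generation point can be illustrated with a hypothetical stand-in. This is not Cluster API's `topologynames.MachineDeploymentNameGenerator`, whose template data and truncation rules may differ. It renders a `text/template` with the cluster and topology names (the keys `.cluster.name` and `.machineDeployment.topologyName` are assumptions) and caps the result at an assumed 63 characters. The cap bounds the length of the generated MD name regardless of the topology name, but it leaves characters such as uppercase letters or underscores untouched:

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"text/template"
)

// maxGeneratedNameLength is an assumed cap for this sketch only.
const maxGeneratedNameLength = 63

// generateMDName is a hypothetical name generator: it renders nameTemplate
// with the cluster and topology names, then truncates the result.
func generateMDName(nameTemplate, clusterName, topologyName string) (string, error) {
	tpl, err := template.New("md-name").Parse(nameTemplate)
	if err != nil {
		return "", err
	}
	data := map[string]any{
		"cluster":           map[string]string{"name": clusterName},
		"machineDeployment": map[string]string{"topologyName": topologyName},
	}
	var buf bytes.Buffer
	if err := tpl.Execute(&buf, data); err != nil {
		return "", err
	}
	name := buf.String()
	if len(name) > maxGeneratedNameLength {
		// Truncate, then drop trailing separators so the name ends in an alphanumeric.
		name = strings.TrimRight(name[:maxGeneratedNameLength], "-.")
	}
	return name, nil
}

func main() {
	const tpl = "{{ .cluster.name }}-{{ .machineDeployment.topologyName }}"
	for _, topo := range []string{"md-0", "MD_0", strings.Repeat("a", 100)} {
		name, err := generateMDName(tpl, "c1", topo)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// Length is capped, but characters are passed through unchanged.
		fmt.Printf("%d %s\n", len(name), name)
	}
}
```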

@dlipovetsky marked this pull request as ready for review April 8, 2025 16:36
@k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 8, 2025
@@ -310,7 +310,8 @@ func MachineDeploymentTopologiesAreValidAndDefinedInClusterClass(desired *cluste
machineDeploymentClasses := mdClassNamesFromWorkerClass(clusterClass.Spec.Workers)
names := sets.Set[string]{}
for i, md := range desired.Spec.Topology.Workers.MachineDeployments {
if errs := validation.IsValidLabelValue(md.Name); len(errs) != 0 {
Contributor

Note that a label value is a DNS1123Label prefixed optionally by a DNS1123Subdomain and a /.

We are asserting then that no one has used the prefix since / is not a valid character in a CR name?

Any value in adding a ratcheting validation here to be safe?
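Ratcheting here would mean enforcing the stricter rule only for worker topology names that are new or changed on update, so an existing Cluster whose names passed the old check is not rejected on its next update. A minimal sketch of the idea, not code from this PR: `strictNameOK` and `ratchetedErrors` are hypothetical, and the strict rule is a stdlib stand-in for a DNS-1123 subdomain check with a 63-character cap:

```go
package main

import (
	"fmt"
	"regexp"
)

// Stand-in for validation.IsDNS1123Subdomain (hypothetical, stdlib only).
var subdomainRE = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// strictNameOK is the new, stricter rule in this sketch.
func strictNameOK(name string) bool {
	return len(name) <= 63 && subdomainRE.MatchString(name)
}

// ratchetedErrors validates the desired topology names. A name that already
// existed in the old object is exempt from the stricter rule; on create,
// oldNames is empty and every name is checked.
func ratchetedErrors(oldNames, newNames []string) []string {
	existing := make(map[string]bool, len(oldNames))
	for _, n := range oldNames {
		existing[n] = true
	}
	var errs []string
	for _, n := range newNames {
		if existing[n] {
			continue // unchanged name: grandfathered in
		}
		if !strictNameOK(n) {
			errs = append(errs, fmt.Sprintf("invalid worker topology name %q", n))
		}
	}
	return errs
}

func main() {
	// Update: "MD_0" already existed, "New_MD" is being added.
	fmt.Println(ratchetedErrors([]string{"MD_0"}, []string{"MD_0", "New_MD"}))
	// Output: [invalid worker topology name "New_MD"]
}
```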

Contributor Author

Note that a label value is a DNS1123Label prefixed optionally by a DNS1123Subdomain and a /.

I think this is incorrect. I created https://go.dev/play/p/NJ9uywgss1N to demonstrate.

Are you thinking of a label key?
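The distinction: the optional `<DNS-1123 subdomain>/` prefix is part of a label key, not a label value, so `/` can never appear in a valid label value. A stdlib-only sketch, with hypothetical `labelValueOK` and `labelKeyOK` approximating `validation.IsValidLabelValue` and `validation.IsQualifiedName` (the real functions return error messages rather than a bool):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Pattern shared by label values and the name part of a label key (max 63 chars).
	qualifiedNameRE = regexp.MustCompile(`^([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]$`)
	// DNS-1123 subdomain pattern, used only for the optional label-key prefix.
	subdomainRE = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)
)

// labelValueOK (hypothetical) approximates IsValidLabelValue: empty, or a
// qualified name of at most 63 characters. There is no prefix, so no "/".
func labelValueOK(v string) bool {
	return v == "" || (len(v) <= 63 && qualifiedNameRE.MatchString(v))
}

// labelKeyOK (hypothetical) approximates IsQualifiedName: an optional
// "<subdomain>/" prefix followed by a non-empty qualified name.
func labelKeyOK(k string) bool {
	name := k
	if i := strings.Index(k, "/"); i >= 0 {
		prefix := k[:i]
		if prefix == "" || len(prefix) > 253 || !subdomainRE.MatchString(prefix) {
			return false
		}
		name = k[i+1:]
	}
	return name != "" && len(name) <= 63 && qualifiedNameRE.MatchString(name)
}

func main() {
	for _, s := range []string{"app", "example.com/app"} {
		fmt.Printf("%q: label value=%t label key=%t\n", s, labelValueOK(s), labelKeyOK(s))
	}
}
```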

Member

@sbueringer Apr 9, 2025

If I see correctly this PR goes from

[a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue', or 'my_value', or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')]

to

[a lowercase RFC 1123 label must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')]

Which seems okay given that the MD/MP names will have to pass through the latter validation anyway?

Member

@sbueringer Apr 9, 2025

Hm. @dlipovetsky, I think this may be the wrong one. I tested this:

k apply -f ./test.yaml
The MachineDeployment "capiTest" is invalid: metadata.name: Invalid value: "capiTest": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')

Should we use IsDNS1123Subdomain instead?
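For comparison, a stdlib-only sketch of the two rules, with hypothetical helpers approximating `validation.IsDNS1123Label` and `validation.IsDNS1123Subdomain` (the subdomain pattern is built from the label pattern, as in apimachinery). Both reject `capiTest`, but only the subdomain rule accepts dot-separated names such as `md.v1`, so checking for a DNS-1123 label would be stricter than the API server's check on the MachineDeployment name:

```go
package main

import (
	"fmt"
	"regexp"
)

// dns1123LabelFmt is one lowercase alphanumeric segment, optionally with inner dashes.
const dns1123LabelFmt = `[a-z0-9]([-a-z0-9]*[a-z0-9])?`

var (
	dns1123LabelRE = regexp.MustCompile(`^` + dns1123LabelFmt + `$`)
	// A subdomain is one or more labels joined by dots.
	dns1123SubdomainRE = regexp.MustCompile(`^` + dns1123LabelFmt + `(\.` + dns1123LabelFmt + `)*$`)
)

// isDNS1123Label approximates validation.IsDNS1123Label (max 63 characters).
func isDNS1123Label(s string) bool {
	return len(s) <= 63 && dns1123LabelRE.MatchString(s)
}

// isDNS1123Subdomain approximates validation.IsDNS1123Subdomain (max 253 characters).
func isDNS1123Subdomain(s string) bool {
	return len(s) <= 253 && dns1123SubdomainRE.MatchString(s)
}

func main() {
	for _, s := range []string{"capiTest", "capi-test", "md.v1"} {
		fmt.Printf("%s: label=%t subdomain=%t\n", s, isDNS1123Label(s), isDNS1123Subdomain(s))
	}
	// Output:
	// capiTest: label=false subdomain=false
	// capi-test: label=true subdomain=true
	// md.v1: label=false subdomain=true
}
```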

Contributor

Are you thinking of a label key?

Yes I was, ignore me!

Should we use IsDNS1123Subdomain instead?

Subdomain doesn't allow underscores, which I think the current validation does?

Contributor Author

I added more tests to reflect the above. Now we test that validation fails if:

  • name is longer than the longest allowed label value
  • name is not a valid label value
  • name is not a valid resource name

I also updated the implementation.
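The three failure cases could be expressed as a table-driven check along these lines. This is a sketch: `validateWorkerTopologyName` is a hypothetical stdlib stand-in for the webhook logic (a label-value check plus a DNS-1123 subdomain check), and the example names are illustrative rather than the PR's test data. The over-long name is all lowercase, so it fails only because of its length:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Stand-ins for validation.IsValidLabelValue and validation.IsDNS1123Subdomain.
var (
	labelValueRE = regexp.MustCompile(`^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`)
	subdomainRE  = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)
)

// validateWorkerTopologyName (hypothetical) returns one message per rule the name violates.
func validateWorkerTopologyName(name string) []string {
	var errs []string
	if len(name) > 63 || !labelValueRE.MatchString(name) {
		errs = append(errs, "not a valid label value")
	}
	if len(name) > 253 || !subdomainRE.MatchString(name) {
		errs = append(errs, "not a valid resource name")
	}
	return errs
}

func main() {
	cases := []struct {
		desc    string
		name    string
		wantErr bool
	}{
		{"valid name", "md-0", false},
		{"longer than the longest allowed label value", strings.Repeat("a", 64), true},
		{"not a valid label value", "md-0-", true},
		{"not a valid resource name", "md_0", true},
	}
	for _, c := range cases {
		errs := validateWorkerTopologyName(c.name)
		fmt.Printf("%s: errors=%v as expected=%t\n", c.desc, errs, (len(errs) > 0) == c.wantErr)
	}
}
```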

Contributor

I think if you validated that the max length was 63 chars (could do this at the API schema level), then you don't need the IsValidLabelValue check at all. The regex for DNS subdomain is a subset (the same, but it doesn't allow _), isn't it?

Contributor Author

I think if you validated that the max length was 63 chars (could do this at the API schema level)

That's exactly what I did, just in a separate PR: #12072

you don't need the IsValidLabelValue check at all ... The regex for DNS subdomain is a subset

That's true.

a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character
a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character

I think having both IsDNS1123Subdomain and IsValidLabelValue in code communicates the intent of the code clearly. But I agree that, in practice, IsValidLabelValue would never report an error, because any invalid character would be caught by IsDNS1123Subdomain, and an invalid length would be caught by the CRD validation.

Also, if we check for max length in the CRD validation, then to test this function, we need the API server, and must use envtest.

If you feel strongly, I can remove the IsValidLabelValue call, but in that case, I would like to replace it with an ad-hoc length check in the code, so we can continue to unit test the function.
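The redundancy argument can be checked mechanically. Every character a DNS-1123 subdomain allows (`a-z`, `0-9`, `-`, `.`) is also allowed in a label value, and both must start and end with an alphanumeric, so once the length is capped at 63 the label-value check cannot fail. The sketch below uses stdlib stand-ins for the apimachinery patterns; `counterexamples` is a hypothetical helper that enumerates every string of length 1 to 4 over a small mixed alphabet and collects any that pass the capped subdomain check but fail the label-value check:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	labelValueRE = regexp.MustCompile(`^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`)
	subdomainRE  = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)
)

// alphabet mixes characters both rules allow (a, 0, -, .), characters only
// label values allow (_, A), and a character neither allows (/).
const alphabet = "a0-._A/"

// counterexamples returns every string up to maxLen characters that passes
// the subdomain check with a 63-character cap but fails the label-value check.
func counterexamples(maxLen int) []string {
	var found []string
	var walk func(prefix string)
	walk = func(prefix string) {
		if prefix != "" && len(prefix) <= 63 &&
			subdomainRE.MatchString(prefix) && !labelValueRE.MatchString(prefix) {
			found = append(found, prefix)
		}
		if len(prefix) == maxLen {
			return
		}
		for _, r := range alphabet {
			walk(prefix + string(r))
		}
	}
	walk("")
	return found
}

func main() {
	fmt.Println("counterexamples:", len(counterexamples(4)))
	// Output: counterexamples: 0
}
```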

Contributor

Not a strong opinion, but maybe worth a code comment to explain that we know it's redundant.

I do wonder about the performance impact 🤔

Over time, once the CEL format libraries are in our minimum supported version, I suspect we can rip all of this out, use the CEL format to validate that this is a DNS1123 subdomain, and be done.

Contributor Author

maybe worth a code comment to explain that we know it's redundant.

Comment added.

…bernetes resource names

Use IsDNS1123Subdomain
…bernetes resource names

Check that Name is both a valid Kubernetes resource name, and a valid label value
… Kubernetes resource name

Add tests for maximum length and invalid characters in a label value
@k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 9, 2025
… Kubernetes resource name

Test for max length should fail due to max length, not due to uppercase characters
…bernetes resource names

Explain why we use IsValidLabelValue check
… Kubernetes resource name

Add tests to check package
Labels
area/clusterclass Issues or PRs related to clusterclass cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Cluster validation allows invalid worker topology names
4 participants