Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

always enable ClusterTrusbBundles feature gates in tests that both focus it and enable all alpha APIs #34451

Closed
wants to merge 2 commits into from

Conversation

stlaz
Copy link
Member

@stlaz stlaz commented Mar 4, 2025

The alpha-features test enables all the APIs but as ClusterTrustBundles move to beta, we explicitly need to enable these FGs.

fixes tests in kubernetes/kubernetes#128499

The PR also explicitly adds CTB featuregates to jobs that focus the CTB tests and enable all alpha APIs.

/cc @enj @liggitt

@k8s-ci-robot k8s-ci-robot requested a review from enj March 4, 2025 09:56
@k8s-ci-robot
Copy link
Contributor

@stlaz: GitHub didn't allow me to request PR reviews from the following users: liggit.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

The alpha-features test enables all the APIs but as ClusterTrustBundles move to beta, we explicitly need to enable these FGs.

fixes tests in kubernetes/kubernetes#128499

/cc @enj @LiGgit

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. area/config Issues or PRs related to code in /config area/jobs area/provider/gcp Issues or PRs related to gcp provider sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Mar 4, 2025
The alpha-features test enables all the APIs but as ClusterTrustBundles
move to beta, we explicitly need to enable these FGs.
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: stlaz
Once this PR has been reviewed and has the lgtm label, please assign cheftako, dchen1107, jeremyrickard for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/release-eng Issues or PRs related to the Release Engineering subproject sig/node Categorizes an issue or PR as relevant to SIG Node. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. sig/release Categorizes an issue or PR as relevant to SIG Release. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 4, 2025
@stlaz stlaz changed the title gce-alpha-features: always enable ClusterTrusbBundles feature gates always enable ClusterTrusbBundles feature gates in tests that focus it and enable all alpha APIs Mar 4, 2025
@stlaz stlaz changed the title always enable ClusterTrusbBundles feature gates in tests that focus it and enable all alpha APIs always enable ClusterTrusbBundles feature gates in tests that both focus it and enable all alpha APIs Mar 4, 2025
@liggitt
Copy link
Member

liggitt commented Mar 4, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 4, 2025
@liggitt
Copy link
Member

liggitt commented Mar 4, 2025

/assign @aojea

@liggitt
Copy link
Member

liggitt commented Mar 4, 2025

/assign @dims

@pohly
Copy link
Contributor

pohly commented Mar 4, 2025

I haven't sent that email which is in that doc because I was hoping to get feedback from @BenTheElder first.

The underlying idea isn't new. We just need to continue with converting jobs.

@liggitt
Copy link
Member

liggitt commented Mar 4, 2025

Thanks for weighing in. I agree on that direction and will be happy to ack splitting / adjusting these jobs in the future. Would you agree that until we get there, this PR is a reasonable step to unblock the promotion of the existing focus tests?

@BenTheElder
Copy link
Member

The alpha-features test enables all the APIs but as ClusterTrustBundles move to beta, we explicitly need to enable these FGs.

Huh? Beta features should not be covered by alpha jobs.
Are these off-by-default betas that are required to alpha test? Why?

I think we want to keep these ci jobs to reflect all switching on the alpha flags on with all the default true beta feature gates. we probably need to stand up additional jobs or look at other existing jobs instead of messing with these alpha jobs.

Agreed.

If you need a CI job to cover tests for these, it should be stood up. The shared alpha features job (and beta and ..) just ensures that we haven't totally broken the cluster when these are on, it may or may not provide coverage and we generally expect features with special requirements to stand up their own CI.

@BenTheElder
Copy link
Member

I see the failures in the jobs from the linked PR:
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/128499/pull-kubernetes-e2e-gce-cos-alpha-features/1891460005454942208

We should probably not be running [Feature:ClusterTrustBundle] in this job, these tests should've been tagged with Alpha, Feature is for things that require additional setup (like a loadbalancer controller) and we should not skip Alpha but we shouldn't run all Feature or manually list features.

@BenTheElder
Copy link
Member

For comparison: the kind alpha jobs will not run [Feature] tag

@liggitt
Copy link
Member

liggitt commented Mar 4, 2025

If you need a CI job to cover tests for these, it should be stood up

If we want to block this on moving that direction, ok... just wanting to understand what we're telling feature authors to do for their tests. Will we ack standup of a new e2e job for every distinct feature? Can we actually afford that?

@liggitt
Copy link
Member

liggitt commented Mar 4, 2025

We should probably not be running [Feature:ClusterTrustBundle] in this job, these tests should've been tagged with Alpha

Ok, we can stage that change as a first PR to master. Do we already have jobs that will work like we want in place:

  • alpha e2e job that enables all alpha+beta APIs+alpha gates and runs all alpha-tagged e2es (excluding specific feature-tagged tests)?
  • beta e2e job that enables all beta APIs+beta gates and runs all beta-tagged e2es (excluding specific feature-tagged tests)?

If so, I think that would work pretty seamlessly for any features that didn't require special setup beyond API and gate enablement

@BenTheElder
Copy link
Member

If we want to block this on moving that direction, ok... just wanting to understand what we're telling feature authors to do for their tests. Will we ack standup of a new e2e job for every distinct feature? Can we actually afford that?

Er, with the rest of the context though, you generally don't need one, because it's only gated by the gate, and not on some additional setup (which would be an arbitrarily named Feature).

https://github.com/kubernetes/community/blob/master/contributors/devel/sig-testing/e2e-tests.md#kinds-of-tests

cc @pohly @aojea, we should finish the transition to labels instead of regex and make sure we opt in/out of the alpha label appropriately.

@BenTheElder
Copy link
Member

Ok, we can stage that change as a first PR to master. Do we already have jobs that will work like we want in place:

  • alpha e2e job that enables all alpha APIs+gates and runs all alpha-tagged e2es (excluding specific feature-tagged tests)?
  • beta e2e job that enables all beta APIs+gates and runs all beta-tagged e2es (excluding specific feature-tagged tests)?

I'm talking to aojea about that more when he's back (he's out today), I think the answer is "it's a mess but that's roughly what we should have"

If so, I think that would work pretty seamlessly for any features that didn't require special setup beyond API and gate enablement

Right, I'm also reviewing the doc linked above.

We don't want SIGs to have to setup jobs for every feature, just the ones that require special setup, though we do expect feature developers to ensure that coverage is happening (instead of just assuming someone else checked this for them) even though we do plan to work on improving the state of alpha/beta jobs.

The responsibility for ensuring that tests for your features run needs to be shared I think, but that doesn't mean we should spin up a job for every feature, just taking the time to check that it is and work with us if it's not (like here!)

That test case doesn't seem to be feature-gate tagged currently, let's fix that, and look at correcting the jobs to filter on the alpha/beta labels correctly.

@liggitt
Copy link
Member

liggitt commented Mar 4, 2025

https://github.com/kubernetes/community/blob/master/contributors/devel/sig-testing/e2e-tests.md#kinds-of-tests
...
That test case doesn't seem to be feature-gate tagged currently, let's fix that, and look at correcting the jobs to filter on the alpha/beta labels correctly.

ok, opened kubernetes/kubernetes#130567 which I think does what is expected

That doc also says "Note that at the moment, not all jobs filter out tests with Alpha or Beta requirements like that. Therefore all tests with such a requirement also have to be annotated with a [Feature] tag. This restriction will be lifted once migration of jobs to --filter-label is completed." ... so... I'm not sure if kubernetes/kubernetes#130567 will break or be blocked by e2e job adjustments.

I'm very sensitive to the timing of this and the runway to 1.33 code freeze... I want to move test config in the right direction but I can't afford to block folks on this either. If we can resolve this the "right" way ~today, let's do that, and if not, can we proceed with this PR?

@liggitt
Copy link
Member

liggitt commented Mar 4, 2025

@pohly
Copy link
Contributor

pohly commented Mar 4, 2025

we should finish the transition to labels instead of regex and make sure we opt in/out of the alpha label appropriately

Which I have been trying to push forward, with mixed results, mostly because I can't do it alone. The email I mentioned is a more public call to action for an issue that will soon be one year old.

If we can resolve this the "right" way ~today

We can't go that fast. Defining a test with only [FeatureGate:Something] and [Alpha] or [Beta] (done automatically by e2e/framework.WithFeatureGate) will cause it to run in jobs which might not have that feature gate enabled and the test will fail, causing the job to break.

For now, adding [Feature:Something] and configuring jobs accordingly is still needed.

kubernetes/kubernetes#130567

That PR shows what we want at the end and can't have right now, as the failures in https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/130567/pull-kubernetes-e2e-gce/1896997483054108672 show.

@liggitt
Copy link
Member

liggitt commented Mar 4, 2025

For now, adding [Feature:Something] and configuring jobs accordingly is still needed.

Are you suggesting they edit the alpha and beta jobs definitions to enable and focus on tests for features in alpha and beta, or that they spin up dedicated jobs?

@BenTheElder
Copy link
Member

For now, adding [Feature:Something] and configuring jobs accordingly is still needed.

We should either:

  • use a standardized [Feature:Alpha] automatically as a bridge / stop-gap
  • just finish switching to label filter for critical shared jobs and PSA the org about the action required to update any other jobs (and if nobody response then ... those jobs must not be maintained)

Which I have been trying to push forward, with mixed results, mostly because I can't do it alone. The email I mentioned is a more public call to action for an issue that will soon be one year old.

I'm not suggesting that the few of us here (or you specifically) should do this for all of CI, but we can do it for release / PR blocking and rip the bandaid off for the rest after PSA-ing the org. (This is also why I think we should do smaller, targeted PSAs, since this one would be [Action Required])

@BenTheElder
Copy link
Member

use a standardized [Feature:Alpha] automatically as a bridge / stop-gap

... I actually thought we'd planned something like this with the label transition already, but I see we only do like [FeatureGate:ClusterTrustBundle] [Alpha]

I actually think we could bulk update jobs that are not obviously testing alpha/beta features to exclude FeatureGate even without switching to the labels ...?

@liggitt
Copy link
Member

liggitt commented Mar 4, 2025

  • use a standardized [Feature:Alpha] automatically as a bridge / stop-gap

Something like that seems plausible to me for features which don't have any special requirements other than "enable alpha apis and gates". Could the same thing work for beta?

I tweaked kubernetes/kubernetes#130567 in that direction and opened a POC stacked follow-up that promotes to beta at kubernetes/kubernetes#130572

If we can edit job definitions of the generic alpha + beta jobs to get them green on both of those PRs without dropping test coverage, I'd suggest we merge those changes

@@ -907,7 +907,7 @@ presubmits:
kubetest2 ec2 \
--stage https://dl.k8s.io/ci/fast/ \
--version $VERSION \
--feature-gates="AllAlpha=true,EventedPLEG=false,StorageVersionAPI=true,APIServerIdentity=true" \
--feature-gates="AllAlpha=true,ClusterTrustBundle=true,ClusterTrustBundleProjection=true,EventedPLEG=false,StorageVersionAPI=true,APIServerIdentity=true" \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we instead set skip-regex to have [Beta] as a comparable stopgap, combined with kubernetes/kubernetes#130567 ?
I don't think the intent should be to run these tests in the alpha job as a beta feature

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure about the [Alpha] or [Beta] tagging since that will change selection of existing tests using featuregates in ways I'm unsure of

Using generic "Feature:Alpha" and "Feature:Beta" is net-new more incremental step I could reason about better ... I opened https://github.com/kubernetes/test-infra/pull/34456/files to add those focus / skips to the generic jobs I could find that were currently explicitly referencing the ClusterTrustBundle feature as a way to straddle onto making kubernetes/kubernetes#130572 and kubernetes/kubernetes#130567 work without dropping test coverage

Copy link
Contributor

@pohly pohly Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a standardized [Feature:Alpha] automatically as a bridge / stop-gap

We have that, just not for focus/skip. It was meant to be used with --label-filter. kubernetes/kubernetes#130580 (opened today as alternative to kubernetes/kubernetes#130567) makes it available also to jobs using focus/filter.

just finish switching to label filter for critical shared jobs and PSA the org about the action required to update any other jobs (and if nobody response then ... those jobs must not be maintained)

I'm arriving at a similar conclusion. I was even considering to convert critical jobs to --label-filter and then label feature-gate-only tests so that they run in unconverted jobs. If they then fail, job owners either (finally) take notice or they don't and we can consider deleting the jobs. But with kubernetes/kubernetes#130580 such unconverted jobs wouldn't break like that - or rather, not break at all.

This may be good (no breakage) and bad (they don't get updated and we continue to depend on regexp in those jobs).

@SergeyKanzhelev
Copy link
Member

/assign

@SergeyKanzhelev SergeyKanzhelev moved this from Triage to PRs - Needs Reviewer in SIG Node CI/Test Board Mar 5, 2025
@SergeyKanzhelev SergeyKanzhelev moved this from PRs - Needs Reviewer to PRs - Needs Approver in SIG Node CI/Test Board Mar 5, 2025
@stlaz
Copy link
Member Author

stlaz commented Mar 11, 2025

/close
closing in favour of #34470 and kubernetes/kubernetes#130655

@k8s-ci-robot
Copy link
Contributor

@stlaz: Closed this PR.

In response to this:

/close
closing in favour of #34470 and kubernetes/kubernetes#130655

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@github-project-automation github-project-automation bot moved this from PRs - Needs Approver to Done in SIG Node CI/Test Board Mar 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/config Issues or PRs related to code in /config area/jobs area/provider/gcp Issues or PRs related to gcp provider area/release-eng Issues or PRs related to the Release Engineering subproject cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/release Categorizes an issue or PR as relevant to SIG Release. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

8 participants