Skip to content

KEP-3619: update Test Plan and Graduation Criteria for KEP freeze #3862

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 9, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
190 changes: 153 additions & 37 deletions keps/sig-node/3619-supplemental-groups-policy/README.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,35 @@
# KEP-3619: Fine-grained SupplementalGroups control

<!--
**Note:** When your KEP is complete, all of these comment blocks should be removed.
To get started with this template:
- [ ] **Pick a hosting SIG.**
Make sure that the problem space is something the SIG is interested in taking
up. KEPs should not be checked in without a sponsoring SIG.
- [ ] **Create an issue in kubernetes/enhancements**
When filing an enhancement tracking issue, please make sure to complete all
fields in that template. One of the fields asks for a link to the KEP. You
can leave that blank until this KEP is filed, and then go back to the
enhancement and add the link.
- [ ] **Make a copy of this template directory.**
Copy this template into the owning SIG's directory and name it
`NNNN-short-descriptive-title`, where `NNNN` is the issue number (with no
leading-zero padding) assigned to your enhancement above.
- [ ] **Fill out as much of the kep.yaml file as you can.**
At minimum, you should fill in the "Title", "Authors", "Owning-sig",
"Status", and date-related fields.
- [ ] **Fill out this file as best you can.**
At minimum, you should fill in the "Summary" and "Motivation" sections.
These should be easy if you've preflighted the idea of the KEP with the
appropriate SIG(s).
- [ ] **Create a PR for this KEP.**
Assign it to people in the SIG who are sponsoring this process.
- [ ] **Merge early and iterate.**
Avoid getting hung up on specific details and instead aim to get the goals of
the KEP clarified and merged quickly. The best way to do this is to just
start with the high-level sections and fill out details incrementally in
subsequent PRs.
Just because a KEP is merged does not mean it is complete or approved. Any KEP
marked as `provisional` is a working document and subject to change. You can
denote sections that are under active debate as follows:
Expand All @@ -10,6 +39,41 @@ denote sections that are under active debate as follows:
Stuff that is being argued.
<<[/UNRESOLVED]>>
```
When editing KEPS, aim for tightly-scoped, single-topic PRs to keep discussions
focused. If you disagree with what is already in a document, open a new PR
with suggested changes.
One KEP corresponds to one "feature" or "enhancement" for its whole lifecycle.
You do not need a new KEP to move from beta to GA, for example. If
new details emerge that belong in the KEP, edit the KEP. Once a feature has become
"implemented", major changes should get new KEPs.
The canonical place for the latest set of instructions (and the likely source
of this file) is [here](/keps/NNNN-kep-template/README.md).
**Note:** Any PRs to move a KEP to `implementable`, or significant changes once
it is marked `implementable`, must be approved by each of the KEP approvers.
If none of those approvers are still appropriate, then changes to that list
should be approved by the remaining approvers and/or the owning SIG (or
SIG Architecture for cross-cutting KEPs).
-->
# KEP-3619: Fine-grained SupplementalGroups control

<!--
This is the title of your KEP. Keep it short, simple, and descriptive. A good
title can help communicate what the KEP is and should be considered as part of
any review.
-->

<!--
A table of contents is helpful for quickly jumping to sections of a KEP and for
highlighting any additional information provided beyond the standard KEP
template.
Ensure the TOC is wrapped with
<code>&lt;!-- toc --&rt;&lt;!-- /toc --&rt;</code>
tags, and then generate with `hack/update-toc.sh`.
-->

<!-- toc -->
Expand All @@ -27,7 +91,7 @@ Stuff that is being argued.
- [CRI](#cri)
- [SupplementalGroupsPolicy in SecurityContext](#supplementalgroupspolicy-in-securitycontext)
- [user in ContainerStatus](#user-in-containerstatus-1)
- [User Stories](#user-stories)
- [User Stories (Optional)](#user-stories-optional)
- [Story 1: Deploy a Security Policy to enforce <code>SupplementalGroupsPolicy</code> field](#story-1-deploy-a-security-policy-to-enforce--field)
- [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
- [Risks and Mitigations](#risks-and-mitigations)
Expand All @@ -44,6 +108,9 @@ Stuff that is being argued.
- [Integration tests](#integration-tests)
- [e2e tests](#e2e-tests)
- [Graduation Criteria](#graduation-criteria)
- [Alpha](#alpha)
- [Beta](#beta)
- [GA](#ga)
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
- [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
Expand Down Expand Up @@ -106,6 +173,25 @@ Items marked with (R) are required *prior to targeting to a milestone / release*

## Summary

<!--
This section is incredibly important for producing high-quality, user-focused
documentation such as release notes or a development roadmap. It should be
possible to collect this information before implementation begins, in order to
avoid requiring implementors to split their attention between writing release
notes and implementing the feature itself. KEP editors and SIG Docs
should help to ensure that the tone and content of the `Summary` section is
useful for a wide audience.
A good summary is probably at least a paragraph in length.
Both in this section and below, follow the guidelines of the [documentation
style guide]. In particular, wrap lines to a reasonable length, to make it
easier for reviewers to cite specific portions, and to minimize diff churn on
updates.
[documentation style guide]: https://github.com/kubernetes/community/blob/master/contributors/guide/style-guide.md
-->

The KEP seeks to provide a way to choose correct behavior with how Container Runtimes (Containerd and CRI-O) are applying `SupplementalGroups` to the first container processes. The KEP describes the work needed to be done in Kubernetes or connected projects to make sure customers have a clear migration path - including detection and safe upgrade - if any of their workflows took a dependency on this arguably erroneous behavior.

### The issue
Expand Down Expand Up @@ -161,6 +247,15 @@ uid=1000(alice) gid=1000(alice) groups=1000(alice),50000(group-in-image),60000

## Motivation

<!--
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to keep these comments in the file - it just makes it appear longer :)

This section is for explicitly listing the motivation, goals, and non-goals of
this KEP. Describe why the change is important and the benefits to users. The
motivation section can optionally provide links to [experience reports] to
demonstrate the interest in a KEP within the wider Kubernetes community.
[experience reports]: https://github.com/golang/go/wiki/ExperienceReports
-->

As described above, how supplemental groups attached to the first container process is complicated and not OCI image spec compliant.

Moreover, this causes security considerations as follows. When a cluster enforces some security policy for pods that protects the value of `RunAsGroup` and `SupplementalGroups`, the effect of its enforcement is limited, i.e., cluster users can easily bypass the policy enforcement just by using a custom image. If such a bypass happened, it would be unexpected behavior for most cluster administrators because the enforcement is almost useless. Moreover, the bypass will cause unexpected file access permission. In some use cases, the unexpected file access permission will be a security concern. For example, using `hostPath` volumes could be a severe problem because UID/GIDs matter in accessing files/directories in the volumes.
Expand Down Expand Up @@ -254,7 +349,7 @@ message ContainerUser {
}
```

### User Stories
### User Stories (Optional)

<!--
Detail the things that people will be able to do if this KEP is implemented.
Expand All @@ -263,6 +358,7 @@ the system. The goal here is to make this feel real for users without getting
bogged down.
-->


#### Story 1: Deploy a Security Policy to enforce `SupplementalGroupsPolicy` field

Assume a multi-tenant kubernetes cluster with `hostPath` volumes below situations:
Expand Down Expand Up @@ -294,6 +390,8 @@ As described in [Summary](#summary) section, `alice` can bypass the restriction

Please note that a security policy without `supplementalGroupsPolicy` would lead to unexpected groups for the first process in the containers.

<!-- #### Story 2 -->

### Notes/Constraints/Caveats (Optional)

<!--
Expand Down Expand Up @@ -325,6 +423,13 @@ Consider including folks who also work outside the SIG or subproject.

## Design Details

<!--
This section should contain enough information that the specifics of your
change are understandable. This may include API specs (though not always
required) or even code snippets. If there's any ambiguity about HOW your
proposal will be implemented, this is the place to discuss them.
-->

### Kubernetes API

#### SupplementalGroupsPolicy in PodSecurityContext
Expand Down Expand Up @@ -473,7 +578,7 @@ when drafting this test plan.
[testing-guidelines]: https://git.k8s.io/community/contributors/devel/sig-testing/testing.md
-->

[ ] I/we understand the owners of the involved components may require updates to
[x] I/we understand the owners of the involved components may require updates to
existing tests to make this code solid enough prior to committing the changes necessary
to implement this enhancement.

Expand Down Expand Up @@ -505,7 +610,8 @@ This can inform certain test coverage improvements that we want to do before
extending the production code to implement this enhancement.
-->

- `<package>`: `<date>` - `<test coverage>`
- `k8s.io/kubernetes/pkg/apis/core/validation`: `<date>(t.b.d.)` - `<test coverage>(t.b.d.)`
- validation tests for `PodSecurityContext.SupplementalGroups`, `ContainerStatus.User`

##### Integration tests

Expand All @@ -517,7 +623,12 @@ For Beta and GA, add links to added tests together with links to k8s-triage for
https://storage.googleapis.com/k8s-triage/index.html
-->

- <test>: <link to test coverage>
- Kubernetes API
- When `SupplementalGroupsPolicy=Strict`, groups of the container process must be ones specified by API: <link to test coverage(t.b.d.)>
- When `SupplementalGroupsPolicy=Merge`, groups of the container process contains both groups specified by API and groups of the primary user from the image: <link to test coverage(t.b.d.)>
- For running pods, `ContainerStatus.User` contains the correct identities of the containers: <link to test coverage(t.b.d.)>
- CRI
- I will also add symmetrical integration tests to https://github.com/kubernetes-sigs/cri-tools

##### e2e tests

Expand All @@ -531,7 +642,9 @@ https://storage.googleapis.com/k8s-triage/index.html
We expect no non-infra related flakes in the last month as a GA graduation criteria.
-->

- <test>: <link to test coverage>
- When creating a Pod with `SupplementalGroupsPolicy=Strict`, the pods will run with only groups specified by API: <link to test coverage(t.b.d.)>
- When creating a Pod with `SupplementalGroupsPolicy=Merge`, the pods will run with groups specified by API and groups from the image: <link to test coverage(t.b.d.)>
- When creating a Pod and it starts, each `ContainerStatus.User` contain the correct identities of the containers: <link to test coverage(t.b.d.)>

### Graduation Criteria

Expand Down Expand Up @@ -561,42 +674,30 @@ functionality is accessed.
[deprecation-policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/
Below are some examples to consider, in addition to the aforementioned [maturity levels][maturity-levels].
-->

Because this KEP's core implementation(i.e. `SupplementalGroupsPolicy` handling) lies inside of CRI implementations(e.g. containerd, cri-o), the graduation criteria contains the support statuses of the updated CRI by container runtimes.

#### Alpha

- Feature implemented behind a feature flag
- Initial e2e tests completed and enabled
- At least one of the most popular Container Runtimes(e.g. containerd) implements the updated CRI and released
- Feature implemented behind a feature flag based on the Container Runtime
- Unit tests and initial e2e tests completed and enabled

#### Beta

- Gather feedback from developers and surveys
- Complete features A, B, C
- Additional tests are in Testgrid and linked in KEP
- Several popular Container Runtimes(e.g. containerd and cri-o) support the updated CRI and released
- Fixed reported bugs from the community
- Additional integration tests and e2e tests are in Testgrid and linked in KEP

#### GA

- N examples of real-world usage
- N installs
- More rigorous forms of testing—e.g., downgrade tests and scalability tests
- Allowing time for feedback
**Note:** Generally we also wait at least two releases between beta and
GA/stable, because there's no opportunity for user feedback, or even bug reports,
in back-to-back releases.
**For non-optional features moving to GA, the graduation criteria must include
[conformance tests].**
- At least one of Container Runtimes which is not based on the classic container, gVisor for example, supports the updated CRI and released
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fvoznika FYI on this development

- Assuming no negative user feedback based on production experience, promote after 2 releases in beta.
- [conformance tests] are added for `SupplementalGroupsPolicy` and `ContainerStatus.User` APIs

[conformance tests]: https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md

#### Deprecation
- Announce deprecation and support policy of the existing flag
- Two versions passed since introducing the functionality that deprecates the flag (to address version skew)
- Address feedback on usage/changed behavior, provided on GitHub issues
- Deprecate the flag
-->

### Upgrade / Downgrade Strategy

<!--
Expand Down Expand Up @@ -626,7 +727,7 @@ enhancement:
CRI or CNI may require updating that component before the kubelet.
-->

- CRI must support this feature, especially when using `SupplementalGroupsPolicy=IgnoreGroupsInImage`.
- CRI must support this feature, especially when using `SupplementalGroupsPolicy=Strict`.
- kubelet must be at least the version of control-plane components.

## Production Readiness Review Questionnaire
Expand Down Expand Up @@ -687,6 +788,7 @@ well as the [existing list] of feature gates.
Any change of default behavior may be surprising to users or break existing
automations, so be extremely careful here.
-->

No. Just introducing new API fields in Pod spec and CRI which does NOT change the default behavior.

###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
Expand All @@ -702,11 +804,11 @@ feature.
NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`.
-->

Yes. It can be disabled after enabled. However, users should pay attention that gids of container processes in pods with `IgnoreGroupsInImage` policy would change. It means the action might break the application in permission. We plan to provide a way for users to detect which pods are affected.
Yes. It can be disabled after enabled. However, users should pay attention that gids of container processes in pods with `Strict` policy would change. It means the action might break the application in permission. We plan to provide a way for users to detect which pods are affected.

###### What happens if we reenable the feature if it was previously rolled back?

Just the policy `IgnoreGroupsInImage` is reenabled. Users should pay attention that gids of containers in pods with `IgnoreGroupsInImage` policy would change. It means that the action might break the application in permission. We plan to provide a way for users to detect which pods are affected.
Just the policy `Stcict` is reenabled. Users should pay attention that gids of containers in pods with `Stcict` policy would change. It means that the action might break the application in permission. We plan to provide a way for users to detect which pods are affected.

###### Are there any tests for feature enablement/disablement?

Expand Down Expand Up @@ -919,7 +1021,7 @@ Describe them, providing:
- Estimated amount of new objects: (e.g., new Object X for every existing Pod)
-->

No.
Precisely, yes because the kep introduces new API fields in Pods. But the increasing size can be negligible.

###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

Expand Down Expand Up @@ -948,6 +1050,18 @@ This through this both in small and large cases, again with respect to the

No.

###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

<!--
Focus not just on happy cases, but primarily on more pathological cases
(e.g. probes taking a minute instead of milliseconds, failed pods consuming resources, etc.).
If any of the resources can be exhausted, how this is mitigated with the existing limits
(e.g. pods per node) or new limits added by this KEP?
Are there any tests that were run/should be run to understand performance characteristics better
and validate the declared limits?
-->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answer here should be "No" I think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

should be No

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
-->
-->
No

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's OK to merge without this answer to hit the KEP deadline


### Troubleshooting

<!--
Expand Down Expand Up @@ -999,6 +1113,8 @@ Major milestones might include:
Why should this KEP _not_ be implemented?
-->

N/A

## Alternatives

<!--
Expand Down Expand Up @@ -1027,4 +1143,4 @@ new subproject, repos requested, or GitHub details. Listing these here allows a
SIG to get the process for these resources started right away.
-->

N/A
N/A