Skip to content

Commit 639147f

Browse files
Edwinhr716ahg-g
andauthored
Add KEP #257: Leader Only SubGroup (#402)
* adding KEP to support leaderOnly subgroup * changed implementation to leader not actually being on its own subgroup * fix toc * Update keps/257-Subgroup-leader-only/README.md Co-authored-by: Abdullah Gharaibeh <[email protected]> * fixed indentation, changed to be accelerator agnostic * switched to leaderExcluded * reworded to leader excluded --------- Co-authored-by: Abdullah Gharaibeh <[email protected]>
1 parent ab40b62 commit 639147f

File tree

2 files changed

+372
-0
lines changed

2 files changed

+372
-0
lines changed
+355
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,355 @@
1+
# KEP-257: SubGroup Leader Only
2+
3+
<!--
4+
This is the title of your KEP. Keep it short, simple, and descriptive. A good
5+
title can help communicate what the KEP is and should be considered as part of
6+
any review.
7+
-->
8+
9+
<!--
10+
A table of contents is helpful for quickly jumping to sections of a KEP and for
11+
highlighting any additional information provided beyond the standard KEP
12+
template.
13+
14+
Ensure the TOC is wrapped with
15+
<code>&lt;!-- toc --&rt;&lt;!-- /toc --&rt;</code>
16+
tags, and then generate with `hack/update-toc.sh`.
17+
-->
18+
19+
<!-- toc -->
20+
- [Summary](#summary)
21+
- [Motivation](#motivation)
22+
- [Goals](#goals)
23+
- [Non-Goals](#non-goals)
24+
- [Proposal](#proposal)
25+
- [User Stories (Optional)](#user-stories-optional)
26+
- [Story 1](#story-1)
27+
- [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
28+
- [Risks and Mitigations](#risks-and-mitigations)
29+
- [Design Details](#design-details)
30+
- [Subgroup Creation](#subgroup-creation)
31+
- [TPU Environment Injection](#tpu-environment-injection)
32+
- [Test Plan](#test-plan)
33+
- [Prerequisite testing updates](#prerequisite-testing-updates)
34+
- [Unit tests](#unit-tests)
35+
- [Integration tests](#integration-tests)
36+
- [e2e tests](#e2e-tests)
37+
- [Graduation Criteria](#graduation-criteria)
38+
- [Implementation History](#implementation-history)
39+
- [Drawbacks](#drawbacks)
40+
- [Alternatives](#alternatives)
41+
<!-- /toc -->
42+
43+
## Summary
44+
45+
<!--
46+
This section is incredibly important for producing high-quality, user-focused
47+
documentation such as release notes or a development roadmap. It should be
48+
possible to collect this information before implementation begins, in order to
49+
avoid requiring implementors to split their attention between writing release
50+
notes and implementing the feature itself. KEP editors and SIG Docs
51+
should help to ensure that the tone and content of the `Summary` section is
52+
useful for a wide audience.
53+
54+
A good summary is probably at least a paragraph in length.
55+
56+
Both in this section and below, follow the guidelines of the [documentation
57+
style guide]. In particular, wrap lines to a reasonable length, to make it
58+
easier for reviewers to cite specific portions, and to minimize diff churn on
59+
updates.
60+
61+
[documentation style guide]: https://github.com/kubernetes/community/blob/master/contributors/guide/style-guide.md
62+
-->
63+
This KEP aims to extend the SubGroup feature by adding an API field that defines the type of Subgroup that will be created. The existing version
64+
will become "LeaderWorker", and a new type which will exclude the leader from any subgroup.
65+
66+
67+
## Motivation
68+
69+
<!--
70+
This section is for explicitly listing the motivation, goals, and non-goals of
71+
this KEP. Describe why the change is important and the benefits to users. The
72+
motivation section can optionally provide links to [experience reports] to
73+
demonstrate the interest in a KEP within the wider Kubernetes community.
74+
75+
[experience reports]: https://github.com/golang/go/wiki/ExperienceReports
76+
-->
77+
78+
There are cases where the resource requirements for the leader are different from the ones by the worker, while still being able to guarantee exclusive placement
79+
on the workers. Having a `LeaderExcluded` subgroup means that the placement of the leader will be independent of the workers, and by adding the `subgroup-exclusive-topology`, the
80+
workers can be guaranteed to be scheduled on the same topology.
81+
82+
### Goals
83+
84+
<!--
85+
List the specific goals of the KEP. What is it trying to achieve? How will we
86+
know that this has succeeded?
87+
-->
88+
89+
Add an option exclude the leader from any subgroup, while still being able to create subgroups on the other workers. Therefore, it should be possible to do the following split:
90+
leader, (worker-1, worker-2), (worker-3, worker-4). If the desired effect is to have just one subgroup for the workers, subgroupSize should be set
91+
to the number of workers (so size - 1).
92+
93+
### Non-Goals
94+
This KEP assumes that the leader will not request accelerator resources, and thus, subGroupSize will always be assumed to be an odd number.
95+
96+
<!--
97+
What is out of scope for this KEP? Listing non-goals helps to focus discussion
98+
and make progress.
99+
-->
100+
101+
## Proposal
102+
103+
<!--
104+
This is where we get down to the specifics of what the proposal actually is.
105+
This should have enough detail that reviewers can understand exactly what
106+
you're proposing, but should not include things like API designs or
107+
implementation. What is the desired outcome and how do we measure success?.
108+
The "Design Details" section below is for the real
109+
nitty-gritty.
110+
-->
111+
112+
### User Stories (Optional)
113+
114+
<!--
115+
Detail the things that people will be able to do if this KEP is implemented.
116+
Include as much detail as possible so that people can understand the "how" of
117+
the system. The goal here is to make this feel real for users without getting
118+
bogged down.
119+
-->
120+
121+
#### Story 1
122+
As a user, I should be able to create a subGroup that only includes the leader in order to be able to schedule the leader with different resource requirements
123+
still keeping exclusive topology on the workers.
124+
125+
### Notes/Constraints/Caveats (Optional)
126+
As mentioned in Non-Goals, this KEP assumes that the leader does not request accelerator resources.
127+
128+
<!--
129+
What are the caveats to the proposal?
130+
What are some important details that didn't come across above?
131+
Go in to as much detail as necessary here.
132+
This might be a good place to talk about core concepts and how they relate.
133+
-->
134+
135+
### Risks and Mitigations
136+
137+
<!--
138+
What are the risks of this proposal, and how do we mitigate? Think broadly.
139+
For example, consider both security and how this will impact the larger
140+
Kubernetes ecosystem.
141+
142+
How will security be reviewed, and by whom?
143+
144+
How will UX be reviewed, and by whom?
145+
146+
Consider including folks who also work outside the SIG or subproject.
147+
-->
148+
149+
## Design Details
150+
151+
<!--
152+
This section should contain enough information that the specifics of your
153+
change are understandable. This may include API specs (though not always
154+
required) or even code snippets. If there's any ambiguity about HOW your
155+
proposal will be implemented, this is the place to discuss them.
156+
-->
157+
158+
A new API field will be added under the SubGroupPolicy type. This API field will determine what type
159+
of subgroup is desired. The default is LeaderWorker, which is the existing implementation of subgroup. `SubGroupSize`
160+
will continue to be a required field, while `Type` will be optional.
161+
162+
```golang
163+
type SubGroupPolicy struct {
164+
165+
// +kubebuilder:validation:Enum={LeaderWorker,LeaderExcluded}
166+
// +kubebuilder:default=LeaderWorker
167+
Type SubGroupPolicyType `json:"subGroupPolicyType,omitempty"`
168+
169+
SubGroupSize *int32 `json:"subGroupSize,omitempty"`
170+
}
171+
172+
type SubGroupPolicyType string
173+
174+
const (
175+
SubGroupPolicyLeaderWorker SubGroupPolicyType = "LeaderWorker"
176+
177+
SubGroupPolicyLeaderExcluded SubGroupPolicyType = "LeaderExcluded"
178+
)
179+
```
180+
181+
A new annotation will be created to determine whether or not the subgroup type is `LeaderExcluded`
182+
183+
* `leaderworkerset.sigs.k8s.io/subgroup-policy-type`
184+
185+
This annotation will only be injected in the leader pod.
186+
187+
In order to keep backwards compatability, it will only be added if the type is `LeaderExcluded`.
188+
189+
### Subgroup Creation
190+
Implementation wise, the only change needed is to not add the SubGroup labels on the leader if the SubGroupType is `LeaderExcluded`. Effectively, this means
191+
that the leader is not part of a subgroup at all. The only point of a pod being in a subgroup is to guarantee exclusive placement with the other pods
192+
in the subgroup. However, since this is a one pod subgroup, there is no use case for injecting the subgroup labels on the leader.
193+
194+
195+
```golang
196+
func (p *PodWebhook) Default() {
197+
if foundSubGroupSize && pod.Labels[leaderworkerset.SubGroupIndexLabelKey] == "" && !leaderOnlySubGroup {
198+
pod.Labels[leaderworkerset.SubGroupIndexLabelKey] = "0"
199+
subGroupUniqueKey := genGroupUniqueKey(pod.Name, "0")
200+
pod.Labels[leaderworkerset.SubGroupUniqueHashLabelKey] = subGroupUniqueKey
201+
if subEpKey, foundSubEpKey := pod.Annotations[leaderworkerset.SubGroupExclusiveKeyAnnotationKey]; foundSubEpKey {
202+
SetExclusiveAffinities(pod, subGroupUniqueKey, subEpKey, leaderworkerset.SubGroupUniqueHashLabelKey)
203+
}
204+
}
205+
}
206+
```
207+
208+
### TPU Environment Injection
209+
When injecting TPU environment variables, we will keep the same logic that exists for when the leader doesn't request TPU resources.
210+
211+
### Test Plan
212+
213+
<!--
214+
**Note:** *Not required until targeted at a release.*
215+
The goal is to ensure that we don't accept enhancements with inadequate testing.
216+
217+
All code is expected to have adequate tests (eventually with coverage
218+
expectations). Please adhere to the [Kubernetes testing guidelines][testing-guidelines]
219+
when drafting this test plan.
220+
221+
[testing-guidelines]: https://git.k8s.io/community/contributors/devel/sig-testing/testing.md
222+
-->
223+
224+
[X] I/we understand the owners of the involved components may require updates to
225+
existing tests to make this code solid enough prior to committing the changes necessary
226+
to implement this enhancement.
227+
228+
##### Prerequisite testing updates
229+
230+
<!--
231+
Based on reviewers feedback describe what additional tests need to be added prior
232+
implementing this enhancement to ensure the enhancements have also solid foundations.
233+
-->
234+
235+
##### Unit tests
236+
237+
238+
<!--
239+
In principle every added code should have complete unit test coverage, so providing
240+
the exact set of tests will not bring additional value.
241+
However, if complete unit test coverage is not possible, explain the reason of it
242+
together with explanation why this is acceptable.
243+
-->
244+
245+
<!--
246+
Additionally, for Alpha try to enumerate the core package you will be touching
247+
to implement this enhancement and provide the current unit coverage for those
248+
in the form of:
249+
- <package>: <date> - <current test coverage>
250+
The data can be easily read from:
251+
https://testgrid.k8s.io/sig-testing-canaries#ci-kubernetes-coverage-unit
252+
253+
This can inform certain test coverage improvements that we want to do before
254+
extending the production code to implement this enhancement.
255+
-->
256+
257+
##### Integration tests
258+
259+
<!--
260+
Integration tests are contained in k8s.io/kubernetes/test/integration.
261+
Integration tests allow control of the configuration parameters used to start the binaries under test.
262+
This is different from e2e tests which do not allow configuration of parameters.
263+
Doing this allows testing non-default options and multiple different and potentially conflicting command line options.
264+
-->
265+
266+
<!--
267+
This question should be filled when targeting a release.
268+
For Alpha, describe what tests will be added to ensure proper quality of the enhancement.
269+
270+
For Beta and GA, add links to added tests together with links to k8s-triage for those tests:
271+
https://storage.googleapis.com/k8s-triage/index.html
272+
-->
273+
274+
- <test>: <link to test coverage>
275+
276+
##### e2e tests
277+
278+
<!--
279+
This question should be filled when targeting a release.
280+
For Alpha, describe what tests will be added to ensure proper quality of the enhancement.
281+
282+
For Beta and GA, add links to added tests together with links to k8s-triage for those tests:
283+
https://storage.googleapis.com/k8s-triage/index.html
284+
285+
We expect no non-infra related flakes in the last month as a GA graduation criteria.
286+
-->
287+
288+
- <test>: <link to test coverage>
289+
290+
### Graduation Criteria
291+
292+
<!--
293+
294+
Clearly define what it means for the feature to be implemented and
295+
considered stable.
296+
297+
If the feature you are introducing has high complexity, consider adding graduation
298+
milestones with these graduation criteria:
299+
- [Maturity levels (`alpha`, `beta`, `stable`)][maturity-levels]
300+
- [Feature gate][feature gate] lifecycle
301+
- [Deprecation policy][deprecation-policy]
302+
303+
[feature gate]: https://git.k8s.io/community/contributors/devel/sig-architecture/feature-gates.md
304+
[maturity-levels]: https://git.k8s.io/community/contributors/devel/sig-architecture/api_changes.md#alpha-beta-and-stable-versions
305+
[deprecation-policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/
306+
-->
307+
308+
## Implementation History
309+
310+
<!--
311+
Major milestones in the lifecycle of a KEP should be tracked in this section.
312+
Major milestones might include:
313+
- the `Summary` and `Motivation` sections being merged, signaling SIG acceptance
314+
- the `Proposal` section being merged, signaling agreement on a proposed design
315+
- the date implementation started
316+
- the first Kubernetes release where an initial version of the KEP was available
317+
- the version of Kubernetes where the KEP graduated to general availability
318+
- when the KEP was retired or superseded
319+
-->
320+
321+
## Drawbacks
322+
323+
<!--
324+
Why should this KEP _not_ be implemented?
325+
-->
326+
327+
## Alternatives
328+
Original implementation was to have the leader have the subGroupIndex be 0, and shift all other subGroup indices are moved one to the right.
329+
330+
```golang
331+
if leaderOnly {
332+
return fmt.Sprint(((workerIndex - 1) / subGroupSize) + 1)
333+
}
334+
```
335+
336+
This implementation also requires to shift the subGroupIndex back to the left when injecting TPU environment variables
337+
338+
```golang
339+
// Take worker 1 as an example with subGroupSize set to 2. It will now have subGroupIndex set to 1, which means that the start will be 3, and end will be 4.
340+
// Need to shift it down by 1 for the calculation to still work out.
341+
if leaderOnlyType && subGroupIndex != 0 {
342+
subGroupIndex -= 1
343+
}
344+
start := subGroupSize*subGroupIndex + 1
345+
end := subGroupSize * (subGroupIndex + 1)
346+
```
347+
348+
Since there is no use case for having the leaderPod on its own subgroup, this implementation was scratched as it adds unnecessary complexity
349+
350+
351+
<!--
352+
What other approaches did you consider, and why did you rule them out? These do
353+
not need to be as detailed as the proposal, but should include enough
354+
information to express the idea and why it was not acceptable.
355+
-->
+17
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,17 @@
1+
title: Controller Revision
2+
kep-number: 238
3+
authors:
4+
- edwinhr716
5+
status: implementable
6+
creation-date: 2025-02-27
7+
reviewers:
8+
- kerthcet
9+
- ahg-g
10+
approvers:
11+
- kerthcet
12+
- ahg-g
13+
14+
# The most recent milestone for which work toward delivery of this KEP has been
15+
# done. This can be the current (upcoming) milestone, if it is being actively
16+
# worked on.
17+
latest-milestone: "v0.6.0"

0 commit comments

Comments
 (0)