Add KEP #257: Leader Only SubGroup #402
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: Edwinhr716. The full list of commands accepted by this bot can be found here; the pull request process is described here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
I may review this tomorrow because of limited bandwidth, sorry for that.

reviewing now.
several comments.
```go
const (
	SubGroupPolicyLeaderWorker SubGroupPolicyType = "LeaderWorker"

	SubGroupPolicyLeaderOnly SubGroupPolicyType = "LeaderOnly"
)
```
I think `LeaderOnly` is a bit hard to understand, what about `LeaderAlone` vs `All`?
I'm fine with `LeaderAlone`, but `All` sounds a bit ambiguous to me. `LeaderAlone` vs `LeaderWorker` maybe? Or something that specifies that the leader and a worker pod will be part of the same subgroup.
> …to the number of workers (so size - 1).
>
> ### Non-Goals
> This KEP assumes that the leader will not request TPU resources, and thus, subGroupSize will always be assumed to be an odd number. Moreover, no TPU environment variables will be…
Sorry, does this only work for TPU? I mean the exclusive annotation, I forgot.
Both TPU and GPU are supported for exclusive placement. I'll rephrase to be accelerator agnostic
> * `leaderworkerset.sigs.k8s.io/subgroup-policy-type`
>
> In order to keep backwards compatibility, it will only be added if the type is `LeaderOnly`.
I think even if we add the annotation for both types, it is still backward compatible. No behavior change.
The scenario that I see is that there is an existing deployment using `SubGroupPolicy` which then upgrades the LWS controller. After the upgrade, the controller will inject the annotation into the PodSpec, which will then trigger an update at the leader StatefulSet level since the Pod template has changed.
Ah, yes, this is disgusting, usually we only reconcile when spec changes.
> Because the leader now occupies a subgroup, and `SubGroupSize` will not always be one, the way we calculate the SubGroupIndex needs to be modified. When generating…
One argument here is: should we mark the leader as a subgroup? What are the cons if we don't do this?
That's a good point, I hadn't considered not making the leader part of a subgroup at all. Doing it this way simplifies the rest of the changes (though the TPU environment injection will still need some adjustment). Will test it out and add whichever version we don't implement to the Alternatives.

Changed the implementation so the leader is not actually part of any subgroup, since it is simpler, and added the original implementation to Alternatives. Also changed LeaderOnly to LeaderAlone. Please take another look @kerthcet
> In order to keep backwards compatibility, it will only be added if the type is `LeaderAlone`.
>
> ### Subgroup Creation
> Implementation-wise, the only change needed is to not add the SubGroup labels on the leader if the SubGroupType is LeaderAlone. Effectively, this means…
The indexing within the group will be different now. Currently we don't define an index within the group, we do it only for tpuWorkerId, and this should be calculated differently now to exclude the leader: https://github.com/kubernetes-sigs/lws/blob/main/pkg/utils/accelerators/tpu.go#L113, right?
We already do this when the leader doesn't request TPU resources https://github.com/kubernetes-sigs/lws/blob/main/pkg/utils/accelerators/tpu.go#L115-L117
I think we can leave more implementation details to the PR.
Co-authored-by: Abdullah Gharaibeh <[email protected]>
I think the calculation of the worker subGroupIndex still needs to change. For example, if we set subGroupSize=2:
- before, with group size = 6, it looks like (0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2)
- after, with group size = 7, it looks like (0, x), (1, 0), (2, 0), (3, 1), (4, 1), (5, 2), (6, 2)

(x, y) represents (workerIndex, subGroupIndex); for workerIndex=2, before the subGroupIndex is 1, after it is 0. Not familiar with the TPU implementation, maybe it's the same? But this is just code implementation, we can leave the detailed review to the PR, just mention this fact briefly in the proposal.
Is this not what we want? With this calculation, we ensure that there are two pods per subgroup, which should be the case if…
Yes, the result is as we expected, I just mean the method may need to change: `lws/pkg/webhooks/pod_webhook.go` lines 245 to 251 in ab40b62.
For workerIndex=2, before it's (2, 1), now it's (2, 0); the subGroupIndex changes. Am I missing anything? I think we'll call that method, right?
Yeah, we call that method. I guess I'm a little confused about what scenario you are referring to. The current behavior when using this feature has workerIndex=2 at subgroup 0. If using the default configuration, but still having an odd number for size, it will still be workerIndex=2 at subgroup 0. The only case where workerIndex=2 gives subgroup 1 is when using the default behavior and size is even.
Shouldn't this be workerIndex/subGroupSize = 2/2 = 1, rather than 0? Let's move the discussion to the PR then.
What type of PR is this?
/kind documentation
What this PR does / why we need it
Which issue(s) this PR fixes
Part of #257
Special notes for your reviewer
Does this PR introduce a user-facing change?