koordlet: add psi qos reconciler #2463
base: main
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2463 +/- ##
==========================================
- Coverage 65.93% 65.51% -0.42%
==========================================
Files 477 483 +6
Lines 56194 56655 +461
==========================================
+ Hits 37049 37118 +69
- Misses 16461 16847 +386
- Partials 2684 2690 +6
6910235 to 006665f
1. PSIExport: Collects PSI metrics for Pods and reports them via Pod Conditions.
2. MemorySuppress: Applies pressure to Pod memory allocation, increasing with the growth of allocated memory.
3. GroupShare: Groups Pods and allows CPU weight sharing within a group.
4. BudgetBalance: Balances CPU usage among Pods over time, beneficial for burstable Pods.

Signed-off-by: wheat2018 <[email protected]>
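As background for the PSIExport operator: the Linux kernel reports PSI in pressure files (for example a cgroup v2 `cpu.pressure` file), one line per kind, in the form `some avg10=... avg60=... avg300=... total=...`, where the `avg*` values are percentages. The sketch below parses `avg10` from such a line and scales it to the [0,10000] range that the threshold type in this PR uses. The helper names `parseAvg10` and `toBasisPoints` are hypothetical and do not come from the PR; how the PR actually collects and converts the values is not shown in this conversation.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// parseAvg10 extracts the avg10 value (a percentage) from one line of a
// PSI pressure file. Hypothetical helper, not taken from the PR.
func parseAvg10(line string) (float64, error) {
	for _, field := range strings.Fields(line) {
		if strings.HasPrefix(field, "avg10=") {
			return strconv.ParseFloat(strings.TrimPrefix(field, "avg10="), 64)
		}
	}
	return 0, fmt.Errorf("avg10 not found in %q", line)
}

// toBasisPoints maps a percentage in [0,100] onto the [0,10000] scale,
// rounding to the nearest integer to avoid float truncation surprises.
func toBasisPoints(percent float64) int64 {
	return int64(math.Round(percent * 100))
}

func main() {
	avg10, err := parseAvg10("some avg10=12.34 avg60=5.00 avg300=1.00 total=123456")
	if err != nil {
		panic(err)
	}
	fmt.Println(toBasisPoints(avg10))
}
```

Rounding rather than truncating matters here because a product such as `0.29 * 100` is not exactly representable as a float and would otherwise drop to 28.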
++ /cc @songtao98 for PSI collector
It's a delicate job, but it's also complex :) So please add a document describing its design and how to use it when you have time.

type PSIThreshold struct {
	// Avg10 indicates the average 10-second PSI threshold, range [0,10000] indicating [0%,100%].
	Avg10 int64 `json:"avg10,omitempty" validate:"min=0,max=10000"`
To enable the CRD validation, it needs some kubebuilder tags like:
// +kubebuilder:validation:Minimum=0
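A minimal sketch of what the suggestion could look like on the field under review. The `Minimum` marker is the one quoted in the comment; the `Maximum` marker is an assumption that mirrors the [0,10000] range documented on the field. controller-gen reads these markers from the comment block directly above the field when it generates the CRD schema; they have no effect at run time. The `encode` helper exists only to make the example runnable and is not part of the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PSIThreshold mirrors the struct under review; only Avg10 is shown.
type PSIThreshold struct {
	// Avg10 indicates the average 10-second PSI threshold, range [0,10000] indicating [0%,100%].
	// +kubebuilder:validation:Minimum=0
	// +kubebuilder:validation:Maximum=10000
	// +optional
	Avg10 int64 `json:"avg10,omitempty" validate:"min=0,max=10000"`
}

// encode is a hypothetical helper showing that the markers do not change
// how the struct serializes.
func encode(t PSIThreshold) (string, error) {
	out, err := json.Marshal(t)
	return string(out), err
}

func main() {
	s, err := encode(PSIThreshold{Avg10: 500})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```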
	return int64(float64(new-old) / interval.Seconds())
}

func max(a, b int64) int64 {
math.MaxInt64?
func DefaultPSIStrategy() *slov1alpha1.PSIStrategy {
	return &slov1alpha1.PSIStrategy{
		PSIExport: &slov1alpha1.PSIExportConfig{
			Enable: pointer.Bool(true),
For an alpha feature, please disable it by default.
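A minimal sketch of the requested default, using local stand-in types so the example compiles on its own. `PSIStrategy`, `PSIExportConfig`, and `boolPtr` here are simplified placeholders for `slov1alpha1.PSIStrategy`, `slov1alpha1.PSIExportConfig`, and `pointer.Bool`; the real types may carry more fields than shown.

```go
package main

import "fmt"

// PSIExportConfig is a stand-in for slov1alpha1.PSIExportConfig.
type PSIExportConfig struct {
	Enable *bool
}

// PSIStrategy is a stand-in for slov1alpha1.PSIStrategy.
type PSIStrategy struct {
	PSIExport *PSIExportConfig
}

// boolPtr plays the role of pointer.Bool in this sketch.
func boolPtr(b bool) *bool { return &b }

// DefaultPSIStrategy returns defaults with the alpha feature switched off,
// so that users have to opt in explicitly.
func DefaultPSIStrategy() *PSIStrategy {
	return &PSIStrategy{
		PSIExport: &PSIExportConfig{
			Enable: boolPtr(false),
		},
	}
}

func main() {
	fmt.Println(*DefaultPSIStrategy().PSIExport.Enable)
}
```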
}

func (p *psiReconcile) Enabled() bool {
	return features.DefaultKoordletFeatureGate.Enabled(features.BlkIOReconcile) && p.reconcileInterval > 0
features.DefaultKoordletFeatureGate.Enabled(features.BlkIOReconcile)
Please add a new feature gate.
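A minimal sketch of the shape of that change, assuming a dedicated gate named `PSIReconcile` (a hypothetical name; the PR conversation does not say what the new gate is called). The `featureGate` map is a simplified stand-in for `features.DefaultKoordletFeatureGate` so the example is self-contained.

```go
package main

import (
	"fmt"
	"time"
)

// Feature names a feature gate.
type Feature string

// PSIReconcile is a hypothetical gate dedicated to the PSI reconciler,
// replacing the borrowed BlkIOReconcile gate.
const PSIReconcile Feature = "PSIReconcile"

// featureGate is a simplified stand-in for the koordlet feature gate.
type featureGate map[Feature]bool

// Enabled reports whether the feature is switched on; unknown features are off.
func (g featureGate) Enabled(f Feature) bool { return g[f] }

type psiReconcile struct {
	gate              featureGate
	reconcileInterval time.Duration
}

// Enabled requires both the dedicated gate and a positive interval.
func (p *psiReconcile) Enabled() bool {
	return p.gate.Enabled(PSIReconcile) && p.reconcileInterval > 0
}

func main() {
	// Alpha gates are off unless explicitly enabled.
	p := &psiReconcile{gate: featureGate{PSIReconcile: false}, reconcileInterval: time.Second}
	fmt.Println(p.Enabled())
}
```

With its own gate, the PSI reconciler can be toggled without affecting block I/O reconciliation.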
	return &CpuQuota{Quota: quota, Period: period}, nil
}

func WriteCpuMax(cgroupPath string, max *CpuQuota) error {
please reuse the resource executor and the cgroup resource:
- https://github.com/koordinator-sh/koordinator/blob/main/pkg/koordlet/resourceexecutor/reader.go#L29
- https://github.com/koordinator-sh/koordinator/blob/main/pkg/koordlet/resourceexecutor/updater.go
- https://github.com/koordinator-sh/koordinator/blob/main/pkg/koordlet/util/system/cgroup_resource.go#L207
Ⅰ. Describe what this PR does
Add a PSI QoS manager, which currently supports four operators: PSIExport, MemorySuppress, GroupShare, and BudgetBalance.
Ⅱ. Does this pull request fix one issue?
Ⅲ. Describe how to verify it
Ⅳ. Special notes for reviews
Ⅴ. Checklist
make test