dcm taint toleration from GPU Operator to KMM Operator #209
Closed
* fix: Only remove node labeller managed labels
  The reconciler currently removes all labels with the amd.com and beta.amd.com prefix on nodes during cleanup. This is overly aggressive and can delete labels added by other users or systems. This commit corrects the behavior to only remove labels that are specifically managed by node labeller, ensuring that only relevant labels are automatically cleaned up.
* test: Add more unit test cases for label cleanup modification
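The label cleanup described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the operator's actual code: the helper names (`isManaged`, `removeManagedLabels`) and the list of managed label suffixes are hypothetical, chosen only to show the idea of matching an explicit allow-list instead of a whole prefix.

```go
package main

import (
	"fmt"
	"sort"
)

// labelPrefixes and managedSuffixes are HYPOTHETICAL values used for
// illustration; the real node labeller's label set may differ.
var labelPrefixes = []string{"amd.com/gpu.", "beta.amd.com/gpu."}
var managedSuffixes = []string{"family", "vram", "cu-count"}

// isManaged reports whether key exactly matches one of the label keys
// that the (assumed) node labeller owns.
func isManaged(key string) bool {
	for _, p := range labelPrefixes {
		for _, s := range managedSuffixes {
			if key == p+s {
				return true
			}
		}
	}
	return false
}

// removeManagedLabels returns a copy of labels without the managed keys.
// Labels that merely share the amd.com prefix are left untouched.
func removeManagedLabels(labels map[string]string) map[string]string {
	kept := make(map[string]string, len(labels))
	for k, v := range labels {
		if !isManaged(k) {
			kept[k] = v
		}
	}
	return kept
}

func main() {
	node := map[string]string{
		"amd.com/gpu.family":     "AI",
		"beta.amd.com/gpu.vram":  "64G",
		"amd.com/team":           "ml-infra", // added by a user, must survive
		"kubernetes.io/hostname": "node-1",
	}
	kept := removeManagedLabels(node)
	keys := make([]string, 0, len(kept))
	for k := range kept {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(keys)
}
```

Matching exact keys rather than a prefix is what keeps a user-added label such as `amd.com/team` in place during cleanup.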
Argo turns pre-upgrade hooks into pre-sync hooks, which means you cannot even install, because the hook relies on CRDs, service accounts, etc., which aren't installed until after the hook executes. Make the pre-upgrade hook more tolerant by not doing anything if the CRD isn't installed.
…cheduling Amend some details for nodeAffinity preferredDuringSchedulingIgnoredDuringExecution
…) (#410) (#411) (cherry picked from commit b2154e2a62687de7faea8f037c0896eb18cd5e7b) Co-authored-by: Titus Ou <[email protected]>
(cherry picked from commit aab02bfbd0bb585e5fdc770df6058467e859cd38) (cherry picked from commit 740d293eab7a307c88383c65e02358713e5de025) Co-authored-by: Titus Ou <[email protected]>
(cherry picked from commit 5f6edff5dd55afcf61e007473af6ec82b82ba741) Co-authored-by: Titus Ou <[email protected]>
(cherry picked from commit 8f67ad10b875f2d2b73880b77c0642383398f8aa) (cherry picked from commit a959f23a3cfa5319f07b34bb61e19ca934f9935f) Co-authored-by: Titus Ou <[email protected]>
* Prometheus Integration support in GPU Operator (#594)
* Prometheus Integration: CRD Additions and Vendoring
  - Adds CRD fields to support ServiceMonitorConfig. This change does not include TLS or Auth support in the CRD.
  - Vendor prometheus-operator monitoring APIs
* Add validation for new CRD fields in ServiceMonitorConfig
  - kubebuilder validation for interval
  - verify ServiceMonitor CRD in the cluster if enabled in DeviceConfig
* Deploy ServiceMonitor objects in Operator
* Bump controller-gen version to 0.17.0
* Add ServiceMonitor, APIExtension CRUD permissions to Operator SA
  - The ServiceAccount attached to the Operator needs elevated permissions to perform CRUDs on K8s APIExtension, CoreOS Monitoring groups to read installed CRDs and install/delete ServiceMonitor objects.
* Refactor code, address review comments
* Add TLS/Auth sections to Kube rbac proxy and ServiceMonitorConfig
---------
Co-authored-by: Nitish Bhat <[email protected]>
* Handle ServiceMonitor CRD not found error (#609)
  - When the ServiceMonitor CRD (monitoringv1) is not found, the error returned is a NoMatchError. There's nothing to delete when we see this error, so we have to handle it gracefully.
Co-authored-by: Nitish Bhat <[email protected]>
* Add monitoringv1.ServiceMonitor Patch RBAC Permission to GPU Operator
---------
Co-authored-by: Nitish Bhat <[email protected]>
…04) (#607) (cherry picked from commit db1ee89cc0c8911cff473536953c9615d42629f6) Co-authored-by: Nitish Bhat <[email protected]> Co-authored-by: Nitish Bhat <[email protected]>
Co-authored-by: Nitish Bhat <[email protected]>
…ict (#632) (#633) Co-authored-by: Nitish Bhat <[email protected]> (cherry picked from commit 5a4fe675365227a818b8e2deb54fb8db3f93407d) Co-authored-by: Nitish Bhat <[email protected]>
* Add default DeviceConfig CR for Helm Chart
* Fix helm chart pre-upgrade hook to support Argo deployment (#611)
* Optimize default CR's default values
* Add e2e test for helm chart
* Address comment
Not sure why, but this PR has a lot of other commits pulled in by someone. This happened once before as well. Closing it; will re-open.
This PR has the proto changes that will support the latest KMM image once it is released publicly in v1.3.1.
So this PR should only be merged once the v1.3.1 GPU Operator, along with our v1.3.1 KMM, is out.