Avoid auto cleanup node labeller labels on vgpu only node #187

Merged
merged 1 commit into from
May 9, 2025

yansun1996
Contributor

@yansun1996 yansun1996 commented May 8, 2025

Issue from public repo: #183

When users set spec.driver.enable=false to use the inbox driver, the controller still looks for nodes without KMM ready labels when selecting nodes whose node labeller labels should be cleaned up.

However, in the inbox driver use case KMM is never triggered, so the controller SHOULD NOT look for KMM ready labels.

Modifying

		sel := []string{
			"! " + utils.NodeFeatureLabelAmdGpu,
			"! " + labels.GetKernelModuleReadyNodeLabel(devConfig.Namespace, devConfig.Name),
			"! " + labels.GetDevicePluginNodeLabel(devConfig.Namespace, devConfig.Name),
			"! " + utils.NodeFeatureLabelAmdVGpu,
		}

to be

		sel := []string{
			"! " + utils.NodeFeatureLabelAmdGpu,
			"! " + utils.NodeFeatureLabelAmdVGpu,
		}

		if devConfig.Spec.Driver.Enable != nil && *devConfig.Spec.Driver.Enable {
			sel = append(sel,
				"! "+labels.GetKernelModuleReadyNodeLabel(devConfig.Namespace, devConfig.Name),
				"! "+labels.GetDevicePluginNodeLabel(devConfig.Namespace, devConfig.Name),
			)
		}
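
To make the effect of the condition concrete, here is a minimal standalone sketch of the same pattern, not the operator's actual code. The label keys and the `buildCleanupSelector` helper are hypothetical placeholders (the real keys come from `utils` and `labels`), and joining the terms with commas is an assumption about how the slice is turned into a selector string.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical label keys for illustration only; the operator derives the
// real ones from its utils and labels packages.
const (
	labelAmdGpu       = "example.com/amd-gpu"
	labelAmdVGpu      = "example.com/amd-vgpu"
	labelKmmReady     = "example.com/kmm-ready"
	labelDevicePlugin = "example.com/device-plugin"
)

// buildCleanupSelector mirrors the shape of the change: the GPU and vGPU
// terms are always present, and the KMM ready and device plugin terms are
// appended only when the driver is explicitly enabled.
func buildCleanupSelector(driverEnable *bool) string {
	sel := []string{
		"! " + labelAmdGpu,
		"! " + labelAmdVGpu,
	}
	// A nil pointer is treated the same as false here, matching the nil
	// check in the snippet above.
	if driverEnable != nil && *driverEnable {
		sel = append(sel,
			"! "+labelKmmReady,
			"! "+labelDevicePlugin,
		)
	}
	// Assumption: terms are comma-joined into a single selector string.
	return strings.Join(sel, ",")
}

func main() {
	enabled, disabled := true, false
	fmt.Println("driver enabled: ", buildCleanupSelector(&enabled))
	fmt.Println("driver disabled:", buildCleanupSelector(&disabled))
	fmt.Println("driver unset:   ", buildCleanupSelector(nil))
}
```

With the driver disabled or unset, the sketch produces a selector that contains only the GPU and vGPU terms, so the KMM ready label no longer takes part in node selection.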

@yansun1996 yansun1996 changed the title Avoid auto cleanup node labeller labels on vgpu only node [WIP] Avoid auto cleanup node labeller labels on vgpu only node May 8, 2025
@yansun1996 yansun1996 force-pushed the fix_vgpu_node_label branch from ad6ba0e to 075feac Compare May 9, 2025 06:42
@yansun1996 yansun1996 changed the title [WIP] Avoid auto cleanup node labeller labels on vgpu only node Avoid auto cleanup node labeller labels on vgpu only node May 9, 2025
@sajmera-pensando sajmera-pensando merged commit 040a819 into ROCm:staging May 9, 2025
1 of 3 checks passed
@yansun1996 yansun1996 deleted the fix_vgpu_node_label branch May 9, 2025 17:05