Skip to content

gc: consider whether the sts pod is alive during lsp gc #5122

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 4, 2025

Conversation

zhangzujian
Copy link
Member

Pull Request

What type of this PR

Examples of user facing changes:

  • Features
  • Bug fixes
  • Docs
  • Tests

Which issue(s) this PR fixes

Fixes #(issue-number)

@zhangzujian zhangzujian added the bug Something isn't working label Apr 3, 2025
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @zhangzujian, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request addresses a bug in the garbage collection (GC) logic for logical switch ports (LSPs) associated with StatefulSet pods. The changes ensure that the GC process correctly identifies and handles LSPs of StatefulSet pods, especially when the StatefulSet is being deleted, recreated, or scaled down. The key change is the introduction of a new function isStatefulSetPodToGC to determine if a StatefulSet pod's LSP should be garbage collected, taking into account the pod's liveness and the StatefulSet's state.

Highlights

  • GC Logic Enhancement: The pull request introduces isStatefulSetPodToGC function to accurately determine if a StatefulSet pod's LSP should be garbage collected.
  • StatefulSet Handling: The changes consider scenarios where the StatefulSet is being deleted, recreated, or scaled down, ensuring proper LSP garbage collection.
  • Pod Liveness Check: The pull request incorporates a check to determine if a pod is alive before garbage collecting its LSP, preventing premature deletion in scaled-up scenarios.

Changelog

Click here to see the changelog
  • pkg/controller/gc.go
    • Modified the markAndCleanLSP function to use isStatefulSetPodToGC instead of isStatefulSetPodToDel to determine if a StatefulSet pod's LSP should be garbage collected.
    • Line 334: Changed the function call from isStatefulSetPodToDel to isStatefulSetPodToGC.
  • pkg/controller/pod.go
    • Modified handleDeletePod function to improve logging and ensure correct handling of StatefulSet pod deletions.
    • Line 928: Changed the condition to check if pod.DeletionTimestamp is non-zero using !pod.DeletionTimestamp.IsZero().
    • Line 929: Updated log message to use podKey instead of podName.
    • Line 952: Updated log message to use podKey instead of pod.Name.
    • Line 963: Updated log message to use podKey instead of podName.
    • Line 985: Updated log message to include podKey for better identification.
    • Line 990: Updated log message to use podKey instead of pod.Name.
    • Modified isStatefulSetPodToDel function to improve logging and handle cases where the StatefulSet is being deleted or recreated.
    • Line 1276: Updated log message to include namespace.
    • Line 1279: Updated log message to include namespace.
    • Line 1285: Updated log message to include namespace.
    • Line 1289: Updated log message to include namespace.
    • Modified isStatefulSetPodToDel function to return true when the statefulset is downscaled.
    • Line 1303: Updated log message to include namespace.
    • Added a new function isStatefulSetPodToGC to determine if a StatefulSet pod's LSP should be garbage collected based on the StatefulSet's state and the pod's liveness.
    • Lines 1313-1368: Added the isStatefulSetPodToGC function with detailed logic for garbage collection decisions.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


A StatefulSet's pod,
Lives, dies, scales, a complex node,
GC must be wise.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The pull request focuses on improving the garbage collection (GC) process for logical switch ports (LSPs) associated with StatefulSets in Kube-OVN. The changes aim to ensure that LSPs are only garbage collected when the corresponding StatefulSet pod is no longer considered 'alive' or required. The changes include renaming a function, updating log messages for clarity, and introducing a new function isStatefulSetPodToGC to determine if an LSP should be garbage collected. Overall, the changes seem reasonable and contribute to a more robust and accurate GC process.

Summary of Findings

  • Function Naming: The function isStatefulSetPodToDel is renamed to isStatefulSetPodToGC to better reflect its purpose. This change improves code clarity and maintainability.
  • Logging Improvements: Log messages are updated to include the namespace of the StatefulSet, providing more context for debugging and monitoring.
  • GC Logic Enhancement: The introduction of isStatefulSetPodToGC function enhances the GC logic by considering various scenarios such as StatefulSet deletion, recreation, and downscaling, ensuring that LSPs are only GCed when appropriate.

Merge Readiness

The pull request appears to be well-structured and addresses an important aspect of Kube-OVN's GC process. However, there are a few issues that should be addressed before merging. Specifically, the logging statements should be reviewed for consistency and clarity. Additionally, the logic within isStatefulSetPodToGC should be carefully examined to ensure it covers all relevant edge cases. I am unable to approve this pull request, and recommend that others review and approve this code before merging. At a minimum, the high severity issues should be addressed before merging.

Comment on lines +1355 to +1363
klog.Infof("statefulset %s/%s is down scaled", pod.Namespace, statefulSetName)
if !isPodAlive(pod) {
// we must check whether the pod is alive because we have to consider the following case:
// 1. the statefulset is down scaled to zero
// 2. the lsp gc is triggered
// 3. gc interval, e.g. 90s, is passed and the second gc is triggered
// 4. the sts is up scaled to the original replicas
// 5. the pod is still running and it will not be recreated
return true

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This section checks if the StatefulSet is downscaled and the pod is not alive. The comment explains a specific case where the pod might be running even after downscaling. Ensure that this logic is thoroughly tested and covers all possible scenarios. Consider adding a test case that specifically covers the scenario described in the comment.

		if index >= startOrdinal+int64(*sts.Spec.Replicas) {
			klog.Infof("statefulset %s/%s is down scaled", pod.Namespace, statefulSetName)
			if !isPodAlive(pod) {
				// we must check whether the pod is alive because we have to consider the following case:
				// 1. the statefulset is down scaled to zero
				// 2. the lsp gc is triggered
				// 3. gc interval, e.g. 90s, is passed and the second gc is triggered
				// 4. the sts is up scaled to the original replicas
				// 5. the pod is still running and it will not be recreated
				return true

Comment on lines 333 to +334
if isStsPod, stsName, stsUID := isStatefulSetPod(pod); isStsPod {
if isStatefulSetPodToDel(c.config.KubeClient, pod, stsName, stsUID) {
if isStatefulSetPodToGC(c.config.KubeClient, pod, stsName, stsUID) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider whether renaming the function here is the best approach. While the new name isStatefulSetPodToGC is more descriptive, ensure that all call sites are updated and that the change doesn't introduce any unintended side effects. It might be useful to add a comment explaining why the function was renamed.

Suggested change
if isStsPod, stsName, stsUID := isStatefulSetPod(pod); isStsPod {
if isStatefulSetPodToDel(c.config.KubeClient, pod, stsName, stsUID) {
if isStatefulSetPodToGC(c.config.KubeClient, pod, stsName, stsUID) {
if isStsPod, stsName, stsUID := isStatefulSetPod(pod); isStsPod {
// Renamed from isStatefulSetPodToDel to better reflect its purpose: determining if the LSP should be garbage collected.
if isStatefulSetPodToGC(c.config.KubeClient, pod, stsName, stsUID) {

@coveralls
Copy link

Pull Request Test Coverage Report for Build 14247883375

Details

  • 0 of 57 (0.0%) changed or added relevant lines in 2 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.02%) to 21.784%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/controller/gc.go 0 1 0.0%
pkg/controller/pod.go 0 56 0.0%
Files with Coverage Reduction New Missed Lines %
pkg/controller/pod.go 1 0.0%
Totals Coverage Status
Change from base Build 14234613651: -0.02%
Covered Lines: 10263
Relevant Lines: 47113

💛 - Coveralls

@zhangzujian zhangzujian marked this pull request as ready for review April 3, 2025 17:00
@zhangzujian zhangzujian requested a review from oilbeater April 3, 2025 17:00
@zhangzujian zhangzujian merged commit a38b004 into kubeovn:master Apr 4, 2025
68 of 69 checks passed
oilbeater pushed a commit that referenced this pull request Apr 4, 2025
@zhangzujian zhangzujian deleted the fix-lsp-gc branch April 8, 2025 05:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working need backport
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants