Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a single listener for all timeout fixture invocations #12426

Merged

Conversation

gerashegalov
Copy link
Collaborator

@gerashegalov gerashegalov commented Apr 2, 2025

Resolves #12413
Resolves #12408

  • Make most of the state and the thread pool static
  • Ignore any exceptions when trying to register the timeout
  • Log an error once

Verified that the number of threads grows linearly without this PR and stays flat with this PR using

watch 'pidof java | xargs -n 1 ps -eL --pid | grep -c spark-job'

Signed-off-by: Gera Shegalov <[email protected]>
Signed-off-by: Gera Shegalov <[email protected]>
@gerashegalov gerashegalov added the bug Something isn't working label Apr 2, 2025
@gerashegalov gerashegalov self-assigned this Apr 2, 2025
@gerashegalov
Copy link
Collaborator Author

build

@pxLi pxLi requested review from Copilot, razajafri and abellina April 2, 2025 01:24
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the timeout listener setup used in integration tests to employ a single static listener instance, improving resource utilization and consistency.

  • The Python test initialization now sets global defaults and initializes the singleton listener.
  • The Java listener has been refactored to use static fields and includes a static initializer and unregister logic.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
integration_tests/src/main/python/spark_init_internal.py Introduces global default timeout settings and adjusts the timeout fixture to use these defaults.
integration_tests/src/main/java/org/apache/spark/rapids/tests/TimeoutSparkListener.java Refactors TimeoutSparkListener into a singleton with static fields and methods to manage listener registration and timeout settings.

@pxLi pxLi requested review from revans2 and tgravescs April 2, 2025 01:25
@gerashegalov gerashegalov added the test Only impacts tests label Apr 2, 2025
@gerashegalov
Copy link
Collaborator Author

build

Signed-off-by: Gera Shegalov <[email protected]>
@gerashegalov
Copy link
Collaborator Author

build

@gerashegalov
Copy link
Collaborator Author

CI hitting SIGSEGV which affects other PRs is unrelated

@gerashegalov gerashegalov requested a review from razajafri April 3, 2025 20:47
Copy link
Collaborator

@abellina abellina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gerashegalov
Copy link
Collaborator Author

build

@gerashegalov gerashegalov merged commit 1e6f154 into NVIDIA:branch-25.04 Apr 4, 2025
54 checks passed
@gerashegalov gerashegalov deleted the singleton-jt-listener branch April 4, 2025 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working test Only impacts tests
Projects
None yet
3 participants