
Custom Benchmark fixture for running benchmark tests when the pytest-benchmark plugin is disabled #302


Closed
wants to merge 7 commits

Conversation

Contributor

@aseembits93 aseembits93 commented Jun 7, 2025

PR Type

Enhancement, Tests


Description

  • Introduce custom pytest benchmarking plugin

  • Provide no-op benchmark fixture implementation (a minimal sketch follows this list)

  • Register @pytest.mark.benchmark pytest marker

  • Include plugin in test runners’ PYTEST_PLUGINS settings
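
As a rough illustration of the approach (a minimal sketch, not the PR's exact code; the real CustomBenchmark also exposes pedantic, dynamic, and several dummy properties), such a plugin can register the marker and hand tests a no-op stand-in fixture:

import pytest


class CustomBenchmark:
    """Minimal stand-in for pytest-benchmark's fixture: runs the target once."""

    def __call__(self, func, *args, **kwargs):
        # No timing or statistics; just execute the benchmarked callable once.
        return func(*args, **kwargs)


def pytest_configure(config):
    # Register the marker so @pytest.mark.benchmark does not trigger warnings.
    config.addinivalue_line("markers", "benchmark: mark test as a benchmark")


@pytest.fixture
def benchmark():
    # Always return the no-op implementation; assumes pytest-benchmark is disabled.
    return CustomBenchmark()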


Changes walkthrough 📝

Relevant files

Tests
test_benchmark_bubble_sort.py: Add benchmark-marked test case
code_to_optimize/tests/pytest/benchmarks/test_benchmark_bubble_sort.py
  • Added test_sort_with_marker using @pytest.mark.benchmark
  • Wrapped sorter call in an inner function decorated with @benchmark
  • Kept existing non-benchmark test unchanged
  • +7/-0

Enhancement
custom_plugin.py: Implement custom benchmark fixture plugin
codeflash/benchmarking/plugin/custom_plugin.py
  • Created custom_plugin.py defining a CustomBenchmark class
  • Registered pytest marker "benchmark" in pytest_configure
  • Implemented no-op __call__, pedantic, and dynamic methods
  • Provided dummy properties: group, name, fullname, params, extra_info
  • +77/-0

Configuration changes
test_runner.py: Include custom benchmark plugin in test runners
codeflash/verification/test_runner.py
  • Updated PYTEST_PLUGINS to include custom_plugin in all three runners (a subprocess sketch follows this walkthrough)
  • Adjusted subprocess env for behavioral, line profile, and benchmarking tests
  • +9/-3
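
The PYTEST_PLUGINS mechanism referenced above is pytest's standard way to load extra plugin modules in a subprocess: at startup, pytest imports every comma-separated dotted path listed in that environment variable. A hedged sketch of how a runner might wire this up (the subprocess command and test path here are illustrative, not the PR's exact code):

import os
import subprocess

env = os.environ.copy()
# pytest loads each comma-separated module listed in PYTEST_PLUGINS as a plugin.
env["PYTEST_PLUGINS"] = (
    "codeflash.verification.pytest_plugin,codeflash.benchmarking.plugin.custom_plugin"
)
# Illustrative invocation; the real runners build richer argument lists.
subprocess.run(
    ["python", "-m", "pytest", "code_to_optimize/tests/pytest/benchmarks"],
    env=env,
    check=False,
)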

  • @aseembits93 aseembits93 marked this pull request as draft June 7, 2025 01:17

    github-actions bot commented Jun 7, 2025

    PR Reviewer Guide 🔍

    (Review updated until commit 3c76a95)

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Weak Assertion

    The new test test_sort_with_marker uses assert 1==1, which doesn't validate the sorting functionality. Consider asserting the actual result of inner_fn to ensure correctness.

    @pytest.mark.benchmark
    def test_sort_with_marker(benchmark):
        @benchmark
        def inner_fn():
            return sorter(list(reversed(range(5))))
        assert 1==1
    Fixture Logic

    The benchmark fixture always returns the custom no-op implementation, making the marker check redundant. Verify that the real pytest-benchmark fixture is correctly returned when available.

    # Check if benchmark fixture is already available (pytest-benchmark is active)
    if "benchmark" in request.fixturenames and hasattr(request, "_fixturemanager"):
        try:
            # Try to get the real benchmark fixture
            return request.getfixturevalue("benchmark")
        except (pytest.FixtureLookupError, AttributeError):
            pass
    custom_benchmark = CustomBenchmark()
    if request.node.get_closest_marker("benchmark"):
        # Return our custom benchmark for tests marked with @pytest.mark.benchmark
        return custom_benchmark
    return custom_benchmark
    Configuration Duplication

    The PYTEST_PLUGINS assignment is duplicated in multiple test runner functions. Consider extracting it into a shared constant or helper to reduce repetition and simplify maintenance; one possible extraction is sketched below the snippet.

    pytest_test_env["PYTEST_PLUGINS"] = (
        "codeflash.verification.pytest_plugin,codeflash.benchmarking.plugin.custom_plugin"
    )
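
    One way to do this, as a sketch (the constant and helper names here are invented for illustration and are not part of the PR):

    CODEFLASH_PYTEST_PLUGINS = (
        "codeflash.verification.pytest_plugin,codeflash.benchmarking.plugin.custom_plugin"
    )


    def env_with_codeflash_plugins(test_env: dict) -> dict:
        # Copy the env and apply the shared PYTEST_PLUGINS value in one place.
        pytest_test_env = test_env.copy()
        pytest_test_env["PYTEST_PLUGINS"] = CODEFLASH_PYTEST_PLUGINS
        return pytest_test_env

    Each runner could then call env_with_codeflash_plugins(test_env) instead of repeating the assignment.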


    github-actions bot commented Jun 7, 2025

    PR Code Suggestions ✨

    Latest suggestions up to 3c76a95

    Explore these optional code suggestions:

    Category | Suggestion | Impact
    Possible issue
    Execute and validate benchmark function

    Ensure the inner benchmark function is actually executed and its output validated.
    Call inner_fn() and assert that it returns the expected sorted list instead of a
    placeholder assertion.

    code_to_optimize/tests/pytest/benchmarks/test_benchmark_bubble_sort.py [10-15]

     @pytest.mark.benchmark
     def test_sort_with_marker(benchmark):
         @benchmark
         def inner_fn():
             return sorter(list(reversed(range(5))))
    -    assert 1==1
    +    result = inner_fn()
    +    assert result == list(range(5))
    Suggestion importance[1-10]: 8


    Why: The test currently asserts a placeholder 1==1 without invoking inner_fn(), so calling it and checking the sorted result fixes test correctness.

    Medium
    Detect plugin with pluginmanager

    Use pytest's plugin manager to detect if the benchmark plugin is active, avoiding
    reliance on request.fixturenames. Replace the check with
    request.config.pluginmanager.hasplugin("pytest_benchmark") before retrieving the
    real fixture.

    codeflash/benchmarking/plugin/custom_plugin.py [67-72]

    -if "benchmark" in request.fixturenames and hasattr(request, "_fixturemanager"):
    +if request.config.pluginmanager.hasplugin("pytest_benchmark"):
         try:
    -        # Try to get the real benchmark fixture
             return request.getfixturevalue("benchmark")
         except (pytest.FixtureLookupError, AttributeError):
             pass
    Suggestion importance[1-10]: 6


    Why: Using pluginmanager.hasplugin("pytest_benchmark") is a more reliable way to detect the real benchmark fixture than inspecting request.fixturenames.

    Low
    General
    Add iteration loop in pedantic

    Implement iteration and round loops to more closely mimic pytest-benchmark’s
    pedantic. Loop over iterations (and optionally rounds) calling target, and return
    the last result.

    codeflash/benchmarking/plugin/custom_plugin.py [24-39]

     def pedantic(
         self,
         target: Callable,
         args: tuple = (),
         kwargs: Optional[dict] = None,
         iterations: int = 1,
         rounds: int = 1,
         warmup_rounds: int = 0,
         setup: Optional[Callable] = None,
     ) -> Any:
         """Mimics the pedantic method of pytest-benchmark."""
         if setup:
             setup()
         if kwargs is None:
             kwargs = {}
    -    return target(*args, **kwargs)
    +    result = None
    +    for _ in range(iterations):
    +        result = target(*args, **kwargs)
    +    return result
    Suggestion importance[1-10]: 6


    Why: Looping over iterations in pedantic better mimics pytest-benchmark behavior and corrects the current single-call implementation.

    Low
    Support chained no-op calls

    Allow chained calls on the custom benchmark by returning self instead of None from
    no-op methods. This supports patterns like benchmark.timer.start().stop().

    codeflash/benchmarking/plugin/custom_plugin.py [20-22]

     def __getattr__(self, name):
    -    # Return a no-op callable for any attribute
    -    return lambda *args, **kwargs: None
    +    # Return a no-op callable for any attribute that returns self
    +    return lambda *args, **kwargs: self
    Suggestion importance[1-10]: 4


    Why: Allowing __getattr__ to return self enables chained no-op calls but only marginally improves usability.

    Low

    Previous suggestions

    Suggestions up to commit aa33d77
    Category | Suggestion | Impact
    General
    Implement full pedantic loops

    To better emulate pytest-benchmark’s pedantic, run the warmup, iterations, and
    rounds loops instead of calling the target only once. Aggregate and return the last
    result.

    codeflash/benchmarking/plugin/custom_plugin.py [20-35]

     def pedantic(
         self,
         target: Callable,
         args: tuple = (),
    -    kwargs: Optional[dict] = None,  # noqa: FA100
    -    iterations: int = 1,  # noqa: ARG002
    -    rounds: int = 1,  # noqa: ARG002
    -    warmup_rounds: int = 0,  # noqa: ARG002
    -    setup: Optional[Callable] = None,  # noqa: FA100
    -) -> Any:  # noqa: ANN401
    +    kwargs: Optional[dict] = None,
    +    iterations: int = 1,
    +    rounds: int = 1,
    +    warmup_rounds: int = 0,
    +    setup: Optional[Callable] = None,
    +) -> Any:
         """Mimics the pedantic method of pytest-benchmark."""
    -    if setup:
    -        setup()
         if kwargs is None:
             kwargs = {}
    -    return target(*args, **kwargs)
    +    # warmup
    +    for _ in range(warmup_rounds):
    +        if setup:
    +            setup()
    +        target(*args, **kwargs)
    +    # timed loops
    +    result = None
    +    for _ in range(rounds):
    +        if setup:
    +            setup()
    +        for _ in range(iterations):
    +            result = target(*args, **kwargs)
    +    return result
    Suggestion importance[1-10]: 7


    Why: Adding the warmup, iterations, and rounds loops mirrors pytest-benchmark’s intended behavior and correctly applies to the existing pedantic definition without breaking other logic.

    Medium
    Use plugin manager to detect benchmark plugin

    Rely on pytest’s plugin manager to detect the real benchmark plugin instead of
    inspecting request.fixturenames, which can cause recursion. This ensures you only
    override when the pytest-benchmark plugin is not installed.

    codeflash/benchmarking/plugin/custom_plugin.py [63-68]

    -if "benchmark" in request.fixturenames and hasattr(request, "_fixturemanager"):
    -    try:
    -        # Try to get the real benchmark fixture
    -        return request.getfixturevalue("benchmark")
    -    except (pytest.FixtureLookupError, AttributeError):
    -        pass
    +if not request.config.pluginmanager.hasplugin("benchmark"):
    +    return CustomBenchmark()
    +return request.getfixturevalue("benchmark")
    Suggestion importance[1-10]: 6


    Why: Relying on pluginmanager.hasplugin("benchmark") is a cleaner way to detect the real plugin, but the proposed snippet drops the existing exception fallback which could lead to uncaught errors if the fixture isn’t registered.

    Low
    Suggestions up to commit 56108b5
    Category | Suggestion | Impact
    Possible issue
    Fix DummyBenchmark instantiation

    The DummyBenchmark constructor does not accept any arguments, so instantiate it
    without passing request to avoid a TypeError.

    codeflash/benchmarking/plugin/dummy_plugin.py [85]

    -return CodeFlashBenchmarkDummyPlugin.DummyBenchmark(request)
    +return CodeFlashBenchmarkDummyPlugin.DummyBenchmark()
    Suggestion importance[1-10]: 9


    Why: Removing the request argument fixes a real TypeError and restores proper instantiation of DummyBenchmark.

    High
    General
    Simulate iteration loops

    Simulate the configured iterations and rounds to better mimic pytest-benchmark's
    behavior by looping the target call accordingly and returning the last result.

    code_to_optimize/tests/pytest/benchmarks/test_benchmark_bubble_sort.py [17-25]

     def pedantic(self, target: Callable, args: tuple = (), kwargs: dict = None,
                  iterations: int = 1, rounds: int = 1, warmup_rounds: int = 0,
                  setup: Callable = None) -> Any:
         """Mimics the pedantic method of pytest-benchmark."""
         if setup:
             setup()
         if kwargs is None:
             kwargs = {}
    -    return target(*args, **kwargs)
    +    result = None
    +    for _ in range(rounds):
    +        for _ in range(iterations):
    +            result = target(*args, **kwargs)
    +    return result
    Suggestion importance[1-10]: 5


    Why: Adding loops for iterations and rounds correctly improves fidelity to pytest-benchmark without introducing errors.

    Low

    @aseembits93 aseembits93 changed the title from "Custom Benchmark fixture for running benchmark tests when the pytest-benchmark plugin is diabled" to "Custom Benchmark fixture for running benchmark tests when the pytest-benchmark plugin is disabled" Jun 9, 2025
    @aseembits93 aseembits93 requested a review from KRRT7 June 9, 2025 23:00
    @aseembits93 aseembits93 marked this pull request as ready for review June 9, 2025 23:03

    github-actions bot commented Jun 9, 2025

    Persistent review updated to latest commit aa33d77

    @aseembits93 aseembits93 marked this pull request as draft June 10, 2025 01:55
    @aseembits93 aseembits93 marked this pull request as ready for review June 10, 2025 23:20

    Persistent review updated to latest commit 3c76a95

    @@ -252,7 +256,9 @@ def run_benchmarking_tests(
    result_file_path = get_run_tmp_file(Path("pytest_results.xml"))
    result_args = [f"--junitxml={result_file_path.as_posix()}", "-o", "junit_logging=all"]
    pytest_test_env = test_env.copy()
    pytest_test_env["PYTEST_PLUGINS"] = "codeflash.verification.pytest_plugin"
    pytest_test_env["PYTEST_PLUGINS"] = (
    Contributor


    why do we need the benchmarking plugin here?

    @aseembits93
    Contributor Author

    closing for now as there are other ways of achieving the same goal
