
Custom Benchmark fixture for running benchmark tests when the pytest-benchmark plugin is disabled #302


Closed
wants to merge 7 commits

Conversation

Contributor

@aseembits93 aseembits93 commented Jun 7, 2025

PR Type

Enhancement, Tests


Description

  • Introduce custom pytest benchmarking plugin

  • Provide no-op benchmark fixture implementation (a minimal sketch follows this list)

  • Register @pytest.mark.benchmark pytest marker

  • Include plugin in test runners’ PYTEST_PLUGINS settings
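
As a rough illustration of the approach (a minimal sketch, not the PR's exact code; the real CustomBenchmark also exposes pedantic, dynamic, and several dummy properties), such a plugin can register the marker and hand tests a no-op stand-in fixture:

import pytest


class CustomBenchmark:
    """Minimal stand-in for pytest-benchmark's fixture: runs the target once."""

    def __call__(self, func, *args, **kwargs):
        # No timing or statistics; just execute the benchmarked callable once.
        return func(*args, **kwargs)


def pytest_configure(config):
    # Register the marker so @pytest.mark.benchmark does not trigger warnings.
    config.addinivalue_line("markers", "benchmark: mark test as a benchmark")


@pytest.fixture
def benchmark():
    # Always return the no-op implementation; assumes pytest-benchmark is disabled.
    return CustomBenchmark()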


Changes walkthrough 📝

Relevant files

Tests
test_benchmark_bubble_sort.py: Add benchmark-marked test case
code_to_optimize/tests/pytest/benchmarks/test_benchmark_bubble_sort.py
  • Added test_sort_with_marker using @pytest.mark.benchmark
  • Wrapped sorter call in an inner function decorated with @benchmark
  • Kept existing non-benchmark test unchanged
  • +7/-0

Enhancement
custom_plugin.py: Implement custom benchmark fixture plugin
codeflash/benchmarking/plugin/custom_plugin.py
  • Created custom_plugin.py defining a CustomBenchmark class
  • Registered pytest marker "benchmark" in pytest_configure
  • Implemented no-op __call__, pedantic, and dynamic methods
  • Provided dummy properties: group, name, fullname, params, extra_info
  • +77/-0

Configuration changes
test_runner.py: Include custom benchmark plugin in test runners
codeflash/verification/test_runner.py
  • Updated PYTEST_PLUGINS to include custom_plugin in all three runners (a subprocess sketch follows this walkthrough)
  • Adjusted subprocess env for behavioral, line profile, and benchmarking tests
  • +9/-3
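
The PYTEST_PLUGINS mechanism referenced above is pytest's standard way to load extra plugin modules in a subprocess: at startup, pytest imports every comma-separated dotted path listed in that environment variable. A hedged sketch of how a runner might wire this up (the subprocess command and test path here are illustrative, not the PR's exact code):

import os
import subprocess

env = os.environ.copy()
# pytest loads each comma-separated module listed in PYTEST_PLUGINS as a plugin.
env["PYTEST_PLUGINS"] = (
    "codeflash.verification.pytest_plugin,codeflash.benchmarking.plugin.custom_plugin"
)
# Illustrative invocation; the real runners build richer argument lists.
subprocess.run(
    ["python", "-m", "pytest", "code_to_optimize/tests/pytest/benchmarks"],
    env=env,
    check=False,
)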

  • @aseembits93 aseembits93 marked this pull request as draft June 7, 2025 01:17

    github-actions bot commented Jun 7, 2025

    PR Reviewer Guide 🔍

    (Review updated until commit 3c76a95)

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Weak Assertion

    The new test test_sort_with_marker uses assert 1==1, which doesn't validate the sorting functionality. Consider asserting the actual result of inner_fn to ensure correctness.

    @pytest.mark.benchmark
    def test_sort_with_marker(benchmark):
        @benchmark
        def inner_fn():
            return sorter(list(reversed(range(5))))
        assert 1==1
    Fixture Logic

    The benchmark fixture always returns the custom no-op implementation, making the marker check redundant. Verify that the real pytest-benchmark fixture is correctly returned when available.

    # Check if benchmark fixture is already available (pytest-benchmark is active)
    if "benchmark" in request.fixturenames and hasattr(request, "_fixturemanager"):
        try:
            # Try to get the real benchmark fixture
            return request.getfixturevalue("benchmark")
        except (pytest.FixtureLookupError, AttributeError):
            pass
    custom_benchmark = CustomBenchmark()
    if request.node.get_closest_marker("benchmark"):
        # Return our custom benchmark for tests marked with @pytest.mark.benchmark
        return custom_benchmark
    return custom_benchmark
    Configuration Duplication

    The PYTEST_PLUGINS assignment is duplicated in multiple test runner functions. Consider extracting it into a shared constant or helper to reduce repetition and simplify maintenance; one possible extraction is sketched below the snippet.

    pytest_test_env["PYTEST_PLUGINS"] = (
        "codeflash.verification.pytest_plugin,codeflash.benchmarking.plugin.custom_plugin"
    )
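
    One way to do this, as a sketch (the constant and helper names here are invented for illustration and are not part of the PR):

    CODEFLASH_PYTEST_PLUGINS = (
        "codeflash.verification.pytest_plugin,codeflash.benchmarking.plugin.custom_plugin"
    )


    def env_with_codeflash_plugins(test_env: dict) -> dict:
        # Copy the env and apply the shared PYTEST_PLUGINS value in one place.
        pytest_test_env = test_env.copy()
        pytest_test_env["PYTEST_PLUGINS"] = CODEFLASH_PYTEST_PLUGINS
        return pytest_test_env

    Each runner could then call env_with_codeflash_plugins(test_env) instead of repeating the assignment.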


    github-actions bot commented Jun 7, 2025

    PR Code Suggestions ✨

    Latest suggestions up to 3c76a95

    Explore these optional code suggestions:

    Category | Suggestion | Impact
    Possible issue
    Execute and validate benchmark function

    Ensure the inner benchmark function is actually executed and its output validated.
    Call inner_fn() and assert that it returns the expected sorted list instead of a
    placeholder assertion.

    code_to_optimize/tests/pytest/benchmarks/test_benchmark_bubble_sort.py [10-15]

     @pytest.mark.benchmark
     def test_sort_with_marker(benchmark):
         @benchmark
         def inner_fn():
             return sorter(list(reversed(range(5))))
    -    assert 1==1
    +    result = inner_fn()
    +    assert result == list(range(5))
    Suggestion importance[1-10]: 8


    Why: The test currently asserts a placeholder 1==1 without invoking inner_fn(), so calling it and checking the sorted result fixes test correctness.

    Medium
    Detect plugin with pluginmanager

    Use pytest's plugin manager to detect if the benchmark plugin is active, avoiding
    reliance on request.fixturenames. Replace the check with
    request.config.pluginmanager.hasplugin("pytest_benchmark") before retrieving the
    real fixture.

    codeflash/benchmarking/plugin/custom_plugin.py [67-72]

    -if "benchmark" in request.fixturenames and hasattr(request, "_fixturemanager"):
    +if request.config.pluginmanager.hasplugin("pytest_benchmark"):
         try:
    -        # Try to get the real benchmark fixture
             return request.getfixturevalue("benchmark")
         except (pytest.FixtureLookupError, AttributeError):
             pass
    Suggestion importance[1-10]: 6


    Why: Using pluginmanager.hasplugin("pytest_benchmark") is a more reliable way to detect the real benchmark fixture than inspecting request.fixturenames.

    Low
    General
    Add iteration loop in pedantic

    Implement iteration and round loops to more closely mimic pytest-benchmark’s
    pedantic. Loop over iterations (and optionally rounds) calling target, and return
    the last result.

    codeflash/benchmarking/plugin/custom_plugin.py [24-39]

     def pedantic(
         self,
         target: Callable,
         args: tuple = (),
         kwargs: Optional[dict] = None,
         iterations: int = 1,
         rounds: int = 1,
         warmup_rounds: int = 0,
         setup: Optional[Callable] = None,
     ) -> Any:
         """Mimics the pedantic method of pytest-benchmark."""
         if setup:
             setup()
         if kwargs is None:
             kwargs = {}
    -    return target(*args, **kwargs)
    +    result = None
    +    for _ in range(iterations):
    +        result = target(*args, **kwargs)
    +    return result
    Suggestion importance[1-10]: 6


    Why: Looping over iterations in pedantic better mimics pytest-benchmark behavior and corrects the current single-call implementation.

    Low
    Support chained no-op calls

    Allow chained calls on the custom benchmark by returning self instead of None from
    no-op methods. This supports patterns like benchmark.timer.start().stop().

    codeflash/benchmarking/plugin/custom_plugin.py [20-22]

     def __getattr__(self, name):
    -    # Return a no-op callable for any attribute
    -    return lambda *args, **kwargs: None
    +    # Return a no-op callable for any attribute that returns self
    +    return lambda *args, **kwargs: self
    Suggestion importance[1-10]: 4


    Why: Allowing __getattr__ to return self enables chained no-op calls but only marginally improves usability.

    Low

    Previous suggestions

    Suggestions up to commit aa33d77
    Category | Suggestion | Impact
    General
    Implement full pedantic loops

    To better emulate pytest-benchmark’s pedantic, run the warmup, iterations, and
    rounds loops instead of calling the target only once. Aggregate and return the last
    result.

    codeflash/benchmarking/plugin/custom_plugin.py [20-35]

     def pedantic(
         self,
         target: Callable,
         args: tuple = (),
    -    kwargs: Optional[dict] = None,  # noqa: FA100
    -    iterations: int = 1,  # noqa: ARG002
    -    rounds: int = 1,  # noqa: ARG002
    -    warmup_rounds: int = 0,  # noqa: ARG002
    -    setup: Optional[Callable] = None,  # noqa: FA100
    -) -> Any:  # noqa: ANN401
    +    kwargs: Optional[dict] = None,
    +    iterations: int = 1,
    +    rounds: int = 1,
    +    warmup_rounds: int = 0,
    +    setup: Optional[Callable] = None,
    +) -> Any:
         """Mimics the pedantic method of pytest-benchmark."""
    -    if setup:
    -        setup()
         if kwargs is None:
             kwargs = {}
    -    return target(*args, **kwargs)
    +    # warmup
    +    for _ in range(warmup_rounds):
    +        if setup:
    +            setup()
    +        target(*args, **kwargs)
    +    # timed loops
    +    result = None
    +    for _ in range(rounds):
    +        if setup:
    +            setup()
    +        for _ in range(iterations):
    +            result = target(*args, **kwargs)
    +    return result
    Suggestion importance[1-10]: 7


    Why: Adding the warmup, iterations, and rounds loops mirrors pytest-benchmark’s intended behavior and correctly applies to the existing pedantic definition without breaking other logic.

    Medium
    Use plugin manager to detect benchmark plugin

    Rely on pytest’s plugin manager to detect the real benchmark plugin instead of
    inspecting request.fixturenames, which can cause recursion. This ensures you only
    override when the pytest-benchmark plugin is not installed.

    codeflash/benchmarking/plugin/custom_plugin.py [63-68]

    -if "benchmark" in request.fixturenames and hasattr(request, "_fixturemanager"):
    -    try:
    -        # Try to get the real benchmark fixture
    -        return request.getfixturevalue("benchmark")
    -    except (pytest.FixtureLookupError, AttributeError):
    -        pass
    +if not request.config.pluginmanager.hasplugin("benchmark"):
    +    return CustomBenchmark()
    +return request.getfixturevalue("benchmark")
    Suggestion importance[1-10]: 6


    Why: Relying on pluginmanager.hasplugin("benchmark") is a cleaner way to detect the real plugin, but the proposed snippet drops the existing exception fallback which could lead to uncaught errors if the fixture isn’t registered.

    Low
    Suggestions up to commit 56108b5
    Category | Suggestion | Impact
    Possible issue
    Fix DummyBenchmark instantiation

    The DummyBenchmark constructor does not accept any arguments, so instantiate it
    without passing request to avoid a TypeError.

    codeflash/benchmarking/plugin/dummy_plugin.py [85]

    -return CodeFlashBenchmarkDummyPlugin.DummyBenchmark(request)
    +return CodeFlashBenchmarkDummyPlugin.DummyBenchmark()
    Suggestion importance[1-10]: 9


    Why: Removing the request argument fixes a real TypeError and restores proper instantiation of DummyBenchmark.

    High
    General
    Simulate iteration loops

    Simulate the configured iterations and rounds to better mimic pytest-benchmark's
    behavior by looping the target call accordingly and returning the last result.

    code_to_optimize/tests/pytest/benchmarks/test_benchmark_bubble_sort.py [17-25]

     def pedantic(self, target: Callable, args: tuple = (), kwargs: dict = None,
                  iterations: int = 1, rounds: int = 1, warmup_rounds: int = 0,
                  setup: Callable = None) -> Any:
         """Mimics the pedantic method of pytest-benchmark."""
         if setup:
             setup()
         if kwargs is None:
             kwargs = {}
    -    return target(*args, **kwargs)
    +    result = None
    +    for _ in range(rounds):
    +        for _ in range(iterations):
    +            result = target(*args, **kwargs)
    +    return result
    Suggestion importance[1-10]: 5


    Why: Adding loops for iterations and rounds correctly improves fidelity to pytest-benchmark without introducing errors.

    Low

    @aseembits93 aseembits93 changed the title from "Custom Benchmark fixture for running benchmark tests when the pytest-benchmark plugin is diabled" to "Custom Benchmark fixture for running benchmark tests when the pytest-benchmark plugin is disabled" Jun 9, 2025
    @aseembits93 aseembits93 requested a review from KRRT7 June 9, 2025 23:00
    @aseembits93 aseembits93 marked this pull request as ready for review June 9, 2025 23:03

    github-actions bot commented Jun 9, 2025

    Persistent review updated to latest commit aa33d77

    @aseembits93 aseembits93 marked this pull request as draft June 10, 2025 01:55
    @aseembits93 aseembits93 marked this pull request as ready for review June 10, 2025 23:20

    Persistent review updated to latest commit 3c76a95

    @@ -252,7 +256,9 @@ def run_benchmarking_tests(
    result_file_path = get_run_tmp_file(Path("pytest_results.xml"))
    result_args = [f"--junitxml={result_file_path.as_posix()}", "-o", "junit_logging=all"]
    pytest_test_env = test_env.copy()
    pytest_test_env["PYTEST_PLUGINS"] = "codeflash.verification.pytest_plugin"
    pytest_test_env["PYTEST_PLUGINS"] = (
    Contributor


    why do we need the benchmarking plugin here?

    @aseembits93
    Contributor Author

    closing for now as there are other ways of achieving the same goal
