Skip to content

Introduce compiler timing script & CI job #4154

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jun 24, 2025

Conversation

AlexanderPortland
Copy link
Contributor

@AlexanderPortland AlexanderPortland commented Jun 14, 2025

We have an existing benchcomp script to measure Kani's performance in CI, but it is prone to noise and only focuses on the end-to-end performance of compilation and verification together, leaving a need for more granular measurements.

This script focuses solely on changes in the runtime of the Kani compiler and runs with warm ups, repeats and outlier detection (based on the rust compiler's method in rustc-perf) in an attempt to limit noise. The new compile-timer-short CI job uses this script on a subset of the perf tests (currently excluding s2n-quic, kani-lib/arbitrary & misc/display-trait) to produce a benchcomp-like table comparing the compiler performance per-crate before and after a given commit.

This also modifies our auto-labeller to ensure end-to-end benchmarks (like benchcomp) and the new compiler-specific ones are only run when the parts of Kani that they profile have changed.

We manually tested the CI job on my personal fork (see this regressing run from a test PR that intentionally slows down the compiler).

Resolves #2442

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.

@github-actions github-actions bot added the Z-EndToEndBenchCI Tag a PR to run benchmark CI label Jun 14, 2025
@AlexanderPortland AlexanderPortland force-pushed the compiler-timing-ci branch 2 times, most recently from 2d05ec4 to c03d316 Compare June 16, 2025 20:58
@AlexanderPortland AlexanderPortland marked this pull request as ready for review June 16, 2025 21:06
@AlexanderPortland AlexanderPortland requested a review from a team as a code owner June 16, 2025 21:06
by separating EndToEnd and Compiler [only] benchmarks and ensuring they each get run on appropriate changes
@AlexanderPortland
Copy link
Contributor Author

I also ran this workflow on an old, slightly performance-regressing PR (#3374), and it successfully detects the small regression with this output.

@carolynzech
Copy link
Contributor

I haven't reviewed the code yet, but a thought--I think we'll get more stable results out of this if we run on more LoC--the perf tests are a good starting point (and a way to get good feedback), but given that we're measuring compilation times I'd imagine the results would be more stable if we had more to compile.

I'm not opposed to having a short job for quick feedback, though. Are you planning to add a longer one later?

@carolynzech
Copy link
Contributor

(I realize that I told you to start on the perf tests, so it's not a criticism 😁 ) I think we should also test on some bigger codebases too.

@AlexanderPortland AlexanderPortland force-pushed the compiler-timing-ci branch 2 times, most recently from d175652 to 03bf7b0 Compare June 18, 2025 18:59
@AlexanderPortland
Copy link
Contributor Author

@carolynzech I've added an additional compiler-timing-long job that runs on all the s2n-quic harnesses and takes about ~90s for each full compilation run (compared to the -short job which only takes ~3s). This should reduce noise even more with the tradeoff of taking about 50 minutes to run in CI rather than 15.

@carolynzech carolynzech self-assigned this Jun 18, 2025
@AlexanderPortland
Copy link
Contributor Author

Here's the link to the newest CI run on my branch that has updated significance thresholds and small formatting changes. Should be finally ready for review now!

Copy link
Contributor

@carolynzech carolynzech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Be sure to also update the PR description with whatever the new runs on your fork look like.

@carolynzech
Copy link
Contributor

carolynzech commented Jun 20, 2025

Here is a run of the workflow on a PR that we expect to cause a small-to-medium regression. I tried to create a change that isn't overly contrived. The results gives me confidence that the workflow is at a good spot with sensitivity--it catches the regression comfortably, without failing on every harness (which might suggest it's overly sensitive).

@AlexanderPortland AlexanderPortland removed their assignment Jun 23, 2025
Copy link
Contributor

@carolynzech carolynzech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice comments 😁

@carolynzech carolynzech added this pull request to the merge queue Jun 23, 2025
Merged via the queue into model-checking:main with commit 9d87281 Jun 24, 2025
25 of 26 checks passed
@carolynzech carolynzech deleted the compiler-timing-ci branch June 24, 2025 00:09
zhassan-aws added a commit to zhassan-aws/kani that referenced this pull request Jun 27, 2025
We have an existing `benchcomp` script to measure Kani's performance in
CI, but it is prone to noise and only focuses on the end-to-end
performance of compilation and verification together, leaving a need for
more granular measurements.

This script focuses solely on changes in the runtime of the Kani
compiler and runs with warm ups, repeats and outlier detection (based on
the rust compiler's method in
[`rustc-perf`](https://github.com/rust-lang/rustc-perf)) in an attempt
to limit noise. The new `compile-timer-short` CI job uses this script on
a subset of the `perf` tests (currently excluding `s2n-quic`,
`kani-lib/arbitrary` & `misc/display-trait`) to produce a
`benchcomp`-like table comparing the compiler performance per-crate
before and after a given commit.

This also modifies our auto-labeller to ensure end-to-end benchmarks
(like `benchcomp`) and the new compiler-specific ones are only run when
the parts of Kani that they profile have changed.

We manually tested the CI job on my personal fork (see [this regressing
run](https://github.com/AlexanderPortland/kani/actions/runs/15788016660?pr=6)
from a test PR that intentionally slows down the compiler).

Resolves model-checking#2442

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 and MIT licenses.
github-merge-queue bot pushed a commit that referenced this pull request Jul 3, 2025
Auto generated release notes:

## What's Changed
* Edit quantifiers' documentation. by @thanhnguyen-aws in
#4142
* Fix the bug of using multiple hidden variables for the prev of the
same Expr by @thanhnguyen-aws in
#4150
* Remove `assess` subcommand by @carolynzech in
#4111
* Optimize goto binary exporting in `cprover_bindings` by
@AlexanderPortland in #4148
* Add the option to generate performance flamegraphs by
@AlexanderPortland in #4138
* Fix the bug: Static union values can panic Kani by @thanhnguyen-aws in
#4112
* Update toolchain to 2025-06-13 by @carolynzech in
#4152
* Automatic cargo update to 2025-06-16 by @github-actions in
#4156
* Major-version update cargo dependencies by @tautschnig in
#4158
* Upgrade Rust toolchain to 2025-06-16 by @tautschnig in
#4157
* Bump tests/perf/s2n-quic from `3129ad5` to `c6e694e` by @dependabot in
#4160
* Bump tests/perf/s2n-quic from `c6e694e` to `b1b5bf8` by @dependabot in
#4164
* Upgrade Rust toolchain to 2025-06-17 by @tautschnig in
#4163
* Automatic cargo update to 2025-06-23 by @github-actions in
#4172
* Stub panics during MIR transformation by @AlexanderPortland in
#4169
* Bump tests/perf/s2n-quic from `b1b5bf8` to `32ba87d` by @dependabot in
#4175
* Handle enums with zero or one variants by @zhassan-aws in
#4171
* Introduce compiler timing script & CI job by @AlexanderPortland in
#4154
* Upgrade Rust toolchain to 2025-06-18 by @tautschnig in
#4166
* Cache dependencies for CI jobs by @AlexanderPortland in
#4181
* Autoharness: Derive `Arbitrary` for structs and enums by @carolynzech
in #4167
* Upgrade Rust toolchain to 2025-06-27 by @tautschnig in
#4182
* Include wget in dependencies by @zhassan-aws in
#4183
* Automatic cargo update to 2025-06-30 by @github-actions in
#4186
* Add support for loop assigns in loop contracts by @thanhnguyen-aws in
#4174
* Upgrade toolchain to 06/30 by @carolynzech in
#4188
* Optimize reachability with non-mutating global passes by
@AlexanderPortland in #4177
* Bump tests/perf/s2n-quic from `32ba87d` to `b8f8cca` by @dependabot in
#4190
* Bump ncipollo/release-action from 1.16.0 to 1.18.0 by @dependabot in
#4191
* Upgrade toolchain to 07/02 by @carolynzech in
#4195
* Automatic Derivation Fixes by @carolynzech in
#4194


**Full Changelog**:
kani-0.63.0...kani-0.64.0

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 and MIT licenses.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Z-EndToEndBenchCI Tag a PR to run benchmark CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Measure compiler performance and overall execution time as part of our benchmarks
2 participants