-
Notifications
You must be signed in to change notification settings - Fork 121
Introduce compiler timing script & CI job #4154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce compiler timing script & CI job #4154
Conversation
2d05ec4
to
c03d316
Compare
by separating EndToEnd and Compiler [only] benchmarks and ensuring they each get run on appropriate changes
df5fb7a
to
308c449
Compare
I also ran this workflow on an old, slightly performance-regressing PR (#3374), and it successfully detects the small regression with this output. |
I haven't reviewed the code yet, but a thought--I think we'll get more stable results out of this if we run on more LoC--the perf tests are a good starting point (and a way to get good feedback), but given that we're measuring compilation times I'd imagine the results would be more stable if we had more to compile. I'm not opposed to having a short job for quick feedback, though. Are you planning to add a longer one later? |
(I realize that I told you to start on the perf tests, so it's not a criticism 😁 ) I think we should also test on some bigger codebases too. |
d175652
to
03bf7b0
Compare
03bf7b0
to
44c4021
Compare
@carolynzech I've added an additional |
Here's the link to the newest CI run on my branch that has updated significance thresholds and small formatting changes. Should be finally ready for review now! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Be sure to also update the PR description with whatever the new runs on your fork look like.
Here is a run of the workflow on a PR that we expect to cause a small-to-medium regression. I tried to create a change that isn't overly contrived. The results gives me confidence that the workflow is at a good spot with sensitivity--it catches the regression comfortably, without failing on every harness (which might suggest it's overly sensitive). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice comments 😁
9d87281
We have an existing `benchcomp` script to measure Kani's performance in CI, but it is prone to noise and only focuses on the end-to-end performance of compilation and verification together, leaving a need for more granular measurements. This script focuses solely on changes in the runtime of the Kani compiler and runs with warm ups, repeats and outlier detection (based on the rust compiler's method in [`rustc-perf`](https://github.com/rust-lang/rustc-perf)) in an attempt to limit noise. The new `compile-timer-short` CI job uses this script on a subset of the `perf` tests (currently excluding `s2n-quic`, `kani-lib/arbitrary` & `misc/display-trait`) to produce a `benchcomp`-like table comparing the compiler performance per-crate before and after a given commit. This also modifies our auto-labeller to ensure end-to-end benchmarks (like `benchcomp`) and the new compiler-specific ones are only run when the parts of Kani that they profile have changed. We manually tested the CI job on my personal fork (see [this regressing run](https://github.com/AlexanderPortland/kani/actions/runs/15788016660?pr=6) from a test PR that intentionally slows down the compiler). Resolves model-checking#2442 By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.
Auto generated release notes: ## What's Changed * Edit quantifiers' documentation. by @thanhnguyen-aws in #4142 * Fix the bug of using multiple hidden variables for the prev of the same Expr by @thanhnguyen-aws in #4150 * Remove `assess` subcommand by @carolynzech in #4111 * Optimize goto binary exporting in `cprover_bindings` by @AlexanderPortland in #4148 * Add the option to generate performance flamegraphs by @AlexanderPortland in #4138 * Fix the bug: Static union values can panic Kani by @thanhnguyen-aws in #4112 * Update toolchain to 2025-06-13 by @carolynzech in #4152 * Automatic cargo update to 2025-06-16 by @github-actions in #4156 * Major-version update cargo dependencies by @tautschnig in #4158 * Upgrade Rust toolchain to 2025-06-16 by @tautschnig in #4157 * Bump tests/perf/s2n-quic from `3129ad5` to `c6e694e` by @dependabot in #4160 * Bump tests/perf/s2n-quic from `c6e694e` to `b1b5bf8` by @dependabot in #4164 * Upgrade Rust toolchain to 2025-06-17 by @tautschnig in #4163 * Automatic cargo update to 2025-06-23 by @github-actions in #4172 * Stub panics during MIR transformation by @AlexanderPortland in #4169 * Bump tests/perf/s2n-quic from `b1b5bf8` to `32ba87d` by @dependabot in #4175 * Handle enums with zero or one variants by @zhassan-aws in #4171 * Introduce compiler timing script & CI job by @AlexanderPortland in #4154 * Upgrade Rust toolchain to 2025-06-18 by @tautschnig in #4166 * Cache dependencies for CI jobs by @AlexanderPortland in #4181 * Autoharness: Derive `Arbitrary` for structs and enums by @carolynzech in #4167 * Upgrade Rust toolchain to 2025-06-27 by @tautschnig in #4182 * Include wget in dependencies by @zhassan-aws in #4183 * Automatic cargo update to 2025-06-30 by @github-actions in #4186 * Add support for loop assigns in loop contracts by @thanhnguyen-aws in #4174 * Upgrade toolchain to 06/30 by @carolynzech in #4188 * Optimize reachability with non-mutating global passes by @AlexanderPortland in #4177 * Bump tests/perf/s2n-quic from `32ba87d` to `b8f8cca` by @dependabot in #4190 * Bump ncipollo/release-action from 1.16.0 to 1.18.0 by @dependabot in #4191 * Upgrade toolchain to 07/02 by @carolynzech in #4195 * Automatic Derivation Fixes by @carolynzech in #4194 **Full Changelog**: kani-0.63.0...kani-0.64.0 By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.
We have an existing
benchcomp
script to measure Kani's performance in CI, but it is prone to noise and only focuses on the end-to-end performance of compilation and verification together, leaving a need for more granular measurements.This script focuses solely on changes in the runtime of the Kani compiler and runs with warm ups, repeats and outlier detection (based on the rust compiler's method in
rustc-perf
) in an attempt to limit noise. The newcompile-timer-short
CI job uses this script on a subset of theperf
tests (currently excludings2n-quic
,kani-lib/arbitrary
&misc/display-trait
) to produce abenchcomp
-like table comparing the compiler performance per-crate before and after a given commit.This also modifies our auto-labeller to ensure end-to-end benchmarks (like
benchcomp
) and the new compiler-specific ones are only run when the parts of Kani that they profile have changed.We manually tested the CI job on my personal fork (see this regressing run from a test PR that intentionally slows down the compiler).
Resolves #2442
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.