Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate if/how "Verifying Peephole Rewriting In SSA Compiler IRs" can be used in HEIR #817

Open
AlexanderViand-Intel opened this issue Jul 23, 2024 · 10 comments
Labels
research synthesis Reading papers to figure out which ideas can be incorporated

Comments

@AlexanderViand-Intel
Copy link
Collaborator

AlexanderViand-Intel commented Jul 23, 2024

As some of you might be aware a group of researchers around Prof. Tobias Grosser from the University of Edinburgh/Cambridge University have been using (an earlier version of) our polynomial dialect as an example for their work on verifying MLIR in LEAN .

I'm happy to report that their paper1 has been accepted at ITP 2024 and is now available: https://doi.org/10.4230/LIPIcs.ITP.2024.9 or https://arxiv.org/abs/2407.03685 🎉

While this is already a cool result in itself, the natural question would be to see if we can somehow make use of this in HEIR (or MLIR, if it concerns up streamed polynomial things). For example, I wonder if something like this could have helped us catch #749 and/or might be useful as we go into more advanced modular arithmetic optimizations and transitions between poly and mod arith. The paper also implements reasoning about bit vectors, so there's a good chance it could capture such lowerings/rewrites.

Footnotes

  1. Siddharth Bhat, Alex Keizer, Chris Hughes, Andrés Goens, and Tobias Grosser.
    Verifying Peephole Rewriting in SSA Compiler IRs.
    In 15th International Conference on Interactive Theorem Proving (ITP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 309, pp. 9:1-9:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
    https://doi.org/10.4230/LIPIcs.ITP.2024.9

@AlexanderViand-Intel AlexanderViand-Intel added the research synthesis Reading papers to figure out which ideas can be incorporated label Jul 23, 2024
@AlexanderViand-Intel AlexanderViand-Intel changed the title Investigate if/how "Verifying Peephole Rewriting In SSA Compiler IRs" can be integrated into our CI Investigate if/how "Verifying Peephole Rewriting In SSA Compiler IRs" can be used in HEIR Jul 23, 2024
@AlexanderViand-Intel
Copy link
Collaborator Author

AlexanderViand-Intel commented Sep 9, 2024

Adding examples of issues/bugs that motivate why formal verification might be interesting for polynomial->arith lowerings:

@Maokami
Copy link
Collaborator

Maokami commented Mar 19, 2025

I would like to share that, although it may seem straightforward, the correctness of the canonicalization passes for the ModArith dialect (google/heir#1506) has been successfully verified through translation validation using the lean-mlir project:

https://github.com/Maokami/lean-mlir/blob/main/SSA/Projects/ModArith/Examples.lean

(Note: I relied heavily on existing code within lean-mlir.)

Though this is currently an elementary application, I found this tool very useful for defining and verifying the semantics/transformations of our compiler, and I look forward to exploring how we can extend it to cover more aspects of our project!

@j2kun
Copy link
Collaborator

j2kun commented Mar 20, 2025

@Maokami That is fantastic! Do you think it would make sense to incorporate the lean proofs into this repo, so that we can try to keep them in sync with the actual pattern implementations? (Or if this is part of your active research, maybe you would want to wait to publish?)

In regards to future work: At HACS a few years ago @AlexanderViand-Intel and another Alex tried hacking together a formal verification of Halevi-Shoup packing, and I feel like that (formally-verified the linalg kernels) would be a super valuable way to incorporate formal verification into HEIR. IIRC, the limitation with their demo was that z3 didn't have a good language for modeling vectors (e.g., a cyclic rotate), and so they had to bit-blast everything and it blew up the verification time.

I wonder if lean would handle that much easier. Given all the work that has gone into the linear algebra part of lean's mathlib, it seems like it should be able to handle simple vector operations in a dimension-independent manner.

@Maokami
Copy link
Collaborator

Maokami commented Mar 21, 2025

Thanks! :)

it would make sense to incorporate the lean proofs into this repo, so that we can try to keep them in sync with the actual pattern implementations

That sounds great, and I’m fully in favor of incorporating these Lean proofs into the repo. At this stage, this is more of a personal exploration driven by my interest in formal verification rather than an active research project, so I’d be happy to integrate them.

And do you have any thoughts on how we might structure this integration? One idea I’ve been thinking about is that, in the long term, it would be great to provide a translation validation tool (similar to how Alive2 works for LLVM). This would enable easy verification whenever someone develops a new compiler pass. (For instance, an alive proof in llvm/llvm-project#129947)

Also, thanks a lot for mentioning that future work! It sounds interesting, and I’d love to explore it further.
If you happen to have any pointers or recommended resources, I’d love to check them out!

@j2kun
Copy link
Collaborator

j2kun commented Mar 24, 2025

TIL about alive.

I think before figuring out an integration I need to understand a bit more about how the overall system is intended to function. Is it a completely parallel spec to the HEIR implementation, defining ops and semantics and the patterns in Lean? Or maybe the input-output examples are generated by running HEIR passes, and then the Lean proofs assert the results are equivalent according to the Lean-defined semantics?

If the latter, I could see us running the proofs against test cases in the repository as part of CI, using the actual HEIR passes to produce the input-output pairs to verify. I think the most robust integration would somehow tie the proof system to the implementation, and without having the rewrite patterns in lean be derived from the HEIR codebase, I don't see another easy way to do that. (By the way, do you know if there is a plan for lean-mlir to derive patterns from, say, tablegen or PDLL? I'd be happy to rewrite tablegen-based patterns in HEIR to PDLL if that would help)

I also like the idea of having proofs as part of an issue when creating a new pattern. Maybe lean-mlir.grosser.es could take the place of the alive2.llvm.org page as a first step in that direction?

@AlexanderViand-Intel
Copy link
Collaborator Author

This is so cool!! Thanks @Maokami!

I'm currently at HACS and will be trying to investigate with Tobias Grosser/some of the lean folks here how we could go about automating the extraction of rewrite rules into Lean-MLIR PeepholeRewrites, as a step towards getting lean-mlir integrated into CI, similar to Alive. Unfortunately, I won't be able to attend the HACS Hack day as it collides with the FHE.org event, but we can always do a "virtual hack day" with the lean-mlir folks some time after HACS/FHE/RWC.

@AlexanderViand-Intel
Copy link
Collaborator Author

AlexanderViand-Intel commented Mar 24, 2025

Ok, so the path forward for automatic extraction does indeed seem to be to express the Dialect in IRDL (upstream now has IRDL to C++) and the rewrite patterns in PDL

@Maokami
Copy link
Collaborator

Maokami commented Mar 24, 2025

@j2kun @AlexanderViand-Intel

Is it a completely parallel spec to the HEIR implementation, defining ops and semantics and the patterns in Lean? Or maybe the input-output examples are generated by running HEIR passes, and then the Lean proofs assert the results are equivalent according to the Lean-defined semantics?

It's the latter. Specifically, in example.lean, each MLIR program (test_*_RHS) is generated by applying heir-opt to test_*_LHS. I manually proved in Lean that these input-output pairs have equivalent semantics. More precisely, by extending and utilizing lean-mlir, I encoded the denotational semantics of operations and types from the mod_arith dialect into Lean's Zp mathematical structure. Using this encoding, I manually verified the equivalence Semantics{test_LHS} = Semantics{test_RHS} within Lean.

I could see us running the proofs against test cases in the repository as part of CI, using the actual HEIR passes to produce the input-output pairs to verify.

I completely agree. Currently, manually writing these proofs in Lean and running them against the test programs within the CI seems to be the best approach.

I think the most robust integration would somehow tie the proof system to the implementation, and without having the rewrite patterns in Lean be derived from the HEIR codebase, I don't see another easy way to do that. (By the way, do you know if there is a plan for lean-mlir to derive patterns from, say, tablegen or PDLL? I'd be happy to rewrite tablegen-based patterns in HEIR to PDLL if that would help.)

I wasn't aware of any plans to derive patterns into lean-mlir, but the automated extraction of rewrite rules into lean-mlir PeepholeRewrites, as mentioned by @AlexanderViand-Intel, seems promising for addressing this issue. I'm also highly interested in this topic and would gladly contribute if there's an opportunity! :)

I also like the idea of having proofs as part of an issue when creating a new pattern. Maybe lean-mlir.grosser.es could take the place of the alive2.llvm.org page as a first step in that direction?

Yes, that would be great! To make it happen, we’ll need to get in touch with Professor Grosser to see if we can extend the site to support custom dialect Lean semantics.

@AlexanderViand-Intel
Copy link
Collaborator Author

@Maokami I just showed @tobiasgrosser your ModArith lean mlir code and he said "Yes! This is amazing" so I think the next logical step would be to open up a PR against the lean-mlir repo to get this reviewed by someone from his group?

Tobias and I are also continuing to brainstorm on how to do automated extraction/CI stuff. I don't think we need a parallel definiton of the ops and rewrite rules, we just need to switch from using TableGen (ODS/DRR) as our "source of truth" in HEIR to actually using IRDL/PDL to do the defintions (because that's much easier to extract from then TableGen stuff)

@tobiasgrosser
Copy link

Yes, please pr this. 🙏

github-merge-queue bot pushed a commit to opencompl/lean-mlir that referenced this issue Apr 8, 2025
Hello lean-mlir team,

I'm very interested in translation validation/compiler verification
techniques for MLIR-based compilers, and I found your work truly
impressive and valuable—thank you for sharing it!

Recently, I used this project to validate a canonicalization pass I
contributed to the `ModArith` dialect (which corresponds to arithmetic
over Zp) in the [heir
project](google/heir#1506).

This PR contains the result of that effort. I’d appreciate it if you
could take a look and review it. Most of the implementation was
straightforward by referring to
`SSA/Projects/FullyHomomorphicEncryption`.

For more context, I’ve shared a detailed discussion here:
google/heir#817 (comment)

Thank you!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
research synthesis Reading papers to figure out which ideas can be incorporated
Projects
None yet
Development

No branches or pull requests

4 participants