Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using function contracts to verify insertion into HashMap runs into OOM (>8GB) #3965

Open
janriemer opened this issue Mar 29, 2025 · 7 comments
Assignees
Labels
[C] Feature / Enhancement A new feature request or enhancement to an existing feature. [E] Performance Track performance improvement (Time / Memory / CPU) [E] User Experience An UX enhancement for an existing feature. Including deprecation of an existing one. T-User Tag user issues / requests

Comments

@janriemer
Copy link

Hi all,

Goal that I try to achieve

I'm trying to verify the diffing algorithm in csv-diff with kani.
In the algorithm itself, a lot of operations on hash maps are happening, mainly insertions and/or updates.

Problem I run into

When trying to impl a kani proof for it (first without function contracts), memory usage is fine (~6GB), but after letting it run for over 1,5h, I killed it, because the algorithm seems to be too complicated to verify it fully automatically.

So I resort to kani's function contracts in the hope of composing smaller proofs together (via proof_for_contract and stub_verified).

In order to "test the waters", I've tried to proof a very simple "dummy" function that adds/updates a single hash map entry with a function contract, but unfortunately when executing the proof_for_contract, even this tiny example blew up my memory:

=> I only have 8GB RAM available, but those 8GB were filled very quickly during the execution.

So my question is: is it normal that this much RAM is required for function contracts? Trying to verify the same "dummy" function with normal proof harness finishes very fast and doesn't require lots of RAM.
I guess it might have something to do with the modifies clause not being properly constraint (in case of proof_for_contract), but also I don't know how to constrain it any further (see code below).

Any help is greatly appreciated!❤️

Code to reproduce

I tried this code (function contract on the "dummy" function that does some hash map operation):

struct Foo(HashMap<u64, u8>);

impl Foo {
    pub fn new() -> Self {
        Self(Default::default())
    }

    // not sure how to restrict this further => hash map internals are complicated!
    #[cfg_attr(kani, kani::modifies(&mut self.0))]
    // try to restrict input to manageable size
    #[cfg_attr(kani, kani::requires(num <= 10 && bar <= 100))]
    #[cfg_attr(kani, kani::ensures(|result| self.0.get(&num) == Some(&bar)))]
    pub fn add(&mut self, num: u64, bar: u8) {
        *self.0.entry(num).or_insert(bar) = bar;
    }
}

#[cfg(kani)]
mod verification {
    use std::{
        collections::HashSet,
        hash::{BuildHasher, DefaultHasher, RandomState},
        mem::transmute,
    };

    #[allow(dead_code)]
    fn concrete_state() -> RandomState {
        let keys: [u64; 2] = Default::default();
        assert_eq!(size_of_val(&keys), size_of::<RandomState>());
        unsafe { transmute(keys) }
    }

    #[kani::proof_for_contract(Foo::add)]
    #[kani::stub(RandomState::new, concrete_state)]
    #[kani::unwind(4)]
    pub fn verify_contract_foo_add() {
        let mut foo = Foo::new();
        foo.add(kani::any(), kani::any());
    }
}

using the following command line invocation:

cargo kani --harness verify_contract_foo_add -Z function-contracts

with Kani version: 0.58

@janriemer janriemer added the [C] Bug This is a bug. Something isn't working. label Mar 29, 2025
@carolynzech
Copy link
Contributor

Hi! We're glad you're using Kani. I just started looking into this--I'll circle back with some suggestions once I've had a chance to dig in further.

@carolynzech carolynzech self-assigned this Apr 1, 2025
@janriemer
Copy link
Author

janriemer commented Apr 3, 2025

Thank you @carolynzech for looking into this!🙏

In the meantime I've set up a (for now private) GitHub Codespace where I have 16GB RAM available.

When running the function contract in the OP with the GitHub Codespace, it failed with a CBMC error code 15 when system reached ~15GB (not quite sure how much RAM exactly was used until the error happened, cause I used top command to track mem usage and refresh interval of top is 1s I think).

I couldn't find anything on the internet regarding CBMC erroring with code 15.
I assume, it just means OOM exception!?

Note: I haven't gotten this error, when running on my local machine (with 8GB). Maybe it is because my local machine has a swap partition and GitHub Codespace has no swap? However, regarding no swap of GitHub Codespace, this is just guessing from my side.

Please let me know, if I can assist any further or should provide you with more information.

Really appreciate your effort!❤

@carolynzech
Copy link
Contributor

Thanks for the info. CBMC provides its exit codes in this file: https://github.com/diffblue/cbmc/blob/fab67109ec0c3132b41a95bfe9c503f71fe9b329/src/util/exit_codes.h#L10

and out of memory is exit code 6, so that is indeed confusing. It's likely an error code returned by the system that killed CBMC, rather than an error code from CBMC itself.

I tried to improve performance on your example, but it is generally very hard to get HashMaps and contracts to play nicely together. There's work we can (and should) do to make memory/runtime improvements there, but those are unfortunately nontrivial and not something we'll have done anytime super soon.

I have two suggestions for things you can try:

  1. Break up the algorithm into smaller functions, and then using regular #[kani::proof]s with assumes/asserts on those. So for example:
#[kani::requires(x < 10)]
#[kani::ensures(|result| *result == 100)]
fn foo(x: u8) -> u8 { ... }

#[kani::proof_for_contract(foo)]
fn foo_harness() {
  let x: u8 = kani::any();
  foo(x);
}

is equivalent to:

fn foo(x: u8) -> u8 { ... }

#[kani::proof]
fn foo_harness() {
  let x: u8 = kani::any();
  kani::assume(x < 10);
  let result = foo(x);
  kani::assert(result == 100);
}

If foo_harness passes, you can assume the postcondition in a later proof without causing unsoundness:

fn bar_harness() {
   let x: u8 = kani::any();
   let foo_result = foo(x);
   kani::assume(foo_result == 100);
   bar(foo_result);
}
  1. If you can try to think of another way to represent the algorithm that doesn't require HashMaps (like a simple vector), that may help, since the hashing logic is difficult for the symbolic executor to reason about.

Generally speaking, symbolic execution with HashMaps is a difficult problem, so while we definitely want to improve Kani's performance here, those improvements may not come quickly enough for you to make use of them very soon. Please let us know if there's anything else we can help with, or if there's a breakdown of the proof approach that you'd like us to look at.

@carolynzech carolynzech added [E] Performance Track performance improvement (Time / Memory / CPU) [C] Feature / Enhancement A new feature request or enhancement to an existing feature. [E] User Experience An UX enhancement for an existing feature. Including deprecation of an existing one. T-User Tag user issues / requests and removed [C] Bug This is a bug. Something isn't working. labels Apr 3, 2025
@carolynzech
Copy link
Contributor

@janriemer You could also try using Bolero (https://github.com/camshaft/bolero).

@janriemer
Copy link
Author

janriemer commented Apr 6, 2025

Wow, thank you @carolynzech for looking into it so deeply!❤

Thanks for the info. CBMC provides its exit codes in this file: https://github.com/diffblue/cbmc/blob/fab67109ec0c3132b41a95bfe9c503f71fe9b329/src/util/exit_codes.h#L10

and out of memory is exit code 6, so that is indeed confusing. It's likely an error code returned by the system that killed CBMC, rather than an error code from CBMC itself.

Hm...interesting. Thank you for pointing me to the error codes. Maybe it is because of the docker container in my Codespace not having all the necessary dependencies CBMC requires!? Anyway, this seems to be a separate issue then and we don't need to look into it for now.

I tried to improve performance on your example, but it is generally very hard to get HashMaps and contracts to play nicely together. There's work we can (and should) do to make memory/runtime improvements there, but those are unfortunately nontrivial and not something we'll have done anytime super soon.

I have two suggestions for things you can try:

  1. Break up the algorithm into smaller functions, and then using regular #[kani::proof]s with assumes/asserts on those. So for example:

[...]

If foo_harness passes, you can assume the postcondition in a later proof without causing unsoundness:

[...]

Ohhh, I didn't know you could do that! This is brilliant! I'll definitely try this out. Thank you!😊

  1. If you can try to think of another way to represent the algorithm that doesn't require HashMaps (like a simple vector), that may help, since the hashing logic is difficult for the symbolic executor to reason about.

Yeah, this is an interesting idea.👍 Unfortunately, a primary goal of the algorithm is performance and I'd like to prevent regressions in that regard.
An alternative to HashMap or Vec might be BTreeMaps!? Not sure what the perf impact on the algorithm itself will be, but Kani might be able to handle BTreeMaps better than HashMaps!?

@janriemer You could also try using Bolero (https://github.com/camshaft/bolero).

Ah, yes, I've already read about it in the kani docs and on Kani's blog. If my approach with Kani should fail, I'll definitely look into fuzzing and/or property based testing.

Generally speaking, symbolic execution with HashMaps is a difficult problem, so while we definitely want to improve Kani's performance here, those improvements may not come quickly enough for you to make use of them very soon. Please let us know if there's anything else we can help with, or if there's a breakdown of the proof approach that you'd like us to look at.

You are so kind! Thank you so much for your elaborate answer and for providing such great support!❤

Your answer has really helped me to get "unblocked", because:

  • I now know that verfiying algorithms with symbolic execution involving HashMaps is generally difficult, so I "wasn't doing anything wrong"
  • I have some more alternative approaches I can try now

For me Kani is a technology that is as close to magic as one can possibly imagine!✨ Thank you for maintaining such an awesome tool!

@carolynzech
Copy link
Contributor

@janriemer

Yeah, this is an interesting idea.👍 Unfortunately, a primary goal of the algorithm is performance and I'd like to prevent regressions in that regard.

Fair point, although you can separate the verification of the whole algorithm from the verification of the part that actually performs the lookup. Put differently, going back to my suggestion to break up the verification into multiple methods, you can imagine a division of labor like this:

fn lookup_value(map: HashMap<...>) { ... } -> ...
fn do_something_with_value(val: ...) { ... }

where do_something_with_value assumes the HashMap lookup has already happened and just does something with the result. This of course doesn't give you the complete end-to-end verification that you're looking for, but you could at least get a proof to the effect of "assuming that the hashmap lookups return the right values, the rest of the algorithm performs as expected."

You may also want to look at Verus, another tool for verifying the correctness of Rust code.

@janriemer
Copy link
Author

janriemer commented Apr 6, 2025

Fair point, although you can separate the verification of the whole algorithm from the verification of the part that actually performs the lookup. Put differently, going back to my suggestion to break up the verification into multiple methods, you can imagine a division of labor like this:

fn lookup_value(map: HashMap<...>) { ... } -> ...
fn do_something_with_value(val: ...) { ... }

where do_something_with_value assumes the HashMap lookup has already happened and just does something with the result. This of course doesn't give you the complete end-to-end verification that you're looking for, but you could at least get a proof to the effect of "assuming that the hashmap lookups return the right values, the rest of the algorithm performs as expected."

Ah, I see! This is a good idea - thank you!😊 I've just looked into how this might apply to the diffing algorithm I want to verify and it turns out, an estimated ~75-80% of it can be verified this way.

You may also want to look at Verus, another tool for verifying the correctness of Rust code.

Thank you for the recommendation.👍 Wasn't Verus the other verification tool (besides Kani) that is used in the effort of verifying the Rust std lib?
Really looking forward to how the Rust team is going to verify std HashMap.👀

I've also briefly looked into Creusot, which uses deductive verification, but I haven't really tried it yet.

The formal verification rabbit hole is so deep!🐰🕳 I love it, but also...help!?😂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[C] Feature / Enhancement A new feature request or enhancement to an existing feature. [E] Performance Track performance improvement (Time / Memory / CPU) [E] User Experience An UX enhancement for an existing feature. Including deprecation of an existing one. T-User Tag user issues / requests
Projects
None yet
Development

No branches or pull requests

2 participants