Using function contracts to verify insertion into HashMap runs into OOM (>8GB) #3965

janriemer · 2025-03-29T13:49:59Z

Hi all,

Goal that I try to achieve

I'm trying to verify the diffing algorithm in csv-diff with kani.
In the algorithm itself, a lot of operations on hash maps are happening, mainly insertions and/or updates.

Problem I run into

When trying to impl a kani proof for it (first without function contracts), memory usage is fine (~6GB), but after letting it run for over 1,5h, I killed it, because the algorithm seems to be too complicated to verify it fully automatically.

So I resort to kani's function contracts in the hope of composing smaller proofs together (via proof_for_contract and stub_verified).

In order to "test the waters", I've tried to proof a very simple "dummy" function that adds/updates a single hash map entry with a function contract, but unfortunately when executing the proof_for_contract, even this tiny example blew up my memory:

=> I only have 8GB RAM available, but those 8GB were filled very quickly during the execution.

So my question is: is it normal that this much RAM is required for function contracts? Trying to verify the same "dummy" function with normal proof harness finishes very fast and doesn't require lots of RAM.
I guess it might have something to do with the modifies clause not being properly constraint (in case of proof_for_contract), but also I don't know how to constrain it any further (see code below).

Any help is greatly appreciated!❤️

Code to reproduce

I tried this code (function contract on the "dummy" function that does some hash map operation):

struct Foo(HashMap<u64, u8>);

impl Foo {
    pub fn new() -> Self {
        Self(Default::default())
    }

    // not sure how to restrict this further => hash map internals are complicated!
    #[cfg_attr(kani, kani::modifies(&mut self.0))]
    // try to restrict input to manageable size
    #[cfg_attr(kani, kani::requires(num <= 10 && bar <= 100))]
    #[cfg_attr(kani, kani::ensures(|result| self.0.get(&num) == Some(&bar)))]
    pub fn add(&mut self, num: u64, bar: u8) {
        *self.0.entry(num).or_insert(bar) = bar;
    }
}

#[cfg(kani)]
mod verification {
    use std::{
        collections::HashSet,
        hash::{BuildHasher, DefaultHasher, RandomState},
        mem::transmute,
    };

    #[allow(dead_code)]
    fn concrete_state() -> RandomState {
        let keys: [u64; 2] = Default::default();
        assert_eq!(size_of_val(&keys), size_of::<RandomState>());
        unsafe { transmute(keys) }
    }

    #[kani::proof_for_contract(Foo::add)]
    #[kani::stub(RandomState::new, concrete_state)]
    #[kani::unwind(4)]
    pub fn verify_contract_foo_add() {
        let mut foo = Foo::new();
        foo.add(kani::any(), kani::any());
    }
}

using the following command line invocation:

cargo kani --harness verify_contract_foo_add -Z function-contracts

with Kani version: 0.58

The text was updated successfully, but these errors were encountered:

carolynzech · 2025-04-01T22:17:51Z

Hi! We're glad you're using Kani. I just started looking into this--I'll circle back with some suggestions once I've had a chance to dig in further.

janriemer · 2025-04-03T20:50:34Z

Thank you @carolynzech for looking into this!🙏

In the meantime I've set up a (for now private) GitHub Codespace where I have 16GB RAM available.

When running the function contract in the OP with the GitHub Codespace, it failed with a CBMC error code 15 when system reached ~15GB (not quite sure how much RAM exactly was used until the error happened, cause I used top command to track mem usage and refresh interval of top is 1s I think).

I couldn't find anything on the internet regarding CBMC erroring with code 15.
I assume, it just means OOM exception!?

Note: I haven't gotten this error, when running on my local machine (with 8GB). Maybe it is because my local machine has a swap partition and GitHub Codespace has no swap? However, regarding no swap of GitHub Codespace, this is just guessing from my side.

Please let me know, if I can assist any further or should provide you with more information.

Really appreciate your effort!❤

carolynzech · 2025-04-03T21:33:46Z

Thanks for the info. CBMC provides its exit codes in this file: https://github.com/diffblue/cbmc/blob/fab67109ec0c3132b41a95bfe9c503f71fe9b329/src/util/exit_codes.h#L10

and out of memory is exit code 6, so that is indeed confusing. It's likely an error code returned by the system that killed CBMC, rather than an error code from CBMC itself.

I tried to improve performance on your example, but it is generally very hard to get HashMaps and contracts to play nicely together. There's work we can (and should) do to make memory/runtime improvements there, but those are unfortunately nontrivial and not something we'll have done anytime super soon.

I have two suggestions for things you can try:

Break up the algorithm into smaller functions, and then using regular #[kani::proof]s with assumes/asserts on those. So for example:

#[kani::requires(x < 10)]
#[kani::ensures(|result| *result == 100)]
fn foo(x: u8) -> u8 { ... }

#[kani::proof_for_contract(foo)]
fn foo_harness() {
  let x: u8 = kani::any();
  foo(x);
}

is equivalent to:

fn foo(x: u8) -> u8 { ... }

#[kani::proof]
fn foo_harness() {
  let x: u8 = kani::any();
  kani::assume(x < 10);
  let result = foo(x);
  kani::assert(result == 100);
}

If foo_harness passes, you can assume the postcondition in a later proof without causing unsoundness:

fn bar_harness() {
   let x: u8 = kani::any();
   let foo_result = foo(x);
   kani::assume(foo_result == 100);
   bar(foo_result);
}

If you can try to think of another way to represent the algorithm that doesn't require HashMaps (like a simple vector), that may help, since the hashing logic is difficult for the symbolic executor to reason about.

Generally speaking, symbolic execution with HashMaps is a difficult problem, so while we definitely want to improve Kani's performance here, those improvements may not come quickly enough for you to make use of them very soon. Please let us know if there's anything else we can help with, or if there's a breakdown of the proof approach that you'd like us to look at.

carolynzech · 2025-04-03T21:37:01Z

@janriemer You could also try using Bolero (https://github.com/camshaft/bolero).

janriemer · 2025-04-06T11:58:20Z

Wow, thank you @carolynzech for looking into it so deeply!❤

Thanks for the info. CBMC provides its exit codes in this file: https://github.com/diffblue/cbmc/blob/fab67109ec0c3132b41a95bfe9c503f71fe9b329/src/util/exit_codes.h#L10

and out of memory is exit code 6, so that is indeed confusing. It's likely an error code returned by the system that killed CBMC, rather than an error code from CBMC itself.

Hm...interesting. Thank you for pointing me to the error codes. Maybe it is because of the docker container in my Codespace not having all the necessary dependencies CBMC requires!? Anyway, this seems to be a separate issue then and we don't need to look into it for now.

I tried to improve performance on your example, but it is generally very hard to get HashMaps and contracts to play nicely together. There's work we can (and should) do to make memory/runtime improvements there, but those are unfortunately nontrivial and not something we'll have done anytime super soon.

I have two suggestions for things you can try:

Break up the algorithm into smaller functions, and then using regular #[kani::proof]s with assumes/asserts on those. So for example:

[...]

If foo_harness passes, you can assume the postcondition in a later proof without causing unsoundness:

[...]

Ohhh, I didn't know you could do that! This is brilliant! I'll definitely try this out. Thank you!😊

If you can try to think of another way to represent the algorithm that doesn't require HashMaps (like a simple vector), that may help, since the hashing logic is difficult for the symbolic executor to reason about.

Yeah, this is an interesting idea.👍 Unfortunately, a primary goal of the algorithm is performance and I'd like to prevent regressions in that regard.
An alternative to HashMap or Vec might be BTreeMaps!? Not sure what the perf impact on the algorithm itself will be, but Kani might be able to handle BTreeMaps better than HashMaps!?

@janriemer You could also try using Bolero (https://github.com/camshaft/bolero).

Ah, yes, I've already read about it in the kani docs and on Kani's blog. If my approach with Kani should fail, I'll definitely look into fuzzing and/or property based testing.

Generally speaking, symbolic execution with HashMaps is a difficult problem, so while we definitely want to improve Kani's performance here, those improvements may not come quickly enough for you to make use of them very soon. Please let us know if there's anything else we can help with, or if there's a breakdown of the proof approach that you'd like us to look at.

You are so kind! Thank you so much for your elaborate answer and for providing such great support!❤

Your answer has really helped me to get "unblocked", because:

I now know that verfiying algorithms with symbolic execution involving HashMaps is generally difficult, so I "wasn't doing anything wrong"
I have some more alternative approaches I can try now

For me Kani is a technology that is as close to magic as one can possibly imagine!✨ Thank you for maintaining such an awesome tool!

carolynzech · 2025-04-06T15:00:22Z

@janriemer

Yeah, this is an interesting idea.👍 Unfortunately, a primary goal of the algorithm is performance and I'd like to prevent regressions in that regard.

Fair point, although you can separate the verification of the whole algorithm from the verification of the part that actually performs the lookup. Put differently, going back to my suggestion to break up the verification into multiple methods, you can imagine a division of labor like this:

fn lookup_value(map: HashMap<...>) { ... } -> ...
fn do_something_with_value(val: ...) { ... }

where do_something_with_value assumes the HashMap lookup has already happened and just does something with the result. This of course doesn't give you the complete end-to-end verification that you're looking for, but you could at least get a proof to the effect of "assuming that the hashmap lookups return the right values, the rest of the algorithm performs as expected."

You may also want to look at Verus, another tool for verifying the correctness of Rust code.

janriemer · 2025-04-06T18:49:03Z

Fair point, although you can separate the verification of the whole algorithm from the verification of the part that actually performs the lookup. Put differently, going back to my suggestion to break up the verification into multiple methods, you can imagine a division of labor like this:

fn lookup_value(map: HashMap<...>) { ... } -> ...
fn do_something_with_value(val: ...) { ... }

where do_something_with_value assumes the HashMap lookup has already happened and just does something with the result. This of course doesn't give you the complete end-to-end verification that you're looking for, but you could at least get a proof to the effect of "assuming that the hashmap lookups return the right values, the rest of the algorithm performs as expected."

Ah, I see! This is a good idea - thank you!😊 I've just looked into how this might apply to the diffing algorithm I want to verify and it turns out, an estimated ~75-80% of it can be verified this way.

You may also want to look at Verus, another tool for verifying the correctness of Rust code.

Thank you for the recommendation.👍 Wasn't Verus the other verification tool (besides Kani) that is used in the effort of verifying the Rust std lib?
Really looking forward to how the Rust team is going to verify std HashMap.👀

I've also briefly looked into Creusot, which uses deductive verification, but I haven't really tried it yet.

The formal verification rabbit hole is so deep!🐰🕳 I love it, but also...help!?😂

janriemer added the [C] Bug This is a bug. Something isn't working. label Mar 29, 2025

carolynzech self-assigned this Apr 1, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using function contracts to verify insertion into HashMap runs into OOM (>8GB) #3965

Using function contracts to verify insertion into HashMap runs into OOM (>8GB) #3965

janriemer commented Mar 29, 2025

carolynzech commented Apr 1, 2025

janriemer commented Apr 3, 2025 •

edited

Loading

carolynzech commented Apr 3, 2025

carolynzech commented Apr 3, 2025

janriemer commented Apr 6, 2025 •

edited

Loading

carolynzech commented Apr 6, 2025

janriemer commented Apr 6, 2025 •

edited

Loading

Using function contracts to verify insertion into HashMap runs into OOM (>8GB) #3965

Using function contracts to verify insertion into HashMap runs into OOM (>8GB) #3965

Comments

janriemer commented Mar 29, 2025

Goal that I try to achieve

Problem I run into

Code to reproduce

carolynzech commented Apr 1, 2025

janriemer commented Apr 3, 2025 • edited Loading

carolynzech commented Apr 3, 2025

carolynzech commented Apr 3, 2025

janriemer commented Apr 6, 2025 • edited Loading

carolynzech commented Apr 6, 2025

janriemer commented Apr 6, 2025 • edited Loading

janriemer commented Apr 3, 2025 •

edited

Loading

janriemer commented Apr 6, 2025 •

edited

Loading

janriemer commented Apr 6, 2025 •

edited

Loading