Skip to content

Contains isn't O(1) with respect to the size of the filter? #84

@vlovich

Description

@vlovich

I'm seeing something surprising - the performance of querying the filter seems to depend somewhat on how many elements are contained within the filter. Is that expected? I would have thought lookups are O(1).

Contains performance with 1M elements in the filter:

         ╰─ bfuse    3.48 ms       │ 4.601 ms      │ 3.571 ms      │ 3.577 ms      │ 100     │ 100
                     287.2 Mitem/s │ 217.3 Mitem/s │ 280 Mitem/s   │ 279.5 Mitem/s │         │

Contains performance with 9M elements in the filter:

         ╰─ bfuse    12.22 ms      │ 16.63 ms      │ 13.38 ms      │ 13.48 ms      │ 100     │ 100
                     81.76 Mitem/s │ 60.1 Mitem/s  │ 74.73 Mitem/s │ 74.17 Mitem/s │         │

Contains performance with 99M elements in the filter:

         ╰─ bfuse    18.22 ms      │ 24.79 ms      │ 19.63 ms      │ 19.97 ms      │ 100     │ 100
                     54.86 Mitem/s │ 40.32 Mitem/s │ 50.92 Mitem/s │ 50.07 Mitem/s │         │

It does seem to level off from there, but that's still a 4x difference reduction in speed for a 100x increase in filter size. BinaryFuse8 vs BinaryFuse32 doesn't seem to matter. 1M elements is 4 MiB, 9M filter is 40 MiB, while 100M elements is a 2 GiB filter so maybe this is just an artifact of the microbenchmark sitting in the cache? Just wanted to double-check my assumptions.

Benchmark code:

    #[divan::bench]
    pub fn bfuse(bencher: divan::Bencher) {
        #[derive(Clone, Copy)]
        struct Counter(u64, u64);
        impl Iterator for Counter {
            type Item = u64;

            fn next(&mut self) -> Option<Self::Item> {
                let next = self.0;
                if next >= self.1 {
                    None
                } else {
                    self.0 += 1;
                    Some(next)
                }
            }
        }

        impl ExactSizeIterator for Counter {
            fn len(&self) -> usize {
                (self.1 - self.0) as usize
            }
        }

        let containment_range = 1_000_000..500_000_000;
        const fn lookup_range() -> std::ops::Range<u64> {
            500_000..1_500_000
        }

        let filter =
            BinaryFuse32::try_from_iterator(Counter(containment_range.start, containment_range.end))
                .unwrap();
        bencher
            .counter(divan::counter::ItemsCount::new(
                lookup_range().end - lookup_range().start,
            ))
            .bench_local(|| {
                for i in lookup_range() {
                    std::hint::black_box(filter.contains(&i));
                }
            });
    }

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions