Description
I'm building an application that has a bunch of large (multi-megabyte) probabilistic filters. However, I don't want to pay the bincode serialization/deserialization cost, which adds up quite a bit (it's currently a massive bottleneck in my application). I'd like to be able to read the raw filter into memory and then create a filter that references it. Serde seems incompatible with this approach (but please correct me if that's not the case).
There are two parts to the filters I've observed (a rough sketch follows the list):
- A fixed-size descriptor, typically 2-3 u64 words (e.g. segment length, segment count length, etc.).
- The fingerprints array.
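
For concreteness, here's what I mean by that two-part shape (the field names here are illustrative assumptions on my part, not the crate's actual layout):

```rust
// Hypothetical shape of a binary-fuse-style filter, for illustration only;
// the real struct's fields may differ.
pub struct FilterShape {
    // Fixed-size descriptor: a few words.
    seed: u64,
    segment_length: u32,
    segment_count_length: u32,
    // Large, variable-length fingerprints array.
    fingerprints: Box<[u8]>,
}
```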
For serialization I'm hoping this interface is not objectionable:
```rust
pub trait DmaSerializable {
    /// The serialized length of the descriptor. Very small and safe to
    /// allocate on-stack if needed.
    const DESCRIPTOR_LEN: usize;

    /// Copies the small fixed-length descriptor part of the filter to an
    /// output buffer.
    fn dma_copy_descriptor_to(&self, out: &mut [u8]);

    /// Obtains the raw byte slice of the fingerprints to serialize to disk.
    fn dma_fingerprints(&self) -> &[u8];
}
```
The application code should have the necessary information to dump the filter to disk without any serialization (just a simple memcpy at worst).
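
As a sketch of what that application code might look like (the `write_filter` helper is hypothetical, assuming the trait above):

```rust
use std::fs::File;
use std::io::{self, Write};

/// Dumps a filter to disk: descriptor first, then the raw fingerprint bytes.
/// Generic over any filter implementing the proposed DmaSerializable trait.
fn write_filter<F: DmaSerializable>(filter: &F, path: &str) -> io::Result<()> {
    // DESCRIPTOR_LEN is small and known at compile time, so a stack buffer
    // would also work; a Vec keeps the sketch simple.
    let mut descriptor = vec![0u8; F::DESCRIPTOR_LEN];
    filter.dma_copy_descriptor_to(&mut descriptor);

    let mut file = File::create(path)?;
    file.write_all(&descriptor)?;
    file.write_all(filter.dma_fingerprints())?; // memcpy-style write, no serde
    Ok(())
}
```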
For deserialization things get tricky, and there are a few ways to handle it. I think the "cleanest" way is a new `Ref` type (e.g. `BinaryFuse8Ref<'a>`) that can be constructed via `from_dma_parts`, which takes in a pointer to the descriptor & the fingerprints and internally manages their access. I'm not sure about the cost of trying to avoid copying out of the descriptor vs. eagerly destructuring it. My initial hunch would be to keep it simple and eagerly destructure.
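
A rough sketch of the eager-destructuring variant, assuming little-endian encoding and made-up field names/offsets:

```rust
/// A filter view that borrows its fingerprints instead of owning them.
/// The descriptor fields are destructured eagerly at construction time,
/// so only the (large) fingerprint array stays borrowed.
pub struct BinaryFuse8Ref<'a> {
    seed: u64,
    segment_length: u32,
    segment_count_length: u32,
    fingerprints: &'a [u8],
}

impl<'a> BinaryFuse8Ref<'a> {
    /// Reconstructs a filter view from the two regions written at
    /// serialization time. Returns None if the descriptor is too short.
    pub fn from_dma_parts(descriptor: &[u8], fingerprints: &'a [u8]) -> Option<Self> {
        if descriptor.len() < 16 {
            return None;
        }
        // Eagerly copy the handful of words out of the descriptor;
        // the descriptor buffer can be discarded afterwards.
        let seed = u64::from_le_bytes(descriptor[0..8].try_into().ok()?);
        let segment_length = u32::from_le_bytes(descriptor[8..12].try_into().ok()?);
        let segment_count_length = u32::from_le_bytes(descriptor[12..16].try_into().ok()?);
        Some(Self { seed, segment_length, segment_count_length, fingerprints })
    }
}
```

The appeal of eager destructuring is that the `Ref` type then holds only one borrow (the fingerprints), so the descriptor buffer's lifetime doesn't need to be threaded through, and the few copied words are trivially cheap next to the fingerprint array.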
@ayazhafiz thoughts?