RFC: 0-copy serialization / deserialization #79

@vlovich

I'm building an application that has a bunch of large (multi-megabyte) probabilistic filters. I don't want to pay the bincode serialization / deserialization cost, which adds up quite a bit and is currently a massive bottleneck in my application. I'd like to be able to read the raw filter bytes into memory and then create a filter that references them. Serde seems incompatible with this approach (but please correct me if that's not the case).

There are 2 parts to the filters that I've observed:

  1. A fixed-size descriptor that's typically 2-3 u64 words (e.g. segment length, segment count length, etc.).
  2. The fingerprints array.

For serialization I'm hoping this interface is not objectionable:

pub trait DmaSerializable {
    /// The serialized length of the descriptor. Very small and safe to allocate on-stack if needed.
    const DESCRIPTOR_LEN: usize;

    /// Copies the small fixed-length descriptor part of the filter to an output buffer.
    fn dma_copy_descriptor_to(&self, out: &mut [u8]);

    /// Obtains the raw byte slice of the fingerprints to serialize to disk.
    fn dma_fingerprints(&self) -> &[u8];
}

The application code should have the necessary information to dump the filter to disk without any serialization (just a simple memcpy at worst).
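
To make that concrete, here is a rough sketch of what the application side could look like. `ToyFilter8`, its fields, the 16-byte little-endian descriptor layout, and `dump_filter` are all made up for illustration; the real filters may carry different state, and only the trait itself is what's proposed above.

```rust
use std::io::{self, Write};

/// The trait as proposed in this issue.
pub trait DmaSerializable {
    const DESCRIPTOR_LEN: usize;
    fn dma_copy_descriptor_to(&self, out: &mut [u8]);
    fn dma_fingerprints(&self) -> &[u8];
}

/// Hypothetical stand-in for an 8-bit filter; field names are assumptions.
pub struct ToyFilter8 {
    pub seed: u64,
    pub segment_length: u32,
    pub segment_count_length: u32,
    pub fingerprints: Vec<u8>,
}

impl DmaSerializable for ToyFilter8 {
    // Assumed layout: u64 seed, then two u32 fields, all little-endian.
    const DESCRIPTOR_LEN: usize = 16;

    fn dma_copy_descriptor_to(&self, out: &mut [u8]) {
        assert_eq!(out.len(), Self::DESCRIPTOR_LEN);
        out[0..8].copy_from_slice(&self.seed.to_le_bytes());
        out[8..12].copy_from_slice(&self.segment_length.to_le_bytes());
        out[12..16].copy_from_slice(&self.segment_count_length.to_le_bytes());
    }

    fn dma_fingerprints(&self) -> &[u8] {
        // Already raw bytes for an 8-bit filter, so nothing is copied here.
        &self.fingerprints
    }
}

/// Writes the descriptor followed by the raw fingerprints to any writer.
pub fn dump_filter<F: DmaSerializable, W: Write>(filter: &F, out: &mut W) -> io::Result<()> {
    // Small on-stack scratch buffer; 64 bytes is an arbitrary upper bound.
    let mut scratch = [0u8; 64];
    assert!(F::DESCRIPTOR_LEN <= scratch.len());
    let descriptor = &mut scratch[..F::DESCRIPTOR_LEN];
    filter.dma_copy_descriptor_to(descriptor);
    out.write_all(descriptor)?;
    out.write_all(filter.dma_fingerprints())
}
```

The writer could be a `File`, or the two slices could be handed to a vectored or direct I/O API instead; either way the fingerprints are never re-encoded.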

For deserialization things get tricky and there are a few ways to handle it. I think the "cleanest" way is a new Ref type (e.g. BinaryFuse8Ref<'a>) that can be constructed via from_dma_parts, which takes a pointer to the descriptor and a pointer to the fingerprints and internally manages access to them. I'm not sure how the cost of reading fields out of the borrowed descriptor on every access compares with eagerly destructuring it once at construction. My initial hunch is to keep it simple and eagerly destructure.
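
A minimal sketch of the eager-destructuring variant, to show the shape I have in mind. The descriptor fields and the 16-byte little-endian layout are assumptions for illustration (not the crate's actual layout), and I'm using slices rather than raw pointers so the lengths can be checked.

```rust
use std::convert::TryInto;

/// Hypothetical borrowed view over a filter; the real fields may differ.
pub struct BinaryFuse8Ref<'a> {
    seed: u64,
    segment_length: u32,
    segment_count_length: u32,
    fingerprints: &'a [u8],
}

impl<'a> BinaryFuse8Ref<'a> {
    /// Assumed layout: u64 seed, then two u32 fields, all little-endian.
    pub const DESCRIPTOR_LEN: usize = 16;

    /// Copies the small descriptor fields out once and borrows the
    /// fingerprints, so the large array is never copied.
    pub fn from_dma_parts(descriptor: &[u8], fingerprints: &'a [u8]) -> Option<Self> {
        if descriptor.len() != Self::DESCRIPTOR_LEN {
            return None;
        }
        let seed = u64::from_le_bytes(descriptor[0..8].try_into().ok()?);
        let segment_length = u32::from_le_bytes(descriptor[8..12].try_into().ok()?);
        let segment_count_length = u32::from_le_bytes(descriptor[12..16].try_into().ok()?);
        Some(Self { seed, segment_length, segment_count_length, fingerprints })
    }

    pub fn seed(&self) -> u64 {
        self.seed
    }

    pub fn segment_length(&self) -> u32 {
        self.segment_length
    }

    pub fn segment_count_length(&self) -> u32 {
        self.segment_count_length
    }

    /// The borrowed fingerprints, e.g. a slice of a memory-mapped file.
    pub fn fingerprints(&self) -> &'a [u8] {
        self.fingerprints
    }
}
```

With this shape the descriptor buffer can be dropped right after construction; only the fingerprints buffer has to outlive the Ref. A real version would presumably also check that the fingerprints length is consistent with the descriptor.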

@ayazhafiz thoughts?
