-
Notifications
You must be signed in to change notification settings - Fork 453
Migrate file format to protobuf #8995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 8 commits
c0dec8f
424e1b4
176de23
9d14aef
55a1101
cf92ecd
ea90dd8
aa94a81
b869a78
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -30,6 +30,15 @@ pub(crate) struct MessageHeader { | |
} | ||
|
||
impl MessageHeader { | ||
/// Size of an encoded message header, in bytes. | ||
// NOTE: This is `size_of` on a `repr(Rust)` struct, | ||
// which is fine because we control the layout | ||
// in the definition above, and tests would quickly | ||
// catch any sort of regression. | ||
pub const SIZE_BYTES: usize = std::mem::size_of::<Self>(); | ||
|
||
// NOTE: We use little-endian encoding, because we live in | ||
// the 21st century. | ||
Comment on lines
+36
to
+37
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Reading this just after reviewing an arrow PR that encodes stuff as BE to be spec compliant lol There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This part is our format, and it doesn't go over the network so nobody can claim it should be network-endian! 😄 ...well it technically does if you download it over http. But we must resist the CPU cycles wasted on swapping bytes. |
||
#[cfg(feature = "encoder")] | ||
pub(crate) fn encode( | ||
&self, | ||
|
@@ -45,7 +54,7 @@ impl MessageHeader { | |
pub(crate) fn decode( | ||
data: &mut impl std::io::Read, | ||
) -> Result<Self, crate::decoder::DecodeError> { | ||
let mut buf = [0; std::mem::size_of::<Self>()]; | ||
let mut buf = [0; Self::SIZE_BYTES]; | ||
data.read_exact(&mut buf)?; | ||
|
||
Self::from_bytes(&buf) | ||
|
@@ -55,7 +64,7 @@ impl MessageHeader { | |
/// TODO(zehiko) this should be public, we need to shuffle things around to ensure that #8726 | ||
#[cfg(feature = "decoder")] | ||
pub fn from_bytes(buf: &[u8]) -> Result<Self, crate::decoder::DecodeError> { | ||
if buf.len() != 16 { | ||
if buf.len() != Self::SIZE_BYTES { | ||
return Err(crate::decoder::DecodeError::Codec( | ||
crate::codec::CodecError::HeaderDecoding(std::io::Error::new( | ||
std::io::ErrorKind::InvalidData, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we not switching to the C ABI if we need control over layout though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this a bit more, and maybe this should just be a literal constant. We need control over the encoding, not the layout of the struct.
Alternatively we can control the layout of the struct, and use something like
bytemuck
to cast to/from a slice of bytes. But that's an extra dependency that may not be worth it for something so simple