Skip to content

Migrate file format to protobuf #8995

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Feb 15, 2025
Merged

Migrate file format to protobuf #8995

merged 9 commits into from
Feb 15, 2025

Conversation

jprochazk
Copy link
Member

@jprochazk jprochazk commented Feb 11, 2025

Related

What

  • Use protobuf encoding for our file format.
  • Completely remove msgpack code from the repository.

Tested by opening running the clock example and attempting to load the data in both the native and web viewers.

There are still some pieces left in the file format that have now been made redundant. One example is the per-recording compression field. We don't use that anymore, compression is now specified per ArrowMsg. There are a few other things like that, but I believe that refactoring is better left to the time when we decide to get rid of LogMsg, which would be another breaking change for the file format.

Copy link

github-actions bot commented Feb 11, 2025

Web viewer built successfully. If applicable, you should also test it:

  • I have tested the web viewer
Result Commit Link Manifest
b869a78 https://rerun.io/viewer/pr/8995 +nightly +main

Note: This comment is updated whenever you push a commit.

@jprochazk jprochazk added the 🪵 Log & send APIs Affects the user-facing API for all languages label Feb 11, 2025
@jprochazk jprochazk marked this pull request as draft February 11, 2025 18:03
@jprochazk jprochazk force-pushed the jan/migrate-file-protobuf branch from b1ebf6e to cb6f4f8 Compare February 13, 2025 15:00
@jprochazk jprochazk force-pushed the jan/migrate-file-protobuf branch from cb6f4f8 to ea90dd8 Compare February 13, 2025 15:01
@jprochazk jprochazk marked this pull request as ready for review February 13, 2025 15:19
@teh-cmc teh-cmc self-requested a review February 14, 2025 09:04
Copy link
Member

@teh-cmc teh-cmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good.

We will need to fire a nightly ASAP once this is merged -- this will break all developers' welcome screen for a while. Let's try to do that at a not too inconvenient time (marking as DNM for safety).

Comment on lines 34 to 37
// NOTE: This is `size_of` on a `repr(Rust)` struct,
// which is fine because we control the layout
// in the definition above, and tests would quickly
// catch any sort of regression.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we not switching to the C ABI if we need control over layout though?

Copy link
Member Author

@jprochazk jprochazk Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this a bit more, and maybe this should just be a literal constant. We need control over the encoding, not the layout of the struct.
Alternatively we can control the layout of the struct, and use something like bytemuck to cast to/from a slice of bytes. But that's an extra dependency that may not be worth it for something so simple

Comment on lines +40 to +41
// NOTE: We use little-endian encoding, because we live in
// the 21st century.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading this just after reviewing an arrow PR that encodes stuff as BE to be spec compliant lol

Copy link
Member Author

@jprochazk jprochazk Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is our format, and it doesn't go over the network so nobody can claim it should be network-endian! 😄

...well it technically does if you download it over http. But we must resist the CPU cycles wasted on swapping bytes.

@teh-cmc teh-cmc added the do-not-merge Do not merge this PR label Feb 14, 2025
@jprochazk jprochazk removed the do-not-merge Do not merge this PR label Feb 15, 2025
@jprochazk jprochazk merged commit 46683ce into main Feb 15, 2025
35 checks passed
@jprochazk jprochazk deleted the jan/migrate-file-protobuf branch February 15, 2025 16:39
@emilk emilk mentioned this pull request Apr 11, 2025
16 tasks
@emilk emilk restored the jan/migrate-file-protobuf branch April 28, 2025 06:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
include in changelog 🪵 Log & send APIs Affects the user-facing API for all languages
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove support for MsgPack serializer Migrate rrd to protobuf
2 participants