Skip to content

Add .rrd files to git lfs to test backwards compatibility #9110

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
1 of 13 tasks
emilk opened this issue Feb 22, 2025 · 1 comment
Open
1 of 13 tasks

Add .rrd files to git lfs to test backwards compatibility #9110

emilk opened this issue Feb 22, 2025 · 1 comment
Assignees
Labels

Comments

@emilk
Copy link
Member

emilk commented Feb 22, 2025

We want to have automated tests to make sure we have backwards compatible chunk loaders.

Ideally we should add .rrd files to git lfs for all previous versions of all components and archetypes.

What to test?

  • They load without error/warning
  • Passes rerun rrd verify
  • Produce the same rerun print output?
    • Maybe just the column headers (names, meta-data)?
    • That's a useful sanity check when we update the .rrd:s in the test set, that the new ones covers approximately the same things
  • Produce the same visuals? Difficult to get right

What .rrd files?

  • Snippets?
  • Examples?
  • One .rrd file containing all known components/archetypes/datatypes?

When and how do we add more .rrd files to the test set?

What if something does break?

We do not yet promise full backwards compatibility, but any CI failure regarding this should be a red flag, and we should only accept it (and upload a new .rrd file) if we are really ok with this breaking change.

What remains to do?

  • Decide on exactly what .rrd files to test
  • Automatically verify that all snippets and examples etc have .rrd:s that are checked in
  • Add .rrds under folders with version string
  • Compare all versions of old .rrd files (e.g. from 0.22, 0.23, 0.24…)
  • Ensure we add new versions of snippet and examples iff needed
  • Improve the script for updating said .rrd files to:
    • Not be a bash script (-> Python or Rust)
    • Add missing files (e.g. when adding new snippets) without changing the existing files
    • Update all existing files (when intentionally breaking backwards compatibility)
  • Ensure we migrate deprecated types on ingestion #9370

Potential improvements

  • Migrate the checked in snippet rrd files, then compare them to the output of the latest snippet
@emilk emilk added the 🔩 data model Sorbet label Feb 22, 2025
@emilk emilk changed the title Add data files to git lfs to test backwards compatibility Add .rrd files to git lfs to test backwards compatibility Feb 25, 2025
@emilk emilk self-assigned this Feb 25, 2025
emilk added a commit that referenced this issue Mar 3, 2025
## What
* Fix formatting of chunks with zero rows
* Include metadata in `rerun rrd print -vv` (same fix as above)
* Simplify formatting of chunks with no metadata

## Related
* Part of #9110
emilk added a commit that referenced this issue Mar 4, 2025
### Related
* Part of #9110

### What
Add `rerun rrd verify some.rrd` which verifies that the current rerun
version can load and understand the given .rrd file.

It goes through each component column in each record batch, find the
corresponding component, and then tries to deserialize the arrow data
within.
emilk added a commit that referenced this issue Mar 4, 2025
### Related
* Part of #9110

### What
Adds a bunch of `.rrd`:s to `git lfs`, that we should keep there as a
test of backwards compatibility.
If any new code breaks loading of these old .rrd files, the CI will
complain.

Running locally:
> pixi run check-backwards-compatibility

### What if the CI complains about my PR?
Then your code broke backwards compatibility. Can you make the change so
that it doesn't? If not, consult with me! We are not yet promising
backwards compatibility, but we should at least make a reasonable
effort.

### What does this PR cover?
All Rust snippets are tested, as are the main examples (as of today).
This should include most components, but does not cover _everything_. So
even if `pixi run check-backwards-compatibility` passes, it is possible
that we have broken backwards compatibility in some subtle way. But this
is at least a start.

### TODO
* [x] Add a single pixi command to verify the files

### Future work
* Add more files, for improved coverage
* Automatically detect if some snippets/examples are missing from the
test dataset

---------

Co-authored-by: Clement Rey <[email protected]>
emilk added a commit that referenced this issue Mar 25, 2025
### Related
* Part of #9110
* In preparation for #9338

### What
Ensures that after loading an .rrd, there are no deprecated components
or archetypes remaining.

All deprecated types should have been migrated to non-deprecated types.
@emilk emilk removed their assignment Apr 11, 2025
@emilk
Copy link
Member Author

emilk commented Apr 15, 2025

compare_snippet_output.py should compare its output with the .rrd files checked in to git lfs:

  • If missing, it is an error (a new snippet needs to check in an .rrd for its output)
  • If it differs, it is an error

Complications

If we change a snippet (so that it outputs something different), we need to either:

  • Update the file on git lfs
  • Add a new file to git lfs alongside the old one
  • Add an opt-out flag for comparing to the file on git lfs to snippets.toml

Since this should be a rather rare occurrence (our snippets tend to stay pretty stable), any one solution will do.

@emilk emilk self-assigned this Apr 15, 2025
emilk added a commit that referenced this issue Apr 17, 2025
### Related
* Closes #3235
* Part of #9110
* Unblocks #9751
* Unblocks #9588

### What
In our snippet comparisons, ignore small numeric differences (that often
occur because of differences in programming language, or
compiler/interpreter version)

---------

Co-authored-by: Clement Rey <[email protected]>
grtlr pushed a commit that referenced this issue Apr 17, 2025
### Related
* Closes #3235
* Part of #9110
* Unblocks #9751
* Unblocks #9588

### What
In our snippet comparisons, ignore small numeric differences (that often
occur because of differences in programming language, or
compiler/interpreter version)

---------

Co-authored-by: Clement Rey <[email protected]>
emilk added a commit that referenced this issue Apr 17, 2025
## Related
* Requires #9750
* Part of #9110

## What
This restructures the folder of .rrds that we use for
backwards-compatibility checks, so that it has a `snippets` folder,
matching the structure of our snippets. These are then read by
`compare_snippets_output.py` and are compared to the latest output of
the snippets. They should be equal to each other (since the old .rrds
should be migrated on ingestion).

### Consequences
If we add a new snippet, the CI will fail until the output .rrd of that
snippet is checked in to CI (using `compare_snippets_output.py
--write-missing-backward-assets`)

If we _change_ a snippet (so that it outputs something different), we
need to either:
* Update the file on `git lfs`
* Add a _new_ file to `git lfs` alongside the old one
* Add an opt-out flag for comparing to the file on `git lfs` to
`snippets.toml`

Since this should be a rather rare occurrence (our snippets tend to stay
pretty stable), any one solution will do, and I have not decided on one
yet. Maybe cross that bridge if and when we get to it?
grtlr pushed a commit that referenced this issue Apr 17, 2025
## Related
* Requires #9750
* Part of #9110

## What
This restructures the folder of .rrds that we use for
backwards-compatibility checks, so that it has a `snippets` folder,
matching the structure of our snippets. These are then read by
`compare_snippets_output.py` and are compared to the latest output of
the snippets. They should be equal to each other (since the old .rrds
should be migrated on ingestion).

### Consequences
If we add a new snippet, the CI will fail until the output .rrd of that
snippet is checked in to CI (using `compare_snippets_output.py
--write-missing-backward-assets`)

If we _change_ a snippet (so that it outputs something different), we
need to either:
* Update the file on `git lfs`
* Add a _new_ file to `git lfs` alongside the old one
* Add an opt-out flag for comparing to the file on `git lfs` to
`snippets.toml`

Since this should be a rather rare occurrence (our snippets tend to stay
pretty stable), any one solution will do, and I have not decided on one
yet. Maybe cross that bridge if and when we get to it?
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant