Skip to content

Tagged columnar updates: Rust #8764

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Jan 21, 2025
Merged

Tagged columnar updates: Rust #8764

merged 12 commits into from
Jan 21, 2025

Conversation

teh-cmc
Copy link
Member

@teh-cmc teh-cmc commented Jan 21, 2025

I ended up with something slightly different than the original design, because it turned out to really not feel that great in practice.
In particular, it required duplicating a lot of generated code for little (no?) added value.

In short: .columns becomes a vanilla method rather than a static method.

// Prepare a point cloud that evolves over 5 timesteps, changing the number of points in the process.
let times = TimeColumn::new_seconds("time", 10..15);

#[rustfmt::skip]
let positions = [
    [1.0, 0.0, 1.0], [0.5, 0.5, 2.0],
    [1.5, -0.5, 1.5], [1.0, 1.0, 2.5], [-0.5, 1.5, 1.0], [-1.5, 0.0, 2.0],
    [2.0, 0.0, 2.0], [1.5, -1.5, 3.0], [0.0, -2.0, 2.5], [1.0, -1.0, 3.5],
    [-2.0, 0.0, 2.0], [-1.5, 1.5, 3.0], [-1.0, 1.0, 3.5],
    [1.0, -1.0, 1.0], [2.0, -2.0, 2.0], [3.0, -1.0, 3.0], [2.0, 0.0, 4.0],
];

// At each timestep, all points in the cloud share the same but changing color and radius.
let colors = [0xFF0000FF, 0x00FF00FF, 0x0000FFFF, 0xFFFF00FF, 0x00FFFFFF];
let radii = [0.05, 0.01, 0.2, 0.1, 0.3];

// Partition our data as expected across the 5 timesteps.
let position = rerun::Points3D::update_fields()
    .with_positions(positions)
    .columns([2, 4, 4, 3, 4])?;
let color_and_radius = rerun::Points3D::update_fields()
    .with_colors(colors)
    .with_radii(radii)
    .columns([1, 1, 1, 1, 1])?;

rec.send_columns_v2("points", [times], position.chain(color_and_radius))?;

/// Note that this API ignores any stateful time set on the log stream via the
/// [`Self::set_timepoint`]/[`Self::set_time_nanos`]/etc. APIs.
/// Furthermore, this will _not_ inject the default timelines `log_tick` and `log_time` timeline columns.
pub fn send_columns_v2(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll rename this to send_columns and remove all the legacy stuff in a follow-up, I don't want the diffs to be unnecessarily hard to follow.

@teh-cmc teh-cmc force-pushed the cmc/tagged_column_updates_rust branch from 92b66c3 to 434d32c Compare January 21, 2025 15:13
Copy link

github-actions bot commented Jan 21, 2025

Web viewer built successfully. If applicable, you should also test it:

  • I have tested the web viewer
Result Commit Link Manifest
1db2c6c https://rerun.io/viewer/pr/8764 +nightly +main

Note: This comment is updated whenever you push a commit.

@teh-cmc
Copy link
Member Author

teh-cmc commented Jan 21, 2025

Ha, interestingly this won't work with Scalar as-is because we need to generate an extra with_scalars (plural) method.

I'll auto-generate plural methods for these instances in a follow-up.

@Wumpf Wumpf self-requested a review January 21, 2025 15:42
@teh-cmc
Copy link
Member Author

teh-cmc commented Jan 21, 2025

Related: I also intend to introduce a unit_columns() method in a follow-up, for the very common case (i.e. .columns(std::take::repeat(1).iter(n))).

/// This generates a correctly sized [`SerializedComponentColumn`] for a given indicator, where the
/// specified `num_rows` value represents the number of rows in the column.
#[doc(hidden)] // public so we can access it from re_types too
pub fn indicator_column<A: Archetype>(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be greatly simplified, and much cheaper as well -> indicators are straight null arrays these days, no lists involved.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does mean making SerializedComponentColumn more complicated just for the sake of indicators, i.e. legacy...

Annoying, but there isn't really a choice here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just so I get this right, it would be more complicated because you have to keep the round count separate from the arrow data? (whereas right now you can just check the list array length)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm yeah we'd have the same problem in C++ as well (in Python all the indicator handling happens on the Rust side anyways iirc)

Is it really worth it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(re Cpp: ComponentColumn there has a from_indicators which does pretty much what we're doing here)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more complicated because SerializedComponentColumn would have to carry an erased Array instead, and have two paths for everything. I don't like it.

I'm gonna investigate an alternative: have send_columns_v2 automagically craft indicator columns as required on the fly. I don't like it either but objectively it is far less invasive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh god, this actually is an already existing problem, indicators are wrapped in a ListArray when put into chunks (probably to avoid adding a ton of complexity for a dead feature) 🤦.

That means:

  • Indicators take 4 bytes per row as of today (the outer list-array's offsets).
  • What I did is actually the right thing to do, and it makes me very sad.

Man, indicators really gotta go...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, there might be a win to have here: automagically craft the indicators in send_columns_v2, but then log them in their own independent chunk.

And while we're doing that, we should check if the same can be done with the absolute minimal efforts in the batcher, which would fix the existing problem for vanilla log calls.

Copy link
Member Author

@teh-cmc teh-cmc Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Goes without saying, this will have to happen in yet another follow up.

Copy link
Member Author

@teh-cmc teh-cmc Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until (if) we get to that, I will need to introduce a hardcoded exception to not automatically generate indicators for scalars, otherwise that's going to be a pretty big regression compared to today (we guide users towards not sending an indicator in the exist column-oriented scalar examples).

Copy link
Member

@Wumpf Wumpf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very happy you found an ergonomic way without generating tons of with_ methods again. I like the new-new design you landed at :) 👍

@teh-cmc teh-cmc merged commit e6e3b67 into main Jan 21, 2025
37 of 38 checks passed
@teh-cmc teh-cmc deleted the cmc/tagged_column_updates_rust branch January 21, 2025 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tagged columnar updates: Rust Support columns with variable partitions in Rust's send_column api
2 participants