Search APIs don't properly resolve Component Selector against remote schema #9837

Closed
jleibs opened this issue Apr 29, 2025 · 0 comments · Fixed by #9854
Comments

@jleibs
Member

jleibs commented Apr 29, 2025

The Vector/FTS Search APIs currently have inconsistent hacks that only sometimes do the right thing.

See, for example:

// TODO(jleibs): get rid of this hack
if component_descriptor.component_name == ComponentName::from("rerun.components.Text") {
    component_descriptor = component_descriptor
        .or_with_archetype_name(|| "rerun.archetypes.TextLog".into())
        .or_with_archetype_field_name(|| "text".into());
}

The reasoning behind this is that the server needs to know the exact column to build the index over and search.

Now that we have an API to get the schema, we can do much better.

The logic should now be:

  • Get the schema for the dataset
  • Scan the schema to find a SINGLE match for the component selector
    • If there are no matches or multiple matches, this should raise an error about an ambiguous component selector.
    • Note: tagged components are going to create situations that will be inherently ambiguous (follow up with @Wumpf to make sure we are considering this in the tagged component workstream; component selectors will need to be expanded to be tagged-component aware).

This logic needs to be replicated in all of:

  • create_fts_index
  • create_vector_index
  • fts_search
  • vector_search

This probably means writing a helper on the dataset, something like resolve_component_selector. There is some similar logic for view contents now in dataframe_query.rs.
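
A minimal sketch of what such a helper could look like, assuming simplified stand-in types. `ComponentColumn`, `ComponentSelector`, and `ResolveError` below are hypothetical and are not the actual `re_sorbet` or dataset types, and the matching rule (same entity path, plus either the full component name or its last dot-separated segment) is an assumption made for illustration only:

```rust
/// Hypothetical, simplified stand-in for one component column of a dataset schema.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ComponentColumn {
    pub entity_path: String,
    /// Optional archetype tag, e.g. "rerun.archetypes.TextLog".
    pub archetype_name: Option<String>,
    /// Fully qualified component name, e.g. "rerun.components.Text".
    pub component_name: String,
}

/// Hypothetical, simplified selector: an entity path and a component name
/// that may be given in full or in short form (e.g. "Text").
#[derive(Debug, Clone)]
pub struct ComponentSelector {
    pub entity_path: String,
    pub component: String,
}

/// Hypothetical error type; the issue only asks that both cases be reported.
#[derive(Debug, PartialEq, Eq)]
pub enum ResolveError {
    /// No column in the schema matches the selector.
    NoMatch,
    /// More than one column matches; carries the number of candidates.
    Ambiguous(usize),
}

/// Assumed matching rule: same entity path, and the selector's component is
/// either the full component name or its last dot-separated segment.
fn selector_matches(column: &ComponentColumn, selector: &ComponentSelector) -> bool {
    let short_name = column.component_name.rsplit('.').next();
    column.entity_path == selector.entity_path
        && (column.component_name == selector.component
            || short_name == Some(selector.component.as_str()))
}

/// Scan the schema and return the SINGLE column matching the selector,
/// or an error if there are zero or several candidates.
pub fn resolve_component_selector<'a>(
    schema: &'a [ComponentColumn],
    selector: &ComponentSelector,
) -> Result<&'a ComponentColumn, ResolveError> {
    let matches: Vec<&ComponentColumn> = schema
        .iter()
        .filter(|column| selector_matches(column, selector))
        .collect();

    match matches.as_slice() {
        [] => Err(ResolveError::NoMatch),
        [single] => Ok(*single),
        _ => Err(ResolveError::Ambiguous(matches.len())),
    }
}
```

With a helper of this shape, each of the four entry points listed above would resolve the selector once against the fetched schema and pass the resulting column on, rather than patching the descriptor by hand. The `archetype_name` field hints at the tagged-component case: two columns that share an entity path and component name but differ in archetype would resolve to `Ambiguous` under this rule.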

@jleibs jleibs added the 🦟 regression A thing that used to work in an earlier release label Apr 29, 2025
@jleibs jleibs added this to the 0.23.2 milestone Apr 29, 2025
This was referenced Apr 29, 2025
@abey79 abey79 self-assigned this Apr 30, 2025
abey79 added a commit that referenced this issue Apr 30, 2025
### Related

* Part of #9837

### What

Move `{Column|ComponentColumn|TimelineColumn}Selector` to `re_sorbet`
where they belong (alongside the `*Descriptor` crowd).
@abey79 abey79 closed this as completed in 05f867c May 1, 2025
abey79 added a commit that referenced this issue May 1, 2025
…arch APIs (#9854)

### Related

* Fixes #9837
* Further issue to address:
  * #9853
  * #9855 

### What

Initial attempt to formalise component column selectors, how they are
matched against a schema, and how they are expressed in our Python API.
Applied to the dataset index creation/search APIs.

TODO:
- [x] use `AnyComponentColumn` in APIs
- [x] cleanup and fix type stubs

---------

Co-authored-by: Jeremy Leibs <[email protected]>
2 participants