Search APIs don't properly resolve Component Selector against remote schema #9837

jleibs · 2025-04-29T13:23:21Z

The Vector/FTS Search APIs currently have inconsistent hacks that only sometimes do the right thing.

See, for example:

Lines 227 to 232 in bc0e318

```rust
// TODO(jleibs): get rid of this hack
if component_descriptor.component_name == ComponentName::from("rerun.components.Text") {
    component_descriptor = component_descriptor
        .or_with_archetype_name(|| "rerun.archetypes.TextLog".into())
        .or_with_archetype_field_name(|| "text".into());
}
```

The reason for this hack is that the server needs to know the exact column over which to build the index and run the search.

Now that we have an API to get the schema, we can do much better.

The logic should now be:

1. Get the schema for the dataset.
2. Scan the schema to find a SINGLE match for the component selector.
   - If there are no matches or multiple matches, raise an error (no matching column, or an ambiguous component selector, respectively).
   - Note: tagged components will create situations that are inherently ambiguous (follow up with @Wumpf to make sure we are considering this in the tagged component workstream; component selectors will need to be expanded to be tagged-component aware).
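A minimal Rust sketch of the resolution steps above, assuming a simplified schema in which each component column is described by an optional archetype name, an optional archetype field name and a component name. `ComponentDescriptor`, `ComponentSelector`, `ResolveError` and `resolve_component_selector` here are hypothetical stand-ins, not the actual `re_sorbet` types: unset selector fields act as wildcards, the function succeeds only when exactly one column matches, and tagged components are not modeled.

```rust
use std::fmt;

/// Hypothetical, simplified stand-in for one component column in a dataset schema.
#[derive(Debug, Clone, PartialEq)]
pub struct ComponentDescriptor {
    pub archetype_name: Option<String>,
    pub archetype_field_name: Option<String>,
    pub component_name: String,
}

/// Hypothetical selector: `component_name` is required; the optional fields
/// only narrow the match when they are set (unset means "any").
#[derive(Debug, Clone, Default)]
pub struct ComponentSelector {
    pub archetype_name: Option<String>,
    pub archetype_field_name: Option<String>,
    pub component_name: String,
}

#[derive(Debug, PartialEq)]
pub enum ResolveError {
    /// No column in the schema matches the selector.
    NoMatch(String),
    /// More than one column matches; the candidates are kept for the error message.
    Ambiguous(String, Vec<ComponentDescriptor>),
}

impl fmt::Display for ResolveError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ResolveError::NoMatch(name) => {
                write!(f, "no column in the schema matches component selector `{name}`")
            }
            ResolveError::Ambiguous(name, candidates) => write!(
                f,
                "ambiguous component selector `{name}`: {} columns match",
                candidates.len()
            ),
        }
    }
}

impl std::error::Error for ResolveError {}

/// A set selector field must equal the column's field; an unset one matches anything.
fn field_matches(selected: &Option<String>, actual: &Option<String>) -> bool {
    match selected {
        None => true,
        Some(wanted) => actual.as_deref() == Some(wanted.as_str()),
    }
}

impl ComponentSelector {
    fn matches(&self, column: &ComponentDescriptor) -> bool {
        self.component_name == column.component_name
            && field_matches(&self.archetype_name, &column.archetype_name)
            && field_matches(&self.archetype_field_name, &column.archetype_field_name)
    }
}

/// Scans the schema and succeeds only if exactly one column matches the selector.
pub fn resolve_component_selector<'a>(
    schema: &'a [ComponentDescriptor],
    selector: &ComponentSelector,
) -> Result<&'a ComponentDescriptor, ResolveError> {
    let hits: Vec<&'a ComponentDescriptor> =
        schema.iter().filter(|column| selector.matches(column)).collect();
    match hits.len() {
        1 => Ok(hits[0]),
        0 => Err(ResolveError::NoMatch(selector.component_name.clone())),
        _ => Err(ResolveError::Ambiguous(
            selector.component_name.clone(),
            hits.into_iter().cloned().collect(),
        )),
    }
}
```

Keeping the full candidate list in `Ambiguous` lets the error tell the user which field to add to the selector, for example an archetype name to distinguish a `TextLog` text column from a `TextDocument` one when a schema holds both.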

This logic needs to be replicated in all of:

- `create_fts_index`
- `create_vector_index`
- `fts_search`
- `vector_search`

This probably means writing a helper on the dataset, such as `resolve_component_selector`. There is already some similar logic for view contents in `dataframe_query.rs`.
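One way to share the resolution across all four APIs is to route them through a single dataset method, as suggested above. This sketch assumes a dataset handle that already holds its fetched schema; `Dataset`, `Column`, `Request` and every method signature are hypothetical, and matching is reduced to component name plus an optional archetype field name, so it illustrates the structure rather than rerun's actual API.

```rust
/// Hypothetical, minimal description of a component column in the dataset schema.
#[derive(Debug, Clone, PartialEq)]
pub struct Column {
    pub archetype_field_name: Option<String>,
    pub component_name: String,
}

/// Hypothetical requests the client would send once the target column is known.
#[derive(Debug, PartialEq)]
pub enum Request {
    CreateFtsIndex { column: Column },
    CreateVectorIndex { column: Column },
    FtsSearch { column: Column, query: String },
    VectorSearch { column: Column, query: Vec<f32> },
}

/// Hypothetical dataset handle holding the schema fetched from the server.
pub struct Dataset {
    schema: Vec<Column>,
}

impl Dataset {
    pub fn new(schema: Vec<Column>) -> Self {
        Self { schema }
    }

    /// The single place where a selector is turned into exactly one column.
    pub fn resolve_component_selector(
        &self,
        component_name: &str,
        field: Option<&str>,
    ) -> Result<Column, String> {
        let mut hits = self.schema.iter().filter(|c| {
            c.component_name == component_name
                && field.map_or(true, |f| c.archetype_field_name.as_deref() == Some(f))
        });
        // Inspect at most the first two matches: exactly one is required.
        match (hits.next(), hits.next()) {
            (Some(column), None) => Ok(column.clone()),
            (None, _) => Err(format!("no column matches component selector `{component_name}`")),
            (Some(_), Some(_)) => Err(format!("ambiguous component selector `{component_name}`")),
        }
    }

    // All four APIs resolve the selector first, then build their request.
    pub fn create_fts_index(&self, component: &str, field: Option<&str>) -> Result<Request, String> {
        let column = self.resolve_component_selector(component, field)?;
        Ok(Request::CreateFtsIndex { column })
    }

    pub fn create_vector_index(&self, component: &str, field: Option<&str>) -> Result<Request, String> {
        let column = self.resolve_component_selector(component, field)?;
        Ok(Request::CreateVectorIndex { column })
    }

    pub fn fts_search(&self, component: &str, field: Option<&str>, query: &str) -> Result<Request, String> {
        let column = self.resolve_component_selector(component, field)?;
        Ok(Request::FtsSearch { column, query: query.to_string() })
    }

    pub fn vector_search(&self, component: &str, field: Option<&str>, query: Vec<f32>) -> Result<Request, String> {
        let column = self.resolve_component_selector(component, field)?;
        Ok(Request::VectorSearch { column, query })
    }
}
```

Because every API resolves before building its request, a missing or ambiguous selector fails the same way everywhere, instead of depending on per-API special cases like the `rerun.components.Text` hack.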


### Related
* Part of #9837

### What
Move `{Column|ComponentColumn|TimelineColumn}Selector` to `re_sorbet` where they belong (alongside the `*Descriptor` crowd).

…arch APIs (#9854)

### Related
* Fixes #9837
* Further issues to address:
  * #9853
  * #9855

### What
Initial attempt to formalise component column selectors, how they are matched against a schema, and how they are expressed in our Python API. Applied to the dataset index creation/search APIs.

TODO:
- [x] use `AnyComponentColumn` in APIs
- [x] cleanup and fix type stubs

Co-authored-by: Jeremy Leibs

jleibs added the 🦟 regression label (A thing that used to work in an earlier release) Apr 29, 2025

jleibs added this to the 0.23.2 milestone Apr 29, 2025

This was referenced Apr 29, 2025

Fix FTS bug #9835

Closed

feat: add datafusion scalar UDF examples #9841

Draft

abey79 self-assigned this Apr 30, 2025

abey79 mentioned this issue Apr 30, 2025

Move *Selector from re_chunk_store to re_sorbet #9851

Merged

abey79 added a commit that referenced this issue Apr 30, 2025

Move *Selector from re_chunk_store to re_sorbet (#9851)

b383dba


abey79 mentioned this issue Apr 30, 2025

Properly resolve component selectors in dataset index creation and search APIs #9854

Merged


abey79 closed this as completed in #9854 May 1, 2025

abey79 closed this as completed in 05f867c May 1, 2025

abey79 added a commit that referenced this issue May 1, 2025

Move *Selector from re_chunk_store to re_sorbet (#9851)

ff91a21

