
[FEATURE] [RFC] Query Shape Field Data Type #69

Closed
@dzane17

Description


Background

Query shape can currently append field names. We'd also like the option to append each field's data type (often called fieldType in code).

In this example, field1 through field6 are field names, and text, date, keyword, and number are data types.

bool
  filter:
    terms [field1:text]
  must:
    range [field2:date]
    term [field3:keyword]
  must_not:
    exists [field4:keyword]
  should:
    match [field5:text]
sort:
  asc [field6:number]
  desc [field2:date]
aggregation:
  terms [field3:keyword]
    aggregation:
      avg [field6:number]
      cardinality [field3:keyword]
      date_histogram [field2:date]
      max [field6:number]
      min [field6:number]
      percentile_ranks [field6:number]
      sum [field6:number]
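To make the proposed format concrete, here is a minimal Java sketch of how a single shape line could be rendered with an optional field name and data type. `ShapeLineFormatter` and `line` are hypothetical names used only for illustration, not existing query-insights code; indentation follows the two-spaces-per-level layout of the example above.

```java
/**
 * Hypothetical helper (not part of query-insights) that renders one query
 * shape line, optionally appending the field name and field data type.
 */
final class ShapeLineFormatter {
    private ShapeLineFormatter() {}

    /**
     * @param indent   nesting depth, two spaces per level as in the shape above
     * @param clause   query, aggregation, or sort keyword, e.g. "term"
     * @param field    field name, or null when the clause has no field (e.g. "bool")
     * @param dataType field data type, or null when unknown or not recorded
     */
    static String line(int indent, String clause, String field, String dataType) {
        StringBuilder sb = new StringBuilder("  ".repeat(indent)).append(clause);
        if (field != null) {
            sb.append(" [").append(field);
            if (dataType != null) {
                // Only append ":type" when a type was actually resolved.
                sb.append(':').append(dataType);
            }
            sb.append(']');
        }
        return sb.toString();
    }
}
```

For example, `ShapeLineFormatter.line(1, "term", "field3", "keyword")` returns `"  term [field3:keyword]"`, while passing `null` as the type yields `"  term [field3]"`, which is the field-name-only shape we have today.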

Problem

It is impossible to determine a field's data type from the search source alone.

In this match_phrase query, the field name is "title"; however, the exact data type is unknown. It could be text or keyword.

GET /my_index/_search
{
  "query": {
    "match_phrase": {
      "title": "Second Document"
    }
  }
}
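The ambiguity can be shown in a few lines of Java. The sketch below assumes the search source has already been parsed into nested `Map`s; it extracts the field name from the `match_phrase` query and resolves it against two hypothetical flattened mappings of `my_index`. The class, method, and constant names are illustrative only.

```java
import java.util.Map;
import java.util.Optional;

/** Illustrative only: the search source names a field but says nothing about its type. */
final class FieldTypeAmbiguity {
    private FieldTypeAmbiguity() {}

    // Two hypothetical mappings of my_index, flattened to field -> data type.
    static final Map<String, String> TITLE_AS_TEXT = Map.of("title", "text");
    static final Map<String, String> TITLE_AS_KEYWORD = Map.of("title", "keyword");

    /** Returns the field targeted by a leaf query such as {"match_phrase": {"title": ...}}. */
    static String leafField(Map<String, Object> query) {
        // The single value is the query body, e.g. {"title": "Second Document"}.
        @SuppressWarnings("unchecked")
        Map<String, Object> body = (Map<String, Object>) query.values().iterator().next();
        // Its key is the field name; nothing here indicates text vs. keyword.
        return body.keySet().iterator().next();
    }

    /** Looks the field up in a flattened mapping; empty if the mapping does not define it. */
    static Optional<String> resolve(Map<String, String> flatMapping, String field) {
        return Optional.ofNullable(flatMapping.get(field));
    }
}
```

The search source yields only `"title"`; whether that becomes `text` or `keyword` depends entirely on which mapping it is resolved against.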

What solution would you like?

Three possible solutions:

  1. Record dataType in each *QueryBuilder, *AggregationBuilder, and *SortBuilder class during core search execution. Then, when records are processed in the query-insights plugin, we can simply call aggBuilder.getDataType(). I have seen builder classes get the data type mapping from shardContext like:
final MappedFieldType fieldType = shardContext.fieldMapper(fieldName);

Cons: Not all builders look up the data type, and we would need to edit the *Builder classes in core.

  2. Fetch mappings from the query-insights plugin, then get the data type from the known field name.
    Note: We need to find a way to fetch _mapping data from query-insights.

  3. Ignore field data type in query shape.
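For option 2, once the `_mapping` data is available, the lookup itself is straightforward. A minimal sketch, assuming the mapping's `properties` section has already been parsed into nested `Map`s (how query-insights would fetch it is still the open question in the note on option 2): object fields are walked through `properties`, multi-fields through `fields`, and each typed leaf is recorded under its dotted path. `MappingFlattener` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch for option 2: flatten a parsed mapping "properties"
 * section into field path -> data type, so a field name from the query can
 * be typed with a single map lookup.
 */
final class MappingFlattener {
    private MappingFlattener() {}

    static Map<String, String> flatten(Map<String, Object> properties) {
        Map<String, String> out = new LinkedHashMap<>();
        walk("", properties, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void walk(String prefix, Map<String, Object> properties, Map<String, String> out) {
        for (Map.Entry<String, Object> e : properties.entrySet()) {
            String path = prefix + e.getKey();
            Map<String, Object> def = (Map<String, Object>) e.getValue();
            Object type = def.get("type");
            if (type != null) {
                // Leaf (or explicitly typed) field: record its data type.
                out.put(path, type.toString());
            }
            Object sub = def.get("properties");
            if (sub != null) {
                // Object fields keep their sub-fields under "properties".
                walk(path + ".", (Map<String, Object>) sub, out);
            }
            Object multi = def.get("fields");
            if (multi != null) {
                // Multi-fields (e.g. title.raw) live under "fields".
                walk(path + ".", (Map<String, Object>) multi, out);
            }
        }
    }
}
```

With a mapping where `title` is `text` and has a `raw` keyword sub-field, `flatten` yields both `title -> text` and `title.raw -> keyword`, so queries on either name can be typed.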

In many cases, the data type is implied by the query, aggregation, or sort type. For example, a date_histogram aggregation always operates on a date field. In these cases, adding the data type adds no value:

    aggregation:
      date_histogram [date]

vs.

    aggregation:
      date_histogram
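The reasoning behind option 3 can be expressed as a small rule: drop the type only when the clause already implies it. In this sketch, `IMPLIED_TYPES` is a hypothetical and deliberately incomplete table holding only the date_histogram case mentioned above; `range` is left out because it accepts several data types.

```java
import java.util.Map;

/** Hypothetical sketch: omit the data type from the shape when the clause implies it. */
final class ImpliedTypes {
    private ImpliedTypes() {}

    // Deliberately incomplete; every entry would need to be verified per clause.
    static final Map<String, String> IMPLIED_TYPES = Map.of("date_histogram", "date");

    /**
     * Returns the type to print in the shape, or null when it is implied by the clause.
     * fieldType must be non-null (the resolved mapped type).
     */
    static String typeToRecord(String clause, String fieldType) {
        String implied = IMPLIED_TYPES.get(clause); // null for clauses like "range"
        return fieldType.equals(implied) ? null : fieldType;
    }
}
```

Under this rule `date_histogram [field2:date]` collapses to `date_histogram [field2]`, while a range clause always keeps its type.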

On the other hand, range queries support a variety of data types, so we would lose information with this option:
a. Date

GET /my_index/_search
{
  "query": {
    "range": {
      "publish_date": {
        "gte": "2024-01-01",
        "lte": "2024-12-31"
      }
    }
  }
}

b. Keyword

GET /my_index/_search
{
  "query": {
    "range": {
      "numeric_as_string": {
        "gte": "10",
        "lte": "100"
      }
    }
  }
}

c. Numeric (e.g. integer, long, float, double, short, byte)

GET /my_index/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 10,
        "lte": 100
      }
    }
  }
}
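The range examples look alike in a type-less shape, but the field type changes what they match. The sketch below mimics keyword (lexicographic) and numeric ordering using the bounds from examples b and c; it illustrates comparison semantics only and is not how OpenSearch evaluates range queries. For ASCII digits, `String.compareTo` orders values the same way as byte-wise keyword term ordering. `RangeSemantics` is a hypothetical name.

```java
/** Illustrative only: the same bounds select different values depending on field type. */
final class RangeSemantics {
    private RangeSemantics() {}

    /** Keyword fields compare terms as strings (lexicographic order). */
    static boolean inKeywordRange(String value, String gte, String lte) {
        return value.compareTo(gte) >= 0 && value.compareTo(lte) <= 0;
    }

    /** Numeric fields compare values as numbers. */
    static boolean inNumericRange(double value, double gte, double lte) {
        return value >= gte && value <= lte;
    }
}
```

`inNumericRange(50, 10, 100)` is `true`, while `inKeywordRange("50", "10", "100")` is `false` because `"5"` sorts after `"1"`. Without a data type, `range [price]` and `range [numeric_as_string]` would not reveal that difference.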

Other Consideration

  1. When the data type is string or numeric, do we need to distinguish between specific types (e.g. integer, long, float, double, short, byte)?
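One possible answer is to make granularity a choice. This sketch keeps the exact mapped type when `exact` is true and otherwise collapses the six numeric types named in this issue into `number`, the label used in the example shape at the top; `text` and `keyword` are kept distinct because they are matched differently. The `exact` flag and `TypeGranularity` are hypothetical, not an agreed-upon design.

```java
import java.util.Set;

/** Hypothetical sketch: record either the exact mapped type or a coarse category. */
final class TypeGranularity {
    private TypeGranularity() {}

    // Limited to the numeric types listed in this issue; others are passed through unchanged.
    static final Set<String> NUMERIC = Set.of("byte", "short", "integer", "long", "float", "double");

    static String normalize(String mappedType, boolean exact) {
        if (exact) {
            return mappedType;          // e.g. "long" stays "long"
        }
        if (NUMERIC.contains(mappedType)) {
            return "number";            // coarse label, as in [field6:number]
        }
        return mappedType;              // text, keyword, date, ... unchanged
    }
}
```

Keeping the flag outside the shape logic would let us start coarse and switch to exact types later without changing how shapes are built.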

Metadata

Labels: enhancement (New feature or request), v2.18.0 (Issues targeting release v2.18.0)

Projects

Status: Done

Status: New
