Description
Background
Query shape currently has the ability to append field names. In addition, we'd like the option to add the field data type (often called fieldType in code).
In this example, field1, field2, and field3 are field names, and text, date, and keyword are data types.
bool
  filter:
    terms [field1:text]
  must:
    range [field2:date]
    term [field3:keyword]
  must_not:
    exists [field4:keyword]
  should:
    match [field5:text]
sort:
  asc [field6:number]
  desc [field2:date]
aggregation:
  terms [field3:keyword]
    aggregation:
      avg [field6:number]
      cardinality [field3:keyword]
      date_histogram [field2:date]
      max [field6:number]
      min [field6:number]
      percentile_ranks [field6:number]
      sum [field6:number]
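To make the proposed format concrete, here is a minimal sketch of how a single shape line could be rendered when both the field name and the data type are optional. The class ShapeEntryFormatter and its format method are hypothetical names used for illustration only; they are not part of the query-insights plugin.

```java
// Hypothetical helper for illustration; not an existing query-insights class.
class ShapeEntryFormatter {

    /**
     * Renders one query-shape line such as "range [field2:date]".
     * fieldName and fieldType are optional: pass null to leave either out.
     */
    static String format(String operation, String fieldName, String fieldType) {
        if (fieldName == null && fieldType == null) {
            return operation;                                  // e.g. "date_histogram"
        }
        if (fieldType == null) {
            return operation + " [" + fieldName + "]";         // current behavior: name only
        }
        if (fieldName == null) {
            return operation + " [" + fieldType + "]";         // type only
        }
        return operation + " [" + fieldName + ":" + fieldType + "]";
    }
}
```

With this sketch, format("range", "field2", "date") yields "range [field2:date]", while format("terms", "field1", null) falls back to the name-only form "terms [field1]".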
Problem
It is impossible to determine the data type from the search source alone.
In this match_phrase query, the field name is "title", but the exact data type is unknown. It could be text or keyword.
GET /my_index/_search
{
  "query": {
    "match_phrase": {
      "title": "Second Document"
    }
  }
}
What solution would you like?
Three possible solutions:
- Record dataType in each *QueryBuilder, *AggregationBuilder, and *SortBuilder class during core search execution. Then, when records are processed in the query-insights plugin, we can simply call aggBuilder.getDataType(). I have seen builder classes get the data type mapping from shardContext like:
  final MappedFieldType fieldType = shardContext.fieldMapper(fieldName);
  Cons: Not all builders look up the data type, and we would need to edit the *Builder classes in core.
- Fetch mappings from the query-insights plugin. Then we can get the data type from the known field name.
  Note: We need to find a way to fetch _mapping data from query-insights.
- Ignore field data type in query shape.
  In many cases, the data type is known given the query/agg/sort type. For example, the data type for date histogram aggregations is always date, and a boolean query is always the boolean data type. In these cases, adding the data type adds no value:
  aggregation:
    date_histogram [date]
  vs.
  aggregation:
    date_histogram
On the other hand, range queries support a variety of data types, so we would lose information with this option.
a. Date
GET /my_index/_search
{
  "query": {
    "range": {
      "publish_date": {
        "gte": "2024-01-01",
        "lte": "2024-12-31"
      }
    }
  }
}
b. Keyword
GET /my_index/_search
{
  "query": {
    "range": {
      "numeric_as_string": {
        "gte": "10",
        "lte": "100"
      }
    }
  }
}
c. Numeric (which consists of int, long, float, double, short, byte)
GET /my_index/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 10,
        "lte": 100
      }
    }
  }
}
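The three range queries above have the same structure, so only a field-name-to-type lookup (the second solution) can tell them apart. The following is a rough sketch of that lookup, assuming the plugin already has a flattened view of the index mapping. How to obtain that mapping from query-insights is still the open question noted above. RangeShapeExample, MAPPING, and rangeShape are hypothetical names, and the double type for price is an assumption, since the request only shows that the field is numeric.

```java
import java.util.Map;

// Hypothetical sketch; not an existing query-insights class.
class RangeShapeExample {

    // Assumed flattened view of the index mapping: field name -> data type.
    // "double" for price is a guess; the request only shows a numeric range.
    static final Map<String, String> MAPPING = Map.of(
        "publish_date", "date",
        "numeric_as_string", "keyword",
        "price", "double"
    );

    /** Builds the shape line for a range query, adding the type when it is known. */
    static String rangeShape(String fieldName, Map<String, String> mapping) {
        String dataType = mapping.get(fieldName);
        if (dataType == null) {
            // Unknown field: fall back to the name-only shape.
            return "range [" + fieldName + "]";
        }
        return "range [" + fieldName + ":" + dataType + "]";
    }
}
```

Under these assumptions the three requests produce "range [publish_date:date]", "range [numeric_as_string:keyword]", and "range [price:double]", which would all collapse to the same shape if the data type were ignored.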
Other Consideration
- When the data type is String or Numeric, do we need to distinguish between specific types (e.g., int, long, float, double, short, byte)?
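If the answer is no, one option is to collapse the specific numeric types into a single label before building the shape, as the number label in the Background example suggests. The sketch below is only an illustration: DataTypeNormalizer is a hypothetical name, and the set of numeric type names is taken from the list above plus "integer", which is an assumption about how the mapping spells int.

```java
import java.util.Set;

// Hypothetical sketch; not an existing query-insights class.
class DataTypeNormalizer {

    // Numeric type names from the list above. "integer" is added as an assumed
    // mapping spelling of int; a real implementation may need more entries.
    private static final Set<String> NUMERIC_TYPES =
        Set.of("int", "integer", "long", "float", "double", "short", "byte");

    /** Maps every numeric type to "number" and leaves other types unchanged. */
    static String normalize(String dataType) {
        return NUMERIC_TYPES.contains(dataType) ? "number" : dataType;
    }
}
```

Here normalize("long") and normalize("float") both return "number", so shapes that differ only in numeric width are grouped together, while normalize("date") and normalize("keyword") pass through unchanged.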