Description
Describe the bug
We allow users to set indices.breaker.fielddata.limit lower than indices.fielddata.cache.size. If this happens and fielddata is enabled on one or more fields, the fielddata cache can grow beyond the fielddata breaker limit. This can happen when a sudden burst of heavy search queries fills the cache with more field data than the circuit breaker (CB) limit before the breaker starts tripping. After that, subsequent search queries or aggregations on fielddata-enabled fields fail with circuit breaker exceptions:
[2024-02-25T09:00:44,638][DEBUG][o.o.a.s.TransportSearchAction] [f92acfa0c58f3643980f1cada9df945d] [l3F63B3JR0KY7qbJ5cyJAg][.opendistro-ism-config][1]: Failed to execute [SearchRequest{searchType=QUERY_THEN_FETCH, indices=[.opendistro-ism-config], indicesOptions=IndicesOptions[ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, expand_wildcards_hidden=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false, ignore_throttled=true], routing='null', preference='_shards:1|_primary', requestCache=null, scroll=null, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=null, allowPartialSearchResults=true, localClusterAlias=null, getOrCreateAbsoluteStartMillis=-1, ccsMinimizeRoundtrips=true, source={"size":100,"query":{"match_all":{"boost":1.0}},"version":true,"seq_no_primary_term":true,"sort":[{"_id":{"order":"asc","missing":"_last","unmapped_type":"keyword"}}],"search_after":[""]}, cancelAfterTimeInterval=null, pipeline=null}] lastShard [true]
[2024-02-25T09:00:44,638][DEBUG][o.o.a.s.TransportSearchAction] [f92acfa0c58f3643980f1cada9df945d] #[org.opensearch.OpenSearchException,java.util.concurrent.ExecutionException,org.opensearch.core.common.breaker.CircuitBreakingException]#All shards failed for phase: [query]
OpenSearchException[java.util.concurrent.ExecutionException: CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [3469348057/3.2gb], which is larger than the limit of [515396075/491.5mb]]]; nested: ExecutionException[CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [3469348057/3.2gb], which is larger than the limit of [515396075/491.5mb]]]; nested: CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [3469348057/3.2gb], which is larger than the limit of [515396075/491.5mb]];
at org.opensearch.index.fielddata.plain.AbstractIndexOrdinalsFieldData.load(AbstractIndexOrdinalsFieldData.java:116)
at org.opensearch.index.fielddata.plain.AbstractIndexOrdinalsFieldData.load(AbstractIndexOrdinalsFieldData.java:62)
at org.opensearch.index.mapper.IdFieldMapper$IdFieldType$1$1.load(IdFieldMapper.java:209)
at org.opensearch.index.fielddata.fieldcomparator.BytesRefFieldComparatorSource.getValues(BytesRefFieldComparatorSource.java:91)
at org.opensearch.index.fielddata.fieldcomparator.BytesRefFieldComparatorSource$2.getBinaryDocValues(BytesRefFieldComparatorSource.java:141)
at org.apache.lucene.search.FieldComparator$TermValComparator.getLeafComparator(FieldComparator.java:280)
at org.apache.lucene.search.FieldValueHitQueue.getComparators(FieldValueHitQueue.java:176)
at org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector.<init>(TopFieldCollector.java:64)
at org.apache.lucene.search.TopFieldCollector$PagingFieldCollector$1.<init>(TopFieldCollector.java:254)
at org.apache.lucene.search.TopFieldCollector$PagingFieldCollector.getLeafCollector(TopFieldCollector.java:254)
at org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:306)
at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:281)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:551)
at org.opensearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:360)
at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:447)
at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWith(QueryPhase.java:431)
at org.opensearch.search.query.QueryPhaseSearcherWrapper.searchWith(QueryPhaseSearcherWrapper.java:65)
at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWith(HybridQueryPhaseSearcher.java:66)
at org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:282)
at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:155)
at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:533)
at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:597)
at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:566)
at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:917)
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Related component
Search:Resiliency
To Reproduce
One way to potentially reproduce this:
- Create an index with fielddata enabled on some of the text fields.
- Ingest data until the fielddata cache reaches, say, 20%. Use GET /_cat/fielddata for monitoring.
- Set the breaker limit to 1%.
- Execute heavy search queries (resulting in >1% of data size) on fields with fielddata enabled.
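The steps above can be sketched as a sequence of REST requests. This is only an illustration: the index name fielddata-repro and the field name message are hypothetical, the terms aggregation is just one assumed example of a query that loads fielddata, and the ingest step is left out because it depends on the data set. The script prints the requests rather than sending them.

```python
import json

# Hypothetical names, chosen only for illustration.
INDEX = "fielddata-repro"
FIELD = "message"


def repro_requests(index=INDEX, field=FIELD, breaker_limit="1%"):
    """Return (method, path, body) tuples for the reproduction steps."""
    create_index = {
        "mappings": {
            "properties": {
                # fielddata is disabled by default on text fields.
                field: {"type": "text", "fielddata": True}
            }
        }
    }
    lower_breaker = {
        "persistent": {"indices.breaker.fielddata.limit": breaker_limit}
    }
    heavy_search = {
        "size": 0,
        # Assumed example: aggregating on the text field loads its fielddata.
        "aggs": {"by_term": {"terms": {"field": field, "size": 10000}}},
    }
    return [
        ("PUT", f"/{index}", create_index),
        # Ingest documents here, then poll the cache size.
        ("GET", "/_cat/fielddata", None),
        ("PUT", "/_cluster/settings", lower_breaker),
        ("POST", f"/{index}/_search", heavy_search),
    ]


if __name__ == "__main__":
    for method, path, body in repro_requests():
        print(method, path)
        if body is not None:
            print(json.dumps(body, indent=2))
```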
Expected behavior
- A validation should be added in OpenSearch to reject an update-settings request if indices.breaker.fielddata.limit is less than indices.fielddata.cache.size.
- The default value of indices.breaker.fielddata.limit is 40% of the JVM heap and the default cache size is unbounded. We should also consider setting the default cache size to be less than the default breaker limit (say 38%).
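A minimal sketch of the proposed validation, written in Python for brevity and not taken from the OpenSearch code base. The function names, the use of "-1" to denote an unbounded cache, and the choice to let an unbounded cache pass (that case is what the second point above is about) are all assumptions of this sketch.

```python
import re

_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def to_bytes(value, heap_bytes):
    """Convert a setting such as '40%' or '512mb' to bytes.

    Returns None for an unbounded value (assumed here to be '-1' or '-1b').
    """
    text = value.strip().lower()
    if text in ("-1", "-1b"):
        return None
    if text.endswith("%"):
        return int(heap_bytes * float(text[:-1]) / 100)
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(b|kb|mb|gb|tb)", text)
    if match is None:
        raise ValueError(f"unparseable size: {value!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])


def validate_fielddata_settings(breaker_limit, cache_size, heap_bytes):
    """Raise ValueError if the breaker limit is below a bounded cache size."""
    limit = to_bytes(breaker_limit, heap_bytes)
    cache = to_bytes(cache_size, heap_bytes)
    if limit is None or cache is None:
        # Unbounded values are not rejected in this sketch.
        return
    if limit < cache:
        raise ValueError(
            f"indices.breaker.fielddata.limit [{breaker_limit}] must not be "
            f"less than indices.fielddata.cache.size [{cache_size}]"
        )


if __name__ == "__main__":
    heap = 1024**3  # 1 GiB heap, chosen only for the example
    validate_fielddata_settings("40%", "38%", heap)  # accepted
    try:
        validate_fielddata_settings("1%", "20%", heap)
    except ValueError as err:
        print("rejected:", err)
```

Comparing both values in bytes against the same heap size lets percentage and absolute settings be mixed in one check.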