[BUG] Circuit breaker exceptions due to misconfigured fielddata cache size and circuit breaker for fieldcache #12475

Open
@vikasvb90

Description

Describe the bug

OpenSearch allows users to configure the setting indices.breaker.fielddata.limit lower than indices.fielddata.cache.size. When that happens, and fielddata is enabled on one or more fields, the fielddata cache can grow beyond the fielddata breaker limit: a sudden burst of heavy search queries can fill the cache with more field data than the breaker limit allows before the circuit breaker kicks in. From that point on, subsequent search queries or aggregations on fielddata-enabled fields start failing with circuit breaker exceptions.
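For illustration, a combination like the following reproduces the misconfiguration (the values here are hypothetical, not from the affected cluster). Note that indices.fielddata.cache.size is a static node setting in opensearch.yml, while the breaker limit can be lowered dynamically below it at any time:

```shell
# opensearch.yml (static setting, read at node startup) — hypothetical value:
#   indices.fielddata.cache.size: 20%

# Dynamic update that drops the breaker limit below the cache size.
# OpenSearch currently accepts this without any validation error:
curl -X PUT "localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "indices.breaker.fielddata.limit": "10%"
  }
}'
```

Once the cache has already grown past 10% of the heap, any further fielddata load is estimated on top of the existing cache usage and trips the breaker, as in the log below.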

[2024-02-25T09:00:44,638][DEBUG][o.o.a.s.TransportSearchAction] [f92acfa0c58f3643980f1cada9df945d] [l3F63B3JR0KY7qbJ5cyJAg][.opendistro-ism-config][1]: Failed to execute [SearchRequest{searchType=QUERY_THEN_FETCH, indices=[.opendistro-ism-config], indicesOptions=IndicesOptions[ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, expand_wildcards_hidden=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false, ignore_throttled=true], routing='null', preference='_shards:1|_primary', requestCache=null, scroll=null, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=null, allowPartialSearchResults=true, localClusterAlias=null, getOrCreateAbsoluteStartMillis=-1, ccsMinimizeRoundtrips=true, source={"size":100,"query":{"match_all":{"boost":1.0}},"version":true,"seq_no_primary_term":true,"sort":[{"_id":{"order":"asc","missing":"_last","unmapped_type":"keyword"}}],"search_after":[""]}, cancelAfterTimeInterval=null, pipeline=null}] lastShard [true]
[2024-02-25T09:00:44,638][DEBUG][o.o.a.s.TransportSearchAction] [f92acfa0c58f3643980f1cada9df945d] #[org.opensearch.OpenSearchException,java.util.concurrent.ExecutionException,org.opensearch.core.common.breaker.CircuitBreakingException]#All shards failed for phase: [query]
OpenSearchException[java.util.concurrent.ExecutionException: CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [3469348057/3.2gb], which is larger than the limit of [515396075/491.5mb]]]; nested: ExecutionException[CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [3469348057/3.2gb], which is larger than the limit of [515396075/491.5mb]]]; nested: CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [3469348057/3.2gb], which is larger than the limit of [515396075/491.5mb]];
        at org.opensearch.index.fielddata.plain.AbstractIndexOrdinalsFieldData.load(AbstractIndexOrdinalsFieldData.java:116)
        at org.opensearch.index.fielddata.plain.AbstractIndexOrdinalsFieldData.load(AbstractIndexOrdinalsFieldData.java:62)
        at org.opensearch.index.mapper.IdFieldMapper$IdFieldType$1$1.load(IdFieldMapper.java:209)
        at org.opensearch.index.fielddata.fieldcomparator.BytesRefFieldComparatorSource.getValues(BytesRefFieldComparatorSource.java:91)
        at org.opensearch.index.fielddata.fieldcomparator.BytesRefFieldComparatorSource$2.getBinaryDocValues(BytesRefFieldComparatorSource.java:141)
        at org.apache.lucene.search.FieldComparator$TermValComparator.getLeafComparator(FieldComparator.java:280)
        at org.apache.lucene.search.FieldValueHitQueue.getComparators(FieldValueHitQueue.java:176)
        at org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector.<init>(TopFieldCollector.java:64)
        at org.apache.lucene.search.TopFieldCollector$PagingFieldCollector$1.<init>(TopFieldCollector.java:254)
        at org.apache.lucene.search.TopFieldCollector$PagingFieldCollector.getLeafCollector(TopFieldCollector.java:254)
        at org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:306)
        at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:281)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:551)
        at org.opensearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:360)
        at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:447)
        at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWith(QueryPhase.java:431)
        at org.opensearch.search.query.QueryPhaseSearcherWrapper.searchWith(QueryPhaseSearcherWrapper.java:65)
        at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWith(HybridQueryPhaseSearcher.java:66)
        at org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:282)
        at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:155)
        at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:533)
        at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:597)
        at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:566)
        at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
        at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:917)
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)

Related component

Search:Resiliency

To Reproduce

One way to potentially reproduce this:

  1. Create an index with fielddata enabled on some of its text fields.
  2. Ingest data until the fielddata cache grows to a sizeable fraction of the heap (say 20%). Use GET /_cat/fielddata for monitoring.
  3. Set indices.breaker.fielddata.limit to 1%.
  4. Execute heavy search queries (loading >1% of the heap in field data) on the fielddata-enabled fields.
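The steps above can be sketched as follows; the index name, field name, and sizes are illustrative:

```shell
# 1. Create an index with fielddata enabled on a text field.
curl -X PUT "localhost:9200/test-index" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "title": { "type": "text", "fielddata": true }
    }
  }
}'

# 2. Ingest documents, then watch fielddata cache usage per node/field.
curl -X GET "localhost:9200/_cat/fielddata?v"

# 3. Lower the breaker limit well below the current cache size.
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{ "persistent": { "indices.breaker.fielddata.limit": "1%" } }'

# 4. Run a fielddata-heavy query, e.g. sorting on the fielddata field;
#    this should now fail with a CircuitBreakingException.
curl -X GET "localhost:9200/test-index/_search" -H 'Content-Type: application/json' -d'
{ "sort": [ { "title": "asc" } ], "size": 10 }'
```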

Expected behavior

  1. A validation should be added in OpenSearch to reject a settings update request that would set indices.breaker.fielddata.limit below indices.fielddata.cache.size.
  2. The default indices.breaker.fielddata.limit is 40% of the JVM heap, while the default cache size is unbounded. We should also consider capping the default cache size below the default breaker limit (say 38%).
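As a sketch of what point 1 would look like from a user's perspective (assuming, hypothetically, a node started with indices.fielddata.cache.size: 20% in opensearch.yml):

```shell
# Today this update is accepted; under the proposed validation it would be
# rejected, because 1% is below the configured fielddata cache size of 20%.
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{ "persistent": { "indices.breaker.fielddata.limit": "1%" } }'
```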

