[BUG] Handling heap usage exceed error #711

sandeshkr419 · 2023-11-02T23:38:52Z

What is the bug?
This phenomenon occurs when a detector searches documents (via the Alerting plugin) and OpenSearch rejects the percolate query search request.

Sample error message:

"error_message" : "IllegalStateException[Failed to run percolate search for sourceIndex [log-aws-cloudtrail-2023-08] and queryIndex [.opensearch-sap-cloudtrail-detectors-queries-000001] for 10000 document(s)]; 
nested: SearchPhaseExecutionException[all shards failed]; 
nested: [cancelled task with reason: heap usage exceeded [45.9mb >= 9.2mb]]; 
nested: OpenSearchRejectedExecutionException[cancelled task with reason: heap usage exceeded [45.9mb >= 9.2mb]];

The reasons for this are:

The detector run rate is set too infrequently - running the detector more often would yield batches smaller than 10k documents, but the shortest allowed interval is 1 minute. The total size of those 10k documents is also constrained by the heap available at that time.
Using lower RAM/heap instances - one of the biggest contributing factors is having less heap memory available in the first place. Smaller instance types are more likely to hit this error.

Possible solutions:

Batching the documents - however, the open question is how to determine the batch size and the maximum number of batches.
Reducing the number of documents processed in a single batch as a function of instance heap size. This may require the batch size to be configurable, something along the lines of 1k documents for a 1GB heap, 2k documents for a 2GB heap, ... 10k documents for 8GB and higher. The value can then be tuned up or down depending on how well the cluster handles the documents.
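
The heap-based batch sizing suggested above could be sketched roughly as follows. This is only an illustration in Python, not the plugin's code: the tier table holds just the three data points mentioned in this issue (1GB, 2GB, 8GB and higher), and the names `DEFAULT_TIERS`, `docs_per_batch`, and `split_into_batches` are hypothetical. Heaps between the listed tiers fall back to the nearest lower tier, and heaps below 1GB use the smallest tier; both choices are assumptions that would need tuning.

```python
from bisect import bisect_right

GIB = 1024 ** 3

# (minimum heap in bytes, documents per batch).
# Only the points named in the issue; intermediate tiers are left to tuning.
DEFAULT_TIERS = [(1 * GIB, 1_000), (2 * GIB, 2_000), (8 * GIB, 10_000)]


def docs_per_batch(heap_bytes, tiers=DEFAULT_TIERS):
    """Return the batch size of the largest tier whose heap threshold fits."""
    thresholds = [threshold for threshold, _ in tiers]
    idx = bisect_right(thresholds, heap_bytes) - 1
    # Assumption: heaps smaller than the first tier still use the first tier.
    return tiers[max(idx, 0)][1]


def split_into_batches(docs, batch_size):
    """Yield consecutive slices of at most batch_size documents."""
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size]
```

With a table like this, a 10k-document run on a 2GB heap would be split into five percolate requests of 2k documents each instead of one 10k request.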

Related issue: opensearch-project/OpenSearch#2818


eirsep · 2023-12-22T23:36:26Z

Optimize doc level monitor performance: Batch docs for percolate query searches based on available memory #1353

eirsep · 2024-01-02T20:52:06Z

Using the number of documents is not the right parameter, IMO.
Rather, just use heap size and set a threshold (can start with x% of heap; x should be a cluster setting whose default is derived from the right benchmarking) to batch the documents available in memory for the percolate query, and then fetch documents for the remaining shards.
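
A minimal sketch of the heap-threshold idea, written in Python for illustration only. It assumes the serialized JSON size of a document is an acceptable proxy for its in-memory footprint, which would need to be confirmed by benchmarking. `batch_by_heap_budget` and `heap_fraction` are hypothetical names; `heap_fraction` stands in for the "x% of heap" cluster setting and its default here is arbitrary.

```python
import json


def batch_by_heap_budget(docs, heap_bytes, heap_fraction=0.1):
    """Group docs so each batch stays within heap_fraction of the heap.

    Size is estimated from the UTF-8 JSON encoding (an assumption, not a
    measurement of real heap usage).
    """
    budget = int(heap_bytes * heap_fraction)
    batch, used = [], 0
    for doc in docs:
        size = len(json.dumps(doc).encode("utf-8"))
        # Flush the current batch before it would exceed the budget.
        if batch and used + size > budget:
            yield batch
            batch, used = [], 0
        # A single doc larger than the budget still goes out alone.
        batch.append(doc)
        used += size
    if batch:
        yield batch
```

Each yielded batch would be submitted as one percolate request. Unlike a fixed document count, the number of documents per request shrinks automatically when the documents are large.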

…rch-project#705) (opensearch-project#711) Signed-off-by: Ashish Agrawal <[email protected]> Signed-off-by: Ashish Agrawal <[email protected]> (cherry picked from commit 41265f86c371a1bea697376b51816ab495bdbe96) Co-authored-by: Ashish Agrawal <[email protected]>

engechas · 2024-04-09T21:24:22Z

This was resolved with the recent performance enhancements. The number of docs submitted in each percolate request now takes the available heap into account.

sandeshkr419 added the bug and untriaged labels Nov 2, 2023

eirsep removed the untriaged label Dec 22, 2023

eirsep self-assigned this Dec 22, 2023

engechas closed this as completed Apr 9, 2024
