What is the bug?
This phenomenon is seen when a detector searches documents (via the Alerting plugin) and OpenSearch rejects the percolate query search request.
Sample error message:
"error_message" : "IllegalStateException[Failed to run percolate search for sourceIndex [log-aws-cloudtrail-2023-08] and queryIndex [.opensearch-sap-cloudtrail-detectors-queries-000001] for 10000 document(s)];
nested: SearchPhaseExecutionException[all shards failed];
nested: [cancelled task with reason: heap usage exceeded [45.9mb >= 9.2mb]];
nested: OpenSearchRejectedExecutionException[cancelled task with reason: heap usage exceeded [45.9mb >= 9.2mb]];
The reasons for this are:
The user sets the detector run rate too infrequently. Running the detector more often would process documents in batches smaller than 10k, but the schedule cannot run more often than once per minute. The size of those 10k documents is also constrained by the heap available at that time.
Using instances with low RAM/heap. One of the biggest contributing factors is having little heap memory available in the first place, so this is more likely to happen on smaller instance types.
Possible solutions:
Batching the documents. The open question is how to determine the batch size and the maximum number of batches.
Reducing the number of documents processed in a single batch as a function of instance heap size. This may require the number of documents in a single batch to be configurable, something along the lines of 1k documents for 1 GB heap, 2k documents for 2 GB heap, ... 10k documents for 8 GB and higher. The value can then be tuned up or down depending on how well the cluster handles the documents.
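A minimal sketch of the heap-based sizing proposed above, for discussion only. It is not code from the plugin: the function name, the constants, and the choice to scale in whole-GB steps between the quoted data points are all assumptions. Only the 1 GB, 2 GB, and 8 GB-and-higher values come from the proposal.

```python
DOCS_PER_GB = 1_000        # from the proposal: 1k docs for 1 GB, 2k for 2 GB
MAX_BATCH_DOCS = 10_000    # from the proposal: 10k docs for 8 GB and higher
FULL_BATCH_HEAP_GB = 8     # heap size at which the full 10k batch is allowed


def batch_size_for_heap(heap_bytes: int) -> int:
    """Hypothetical helper: pick a percolate batch size from the JVM heap size."""
    heap_gb = heap_bytes / (1024 ** 3)
    if heap_gb >= FULL_BATCH_HEAP_GB:
        return MAX_BATCH_DOCS
    # Assumption: scale in whole-GB steps below 8 GB, with a floor of 1k docs.
    return max(DOCS_PER_GB, int(heap_gb) * DOCS_PER_GB)
```

In practice the result would presumably serve only as the default for a configurable setting, so operators can still tune it up or down.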
Using the number of documents is not the right parameter, IMO.
Rather, just use heap size and set a threshold (can start with x% of heap; x should be a cluster setting whose default is derived from the right benchmarking) to batch available docs in memory to perform the percolate query and fetch documents for the remaining shards.
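To make the threshold idea concrete, here is a rough sketch that groups documents into batches by estimated size rather than by count. Everything in it is an assumption for illustration: the function name, the heap_percent parameter (standing in for the proposed x% cluster setting), and the use of serialized JSON length as a proxy for in-memory cost. The plugin may account for memory differently.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def batch_by_heap_budget(
    docs: Iterable[Dict[str, Any]],
    heap_bytes: int,
    heap_percent: float,
) -> Iterator[List[Dict[str, Any]]]:
    """Hypothetical helper: yield batches whose estimated size fits the budget."""
    budget = int(heap_bytes * heap_percent / 100)
    batch: List[Dict[str, Any]] = []
    used = 0
    for doc in docs:
        # Assumption: serialized JSON length approximates the in-memory cost.
        size = len(json.dumps(doc).encode("utf-8"))
        # Flush the current batch before it would exceed the budget.
        if batch and used + size > budget:
            yield batch
            batch, used = [], 0
        # A single doc larger than the budget still goes out, alone in a batch.
        batch.append(doc)
        used += size
    if batch:
        yield batch
```

Each yielded batch would be sent as one percolate request, so the number of documents per request shrinks automatically when documents are large or the heap is small.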
riysaxen-amzn pushed a commit to riysaxen-amzn/security-analytics that referenced this issue on Mar 25, 2024
Related issue: opensearch-project/OpenSearch#2818