Description
Describe the bug
We have a cluster where each of the data nodes has a 30gb heap:
id version heap.max
zHWn 2.19.0 30gb
tdfW 2.19.0 30gb
dTAL 2.19.0 30gb
V0G4 2.19.0 30gb
j-z3 2.19.0 30gb
dzM2 2.19.0 30gb
...
and we have the default search backpressure settings, with monitor_only mode enabled:
"search_backpressure.cancellation_burst": "10.0",
"search_backpressure.cancellation_rate": "0.003",
"search_backpressure.cancellation_ratio": "0.1",
"search_backpressure.mode": "monitor_only",
"search_backpressure.node_duress.cpu_threshold": "0.9",
"search_backpressure.node_duress.heap_threshold": "0.7",
"search_backpressure.node_duress.num_successive_breaches": "3",
"search_backpressure.search_shard_task.cancellation_burst": "10.0",
"search_backpressure.search_shard_task.cancellation_rate": "0.003",
"search_backpressure.search_shard_task.cancellation_ratio": "0.1",
"search_backpressure.search_shard_task.cpu_time_millis_threshold": "15000",
"search_backpressure.search_shard_task.elapsed_time_millis_threshold": "30000",
"search_backpressure.search_shard_task.heap_moving_average_window_size": "100",
"search_backpressure.search_shard_task.heap_percent_threshold": "0.005",
"search_backpressure.search_shard_task.heap_variance": "2.0",
"search_backpressure.search_shard_task.total_heap_percent_threshold": "0.05",
"search_backpressure.search_task.cancellation_burst": "5.0",
"search_backpressure.search_task.cancellation_rate": "0.003",
"search_backpressure.search_task.cancellation_ratio": "0.1",
"search_backpressure.search_task.cpu_time_millis_threshold": "30000",
"search_backpressure.search_task.elapsed_time_millis_threshold": "45000",
"search_backpressure.search_task.heap_moving_average_window_size": "100",
"search_backpressure.search_task.heap_percent_threshold": "0.02",
"search_backpressure.search_task.heap_variance": "2.0",
"search_backpressure.search_task.total_heap_percent_threshold": "0.05",
We're looking into enforcing search backpressure. However, it's not clear why we constantly see log messages like these:
[monitor_only mode] cancelling task [398397401] due to high resource consumption [heap usage exceeded [3.4gb >= 800.5kb]
[monitor_only mode] cancelling task [484941798] due to high resource consumption [heap usage exceeded [1.7gb >= 470kb]]
[monitor_only mode] cancelling task [579709438] due to high resource consumption [heap usage exceeded [5.5gb >= 1.5mb]]
[monitor_only mode] cancelling task [398406650] due to high resource consumption [heap usage exceeded [2.5gb >= 39.2kb]]
That is, why is the number on the right-hand side of these comparisons so low? I would say that, on average, it is in the tens of megabytes, but quite often it is in the tens or hundreds of kilobytes. Perhaps I'm misunderstanding how this threshold is calculated, but the lowest heap threshold is search_backpressure.search_shard_task.heap_percent_threshold at 0.005, which with a 30gb heap is around 150mb. So where are numbers like 470kb coming from?
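To make the discrepancy concrete, here is a minimal sketch that computes the threshold I expected and compares it with the right-hand values from the log lines above. It assumes the threshold is simply heap_percent_threshold multiplied by the maximum heap, and that kb/mb/gb are binary units. Both are my assumptions, not something confirmed from the OpenSearch source, and the helper names (parse_size, LIMIT_PATTERN) are made up for this illustration.

```python
import re

# Assumption: sizes in the log use binary units (1kb = 1024 bytes).
UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def parse_size(text: str) -> float:
    """Convert a size string such as '470kb' or '1.5mb' to bytes."""
    match = re.fullmatch(r"([\d.]+)(b|kb|mb|gb)", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


# Log lines copied from this report.
LOG_LINES = [
    "[monitor_only mode] cancelling task [398397401] due to high resource consumption [heap usage exceeded [3.4gb >= 800.5kb]",
    "[monitor_only mode] cancelling task [484941798] due to high resource consumption [heap usage exceeded [1.7gb >= 470kb]]",
    "[monitor_only mode] cancelling task [579709438] due to high resource consumption [heap usage exceeded [5.5gb >= 1.5mb]]",
    "[monitor_only mode] cancelling task [398406650] due to high resource consumption [heap usage exceeded [2.5gb >= 39.2kb]]",
]

# Captures the left (usage) and right (limit) sides of the comparison.
LIMIT_PATTERN = re.compile(r"heap usage exceeded \[(\S+) >= ([^\]\s]+)\]")

# Assumption: expected threshold = heap.max * heap_percent_threshold.
heap_max = parse_size("30gb")
heap_percent_threshold = 0.005  # search_shard_task.heap_percent_threshold
expected_threshold = heap_max * heap_percent_threshold

parsed = []
for line in LOG_LINES:
    found = LIMIT_PATTERN.search(line)
    if found:
        parsed.append((parse_size(found.group(1)), parse_size(found.group(2))))

print(f"expected threshold: {expected_threshold / UNITS['mb']:.1f}mb")
for usage, limit in parsed:
    ratio = expected_threshold / limit
    print(f"logged limit {limit / UNITS['kb']:.1f}kb is {ratio:.0f}x below the expected threshold")
```

Under these assumptions, every logged right-hand value is far below the threshold I expected from the settings, which is what prompted this report.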
Related component
Search:Resiliency
To Reproduce
Sorry, right now I'm not sure which searches are causing these messages.
Expected behavior
The number on the right of the heap usage exceeded messages shouldn't be so low.
Additional Details
Plugins
opensearch-alerting 2.19.0.0
opensearch-anomaly-detection 2.19.0.0
opensearch-asynchronous-search 2.19.0.0
opensearch-cross-cluster-replication 2.19.0.0
opensearch-custom-codecs 2.19.0.0
opensearch-flow-framework 2.19.0.0
opensearch-geospatial 2.19.0.0
opensearch-index-management 2.19.0.0
opensearch-job-scheduler 2.19.0.0
opensearch-knn 2.19.0.0
opensearch-ltr 2.19.0.0
opensearch-ml 2.19.0.0
opensearch-neural-search 2.19.0.0
opensearch-notifications 2.19.0.0
opensearch-notifications-core 2.19.0.0
opensearch-observability 2.19.0.0
opensearch-performance-analyzer 2.19.0.0
opensearch-reports-scheduler 2.19.0.0
opensearch-security 2.19.0.0
opensearch-security-analytics 2.19.0.0
opensearch-skills 2.19.0.0
opensearch-sql 2.19.0.0
opensearch-system-templates 2.19.0.0
query-insights 2.19.0.0
Screenshots
N/A
Host/Environment (please complete the following information):
- OS: Ubuntu
- Version: 22.04
Additional context
N/A