[BUG] SearchBackpressureService threshold calculations incorrect? #17947

Open
@tronboto

Describe the bug

We have a cluster where each of the data nodes has a 30gb heap:

id   version heap.max
zHWn 2.19.0      30gb
tdfW 2.19.0      30gb
dTAL 2.19.0      30gb
V0G4 2.19.0      30gb
j-z3 2.19.0      30gb
dzM2 2.19.0      30gb
...

and we have the default search backpressure settings with monitor_only mode enabled:

    "search_backpressure.cancellation_burst": "10.0",
    "search_backpressure.cancellation_rate": "0.003",
    "search_backpressure.cancellation_ratio": "0.1",
    "search_backpressure.mode": "monitor_only",
    "search_backpressure.node_duress.cpu_threshold": "0.9",
    "search_backpressure.node_duress.heap_threshold": "0.7",
    "search_backpressure.node_duress.num_successive_breaches": "3",
    "search_backpressure.search_shard_task.cancellation_burst": "10.0",
    "search_backpressure.search_shard_task.cancellation_rate": "0.003",
    "search_backpressure.search_shard_task.cancellation_ratio": "0.1",
    "search_backpressure.search_shard_task.cpu_time_millis_threshold": "15000",
    "search_backpressure.search_shard_task.elapsed_time_millis_threshold": "30000",
    "search_backpressure.search_shard_task.heap_moving_average_window_size": "100",
    "search_backpressure.search_shard_task.heap_percent_threshold": "0.005",
    "search_backpressure.search_shard_task.heap_variance": "2.0",
    "search_backpressure.search_shard_task.total_heap_percent_threshold": "0.05",
    "search_backpressure.search_task.cancellation_burst": "5.0",
    "search_backpressure.search_task.cancellation_rate": "0.003",
    "search_backpressure.search_task.cancellation_ratio": "0.1",
    "search_backpressure.search_task.cpu_time_millis_threshold": "30000",
    "search_backpressure.search_task.elapsed_time_millis_threshold": "45000",
    "search_backpressure.search_task.heap_moving_average_window_size": "100",
    "search_backpressure.search_task.heap_percent_threshold": "0.02",
    "search_backpressure.search_task.heap_variance": "2.0",
    "search_backpressure.search_task.total_heap_percent_threshold": "0.05",

We're looking into enforcing search backpressure. However, it's not clear why we constantly see log messages like these:

[monitor_only mode] cancelling task [398397401] due to high resource consumption [heap usage exceeded [3.4gb >= 800.5kb]]
[monitor_only mode] cancelling task [484941798] due to high resource consumption [heap usage exceeded [1.7gb >= 470kb]]
[monitor_only mode] cancelling task [579709438] due to high resource consumption [heap usage exceeded [5.5gb >= 1.5mb]]
[monitor_only mode] cancelling task [398406650] due to high resource consumption [heap usage exceeded [2.5gb >= 39.2kb]]

That is, why is the number on the right-hand side of these comparisons so low? On average it is in the tens of megabytes, but it is quite often in the tens or hundreds of kilobytes. Perhaps I'm misunderstanding how this threshold is calculated, but the lowest heap threshold is search_backpressure.search_shard_task.heap_percent_threshold at 0.005, which with a 30gb heap is around 150mb. So where are numbers like 470kb coming from?
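To make the arithmetic behind that expectation explicit, here is a minimal sketch. It assumes the threshold is simply heap.max multiplied by the configured fraction, which is my reading of the settings and not a confirmed description of what SearchBackpressureService does. The function name is hypothetical.

```python
# Assumption (not confirmed against the OpenSearch source): each heap
# threshold is heap.max multiplied by the configured fraction.
GIB = 1024 ** 3
MIB = 1024 ** 2

heap_max_bytes = 30 * GIB  # heap.max reported for every data node

# Fractions copied from the cluster settings listed above.
heap_fractions = {
    "search_shard_task.heap_percent_threshold": 0.005,
    "search_task.heap_percent_threshold": 0.02,
    "search_shard_task.total_heap_percent_threshold": 0.05,
    "search_task.total_heap_percent_threshold": 0.05,
}


def expected_threshold_bytes(heap_max: int, fraction: float) -> float:
    """Hypothetical helper: threshold as a plain fraction of the max heap."""
    return heap_max * fraction


expected_mib = {
    name: expected_threshold_bytes(heap_max_bytes, fraction) / MIB
    for name, fraction in heap_fractions.items()
}

for name, mib in expected_mib.items():
    print(f"{name}: {mib:.1f} MiB")
```

Under that assumption the smallest value is about 153.6 MiB, several orders of magnitude above the kilobyte-range thresholds in the log messages.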

Related component

Search:Resiliency

To Reproduce

Sorry, right now I'm not sure which searches are causing these messages.

Expected behavior

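The threshold on the right-hand side of the heap usage exceeded messages shouldn't be so low. With a 30gb heap and the lowest heap_percent_threshold at 0.005, I would expect it to be no lower than roughly 150mb.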
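In case it helps with triage, below is a rough sketch of how the logged thresholds could be checked against that floor. The regular expression is based only on the four sample messages in this report, the size units are assumed to be 1024-based, and the function names are hypothetical.

```python
import re

# Assumption: kb/mb/gb in the log output are 1024-based units.
UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}

# Matches "heap usage exceeded [<usage> >= <threshold>]" as in the samples.
PATTERN = re.compile(
    r"heap usage exceeded \[([\d.]+)(b|kb|mb|gb) >= ([\d.]+)(b|kb|mb|gb)\]"
)


def to_bytes(value: str, unit: str) -> float:
    """Convert a size such as ('800.5', 'kb') to bytes."""
    return float(value) * UNITS[unit]


def find_low_thresholds(lines, floor_bytes):
    """Return (usage, threshold) pairs whose threshold is below floor_bytes."""
    low = []
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # not a heap usage message
        usage = to_bytes(match.group(1), match.group(2))
        threshold = to_bytes(match.group(3), match.group(4))
        if threshold < floor_bytes:
            low.append((usage, threshold))
    return low


# Sample messages copied from this report (prefix text shortened).
sample_lines = [
    "cancelling task [398397401] ... [heap usage exceeded [3.4gb >= 800.5kb]",
    "cancelling task [484941798] ... [heap usage exceeded [1.7gb >= 470kb]]",
    "cancelling task [579709438] ... [heap usage exceeded [5.5gb >= 1.5mb]]",
    "cancelling task [398406650] ... [heap usage exceeded [2.5gb >= 39.2kb]]",
]

# Floor assumed to be 0.005 of a 30gb heap.
floor_bytes = 0.005 * 30 * 1024 ** 3
low = find_low_thresholds(sample_lines, floor_bytes)
print(f"{len(low)} of {len(sample_lines)} thresholds are below the floor")
```

All four sample messages fall below the assumed floor, which is the behaviour this report is asking about.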

Additional Details

Plugins

opensearch-alerting                  2.19.0.0
opensearch-anomaly-detection         2.19.0.0
opensearch-asynchronous-search       2.19.0.0
opensearch-cross-cluster-replication 2.19.0.0
opensearch-custom-codecs             2.19.0.0
opensearch-flow-framework            2.19.0.0
opensearch-geospatial                2.19.0.0
opensearch-index-management          2.19.0.0
opensearch-job-scheduler             2.19.0.0
opensearch-knn                       2.19.0.0
opensearch-ltr                       2.19.0.0
opensearch-ml                        2.19.0.0
opensearch-neural-search             2.19.0.0
opensearch-notifications             2.19.0.0
opensearch-notifications-core        2.19.0.0
opensearch-observability             2.19.0.0
opensearch-performance-analyzer      2.19.0.0
opensearch-reports-scheduler         2.19.0.0
opensearch-security                  2.19.0.0
opensearch-security-analytics        2.19.0.0
opensearch-skills                    2.19.0.0
opensearch-sql                       2.19.0.0
opensearch-system-templates          2.19.0.0
query-insights                       2.19.0.0

Screenshots
N/A

Host/Environment (please complete the following information):

  • OS: Ubuntu
  • Version: 22.04

Additional context
N/A
