[BUG] search.max_buckets is not evaluated correctly for terms agg #13314

Open
@jed326

Describe the bug

The search.max_buckets setting (ref) is used to control the maximum number of aggregation buckets allowed in a single search response.

For terms aggregations, the bucket count is calculated by counting sub-aggregation buckets first; if a parent bucket is then pruned from the candidate list, its sub-aggregation bucket count is subtracted afterwards. This means the limit is checked against a count that does not accurately reflect the number of buckets in the response. See the reproduction section below for an example.
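The count-then-subtract order described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the OpenSearch implementation: `BucketCounter`, `TooManyBuckets`, and `reduce_terms` are hypothetical names, and the model assumes the limit is checked on every increment of a single running counter.

```python
class TooManyBuckets(Exception):
    """Raised when the running bucket count exceeds the limit."""


class BucketCounter:
    """Toy stand-in for a max_buckets check: a running count with a hard limit."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def consume(self, n):
        # n may be negative when buckets of a pruned parent are given back.
        self.count += n
        if self.count > self.limit:
            raise TooManyBuckets(f"{self.count} buckets counted, limit is {self.limit}")


def reduce_terms(counter, sub_bucket_counts, size):
    """Reduce candidate parent buckets, each carrying some sub-aggregation buckets.

    Returns the number of buckets in the final response (parents plus children).
    """
    # Step 1: sub-aggregation buckets of every candidate are counted up front.
    for sub_buckets in sub_bucket_counts:
        counter.consume(sub_buckets)

    # Step 2: keep the first `size` candidates, prune the rest.
    final = 0
    for rank, sub_buckets in enumerate(sub_bucket_counts):
        if rank < size:
            counter.consume(1)  # the kept parent bucket itself
            final += 1 + sub_buckets
        else:
            counter.consume(-sub_buckets)  # subtract the pruned parent's children
    return final


# One candidate parent with one child: the count peaks at 2, within the limit.
print(reduce_terms(BucketCounter(limit=2), [1], size=1))

# Two candidate parents, one child each, only one parent kept: the response
# would still hold 2 buckets, but the running count reaches 3 before pruning.
try:
    reduce_terms(BucketCounter(limit=2), [1, 1], size=1)
except TooManyBuckets as exc:
    print("rejected:", exc)
```

In this model the second call is rejected even though only two buckets would survive pruning, because the children of the pruned candidate are counted before they are subtracted.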

More broadly, I'm not sure the search.max_buckets setting is actually useful. I think the setting can serve two purposes:

  1. Limit the response size of a given search request. This does not work correctly, as shown by this issue.
  2. Stop bad aggregations from consuming too many resources. Most aggregation types do not enforce max_buckets at the shard level; it is only evaluated during reduce on the coordinator, after many of the resource-intensive parts of the search request have already completed.

Somewhat related:

Related component

Search:Resiliency

To Reproduce

Expected behavior

The following was reproduced using the noaa opensearch-benchmarks workload, but the behavior is not specific to that data.

Set cluster setting:

{
    "persistent": {
        "search.max_buckets": 2
    }
}
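For scripting the reproduction, the setting above can be sent to the cluster settings endpoint (`PUT /_cluster/settings`). The following is a minimal sketch that only builds the request. The base URL, the lack of authentication, and the helper name `max_buckets_request` are assumptions for illustration.

```python
import json
import urllib.request

# Assumption: a local development cluster without authentication.
BASE_URL = "http://localhost:9200"


def max_buckets_request(limit, base_url=BASE_URL):
    """Build (but do not send) the PUT request that sets search.max_buckets."""
    body = {"persistent": {"search.max_buckets": limit}}
    return urllib.request.Request(
        url=f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = max_buckets_request(2)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To apply it against a running cluster: urllib.request.urlopen(req)
```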

This search request does not hit the max buckets limit:

{
    "size": 0,
    "aggs": {
        "station": {
            "terms": {
                "field": "station.id",
                "size": 1,
                "shard_size": 1
            },
            "aggs": {
                "date": {
                    "terms": {
                        "field": "date",
                        "size": 1,
                        "shard_size": 1
                    }
                }
            }
        }
    }
}

Neither does this one:

{
    "size": 0,
    "aggs": {
        "station": {
            "terms": {
                "field": "station.id",
                "size": 1,
                "shard_size": 1
            },
            "aggs": {
                "date": {
                    "terms": {
                        "field": "date",
                        "size": 1,
                        "shard_size": 2
                    }
                }
            }
        }
    }
}

However, this one does:

{
    "size": 0,
    "aggs": {
        "station": {
            "terms": {
                "field": "station.id",
                "size": 1,
                "shard_size": 2
            },
            "aggs": {
                "date": {
                    "terms": {
                        "field": "date",
                        "size": 1,
                        "shard_size": 1
                    }
                }
            }
        }
    }
}

In all three cases, the final response on the coordinator contains only 2 buckets (one station bucket and one date sub-bucket).
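The three requests differ only in their two shard_size values, which makes them easy to generate when scripting the reproduction. A small sketch follows; the helper name `build_query` is hypothetical, while the field names and sizes are taken from the requests above.

```python
import json


def build_query(station_shard_size, date_shard_size):
    """Return the nested terms aggregation request with the given shard sizes."""
    return {
        "size": 0,
        "aggs": {
            "station": {
                "terms": {
                    "field": "station.id",
                    "size": 1,
                    "shard_size": station_shard_size,
                },
                "aggs": {
                    "date": {
                        "terms": {
                            "field": "date",
                            "size": 1,
                            "shard_size": date_shard_size,
                        }
                    }
                },
            }
        },
    }


# (station shard_size, date shard_size) for the three requests, in order.
# Per the report, only the last combination hits the limit.
CASES = [(1, 1), (1, 2), (2, 1)]

for station_shard_size, date_shard_size in CASES:
    print(json.dumps(build_query(station_shard_size, date_shard_size)))
```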

