Fix HistogramCollector to not create zero-count buckets. #14421

Merged: 4 commits into apache:main from dont_create_zero_count_buckets on Mar 31, 2025

Conversation

@jpountz (Contributor) commented Mar 28, 2025

If a bucket in the middle of the range doesn't match any docs, it is currently returned with a count of zero. It is better not to return it at all.

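For illustration (an editorial sketch, not part of the PR; it assumes the collector keys buckets by Math.floorDiv(value, bucketWidth), which matches my reading of HistogramCollector):

long bucketWidth = 4;
for (long value : new long[] {1, 2, 10, 11}) {
  // 1 and 2 fall into bucket 0, 10 and 11 into bucket 2
  System.out.println(value + " -> bucket " + Math.floorDiv(value, bucketWidth));
}
// Bucket 1 (values 4..7) matches no docs; before this change it could still be
// returned with a count of 0, afterwards it is simply left out of the results.
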
@gsmiller (Contributor) left a comment

Good catch!

.add(NumericDocValuesField.newSlowRangeQuery("f", Long.MIN_VALUE, 2), Occur.SHOULD)
.add(NumericDocValuesField.newSlowRangeQuery("f", 10, Long.MAX_VALUE), Occur.SHOULD)
.build();
actualCounts = searcher.search(query, new HistogramCollectorManager("f", 4));
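
For context, the quoted lines build a disjunction that leaves a gap in the middle of the value range. A sketch with the surrounding pieces filled in (illustrative only, not copied from the patch; it assumes an IndexSearcher named searcher over an index with a NumericDocValuesField "f", and Occur is BooleanClause.Occur):

Query query =
    new BooleanQuery.Builder()
        // matches values <= 2 and values >= 10, so nothing in 3..9 can match
        .add(NumericDocValuesField.newSlowRangeQuery("f", Long.MIN_VALUE, 2), Occur.SHOULD)
        .add(NumericDocValuesField.newSlowRangeQuery("f", 10, Long.MAX_VALUE), Occur.SHOULD)
        .build();
// With a bucket width of 4, no matching doc can land in the bucket covering values 4..7,
// so that bucket should be absent from the returned counts rather than present with 0.
var actualCounts = searcher.search(query, new HistogramCollectorManager("f", 4));
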
Contributor:

Initially, I was wondering whether checkMaxBuckets(collectorCounts.size(), maxBuckets) might cause some of the earlier tests that expect an exception to fail. But all the tests with counts[i] == 0 use 1024 as maxBuckets, which is comfortably above collectorCounts.size().

Maybe we can specify 3 as maxBuckets for an even stronger condition:

searcher.search(query, new HistogramCollectorManager("f", 4, 3));

-        collectorCounts.addTo(leafMinBucket + i, counts[i]);
+        if (counts[i] != 0) {
+          collectorCounts.addTo(leafMinBucket + i, counts[i]);
+        }
       }
       checkMaxBuckets(collectorCounts.size(), maxBuckets);
Contributor:

While unrelated to this change, I am wondering if we should check eagerly to prevent unnecessary iterations:

if (counts[i] != 0) {
  collectorCounts.addTo(leafMinBucket + i, counts[i]);
  checkMaxBuckets(collectorCounts.size(), maxBuckets);
}

We are doing similar validation in other places:

if (bucket != prevBucket) {
  counts.addTo(bucket, 1);
  checkMaxBuckets(counts.size(), maxBuckets);
  prevBucket = bucket;
}

@jpountz (Contributor, Author):

I think it's ok the way it is. The end goal is to prevent unbounded heap allocation. In this case, the amount of excess heap we may allocate is bounded by 1024 entries, so I'd err on the side of simplicity by not checking the number of buckets in the loop?
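
For a rough sense of that bound (an editorial aside, using the 1024 figure from the comment above and Java's primitive sizes; the map's real per-entry overhead is somewhat higher):

long excessEntries = 1024; // the bound on excess entries cited above
long payloadBytes = excessEntries * (Long.BYTES + Integer.BYTES); // 1024 * (8 + 4) = 12,288 bytes, about 12 KB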

Contributor:

Sounds fair. I was expecting a low bound like 1024, just wanted to confirm!

jpountz added 3 commits March 31, 2025 12:23
@jpountz merged commit 076f4e4 into apache:main on Mar 31, 2025
7 checks passed
@jpountz deleted the dont_create_zero_count_buckets branch on March 31, 2025 at 14:41
jpountz added a commit that referenced this pull request Mar 31, 2025
jpountz added a commit that referenced this pull request Mar 31, 2025
@jpountz (Contributor, Author) commented Mar 31, 2025

I backported to branch_10_2 since this is a bugfix. cc @iverase

3 participants