-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Fix HistogramCollector to not create zero-count buckets. #14421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
If a bucket in the middle of the range doesn't match docs, it would be returned with a count of zero. Better not return it at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
.add(NumericDocValuesField.newSlowRangeQuery("f", Long.MIN_VALUE, 2), Occur.SHOULD) | ||
.add(NumericDocValuesField.newSlowRangeQuery("f", 10, Long.MAX_VALUE), Occur.SHOULD) | ||
.build(); | ||
actualCounts = searcher.search(query, new HistogramCollectorManager("f", 4)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially, I was wondering if checkMaxBuckets(collectorCounts.size(), maxBuckets)
might cause some of the tests earlier expecting exception to fail. But, all the tests with counts[i] == 0
, use 1024
as maxBuckets
which is comfortably over collectorCounts.size()
Maybe, we can specify 3 as maxBuckets
for even stronger condition:
searcher.search(query, new HistogramCollectorManager("f", 4, 3));
collectorCounts.addTo(leafMinBucket + i, counts[i]); | ||
if (counts[i] != 0) { | ||
collectorCounts.addTo(leafMinBucket + i, counts[i]); | ||
} | ||
} | ||
checkMaxBuckets(collectorCounts.size(), maxBuckets); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While unrelated to this change, I am wondering if we should check eagerly to prevent unnecessary iterations:
if (counts[i] != 0) {
collectorCounts.addTo(leafMinBucket + i, counts[i]);
checkMaxBuckets(collectorCounts.size(), maxBuckets);
}
We are doing similar validation in other places:
if (bucket != prevBucket) {
counts.addTo(bucket, 1);
checkMaxBuckets(counts.size(), maxBuckets);
prevBucket = bucket;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's ok the way it is. The end goal is to prevent unbounded heap allocation. In this case, the amount of excess heap we may allocate is bounded by 1024 entries, so I'd err on the side of simplicity by not checking the number of buckets in the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds fair. I was expecting low bound like 1024, just wanted to confirm!
If a bucket in the middle of the range doesn't match docs, it would be returned with a count of zero. Better not return it at all.
If a bucket in the middle of the range doesn't match docs, it would be returned with a count of zero. Better not return it at all.
If a bucket in the middle of the range doesn't match docs, it would be returned with a count of zero. Better not return it at all.
I backported to branch_10_2 since this is a bugfix cc @iverase |
If a bucket in the middle of the range doesn't match docs, it would be returned with a count of zero. Better not return it at all.