Pack file pointers when merging BKD trees #14393
Conversation
```diff
     final OneDimensionBKDWriter oneDimWriter =
-        new OneDimensionBKDWriter(metaOut, indexOut, dataOut);
+        new OneDimensionBKDWriter(metaOut, indexOut, dataOut, new ArrayLongAccumulator(numLeaves));
```
Could/should we use the "packed" impl all the time?
We could. I am concerned about the overhead of using it when flushing small segments.
Maybe it is not worth the complexity of specializing for that case and we should use the packed impl all the time. I am OK either way.
Intuitively, there is ~512x (the leaf node size) more data than file pointers, so the bottleneck of flushing/merging should be the data rather than the file pointers. So a small overhead on file pointers should be OK (hopefully benchmarks will confirm this).
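As a rough back-of-the-envelope check of that ratio (the 1B-point count and 8-byte 1D values below are illustrative assumptions, not numbers from this PR):

```java
public class PointerOverheadEstimate {
  public static void main(String[] args) {
    long numPoints = 1_000_000_000L;             // hypothetical: 1B points in a merge
    int pointsPerLeaf = 512;                     // default BKD leaf size
    long numLeaves = numPoints / pointsPerLeaf;  // ~1.95M leaves
    long pointerBytes = numLeaves * Long.BYTES;  // ~15.6 MB of file pointers
    long valueBytes = numPoints * Long.BYTES;    // ~8 GB for 8-byte 1D values
    // valueBytes / pointerBytes == pointsPerLeaf == 512, so even a somewhat
    // slower packed representation of the pointers should be invisible next
    // to the cost of writing the point data itself.
    System.out.println("pointers: " + pointerBytes + " B, values: " + valueBytes
        + " B (" + (valueBytes / pointerBytes) + "x larger)");
  }
}
```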
Right, let's use it all the time for OneDimensionBKDWriter; it clearly simplifies the implementation.
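For context, here is a minimal sketch of the array-backed accumulator that the diff above constructs. The `LongAccumulator` interface name and its methods are assumptions inferred from the usage visible in this PR (`add(...)` plus indexed reads), not the PR's actual code:

```java
// Hypothetical abstraction over how leaf block file pointers are stored;
// inferred from `new ArrayLongAccumulator(numLeaves)` and `leafBlockFPs.add(...)`.
interface LongAccumulator {
  void add(long value); // append the next leaf block file pointer
  long get(int index);  // random access while writing the index tree
}

// Plain long[]-backed implementation, sized up front from the leaf count.
final class ArrayLongAccumulator implements LongAccumulator {
  private final long[] values;
  private int size;

  ArrayLongAccumulator(int numLeaves) {
    this.values = new long[numLeaves];
  }

  @Override
  public void add(long value) {
    values[size++] = value;
  }

  @Override
  public long get(int index) {
    return values[index];
  }
}
```

A packed implementation could sit behind the same interface, which is what makes "use the packed impl all the time" a small change.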
```diff
@@ -1961,7 +1989,7 @@ private void build(
     int leafCardinality = heapSource.computeCardinality(from, to, commonPrefixLengths);

     // Save the block file pointer:
-    leafBlockFPs[leavesOffset] = out.getFilePointer();
+    leafBlockFPs.add(out.getFilePointer());
```
Ah, since file pointers are monotonic, we can make them compact. NICE!
We are currently using long arrays to hold file pointers. These arrays can get pretty big when the number of points is large, which seems wasteful, especially since those file pointers are monotonically increasing. This commit proposes to pack these arrays using existing Lucene algorithms, to save some heap during merging and to avoid humongous allocations.

Relates to #14382
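To illustrate the kind of existing Lucene utility that can pack a monotonic sequence, here is a self-contained sketch using `PackedLongValues.monotonicBuilder`; the class name, leaf sizes, and counts are made up for the example, and the commit itself may use a different structure:

```java
import org.apache.lucene.util.packed.PackedInts;
import org.apache.lucene.util.packed.PackedLongValues;

public class PackedLeafBlockFPs {
  public static void main(String[] args) {
    // The monotonic builder stores small deltas from a linear estimate of
    // the sequence instead of full 64-bit longs.
    PackedLongValues.Builder builder =
        PackedLongValues.monotonicBuilder(PackedInts.COMPACT);
    long fp = 0;
    for (int leaf = 0; leaf < 1_000_000; leaf++) {
      fp += 4096 + (leaf % 7); // fake, nearly-uniform leaf block sizes
      builder.add(fp);         // file pointers arrive in increasing order
    }
    PackedLongValues leafBlockFPs = builder.build();
    System.out.println("fp[0] = " + leafBlockFPs.get(0));
    System.out.println("packed heap: " + leafBlockFPs.ramBytesUsed()
        + " B vs " + (1_000_000L * Long.BYTES) + " B for a raw long[]");
  }
}
```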