Skip to content

Optimize ParallelLeafReader to improve term vector fetching efficienc #14373

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 24, 2025

Conversation

DivyanshIITB
Copy link
Contributor

This PR optimizes ParallelLeafReader to avoid redundant term vector fetching.

  • Replaces per-field term vector fetching with a single call per reader.
  • Reduces complexity from O(n^2) to O(n).
  • Improves performance when handling large numbers of fields.
  • Verified via existing tests.

Closes #7926

@DivyanshIITB
Copy link
Contributor Author

Just a gentle reminder
@vigyasharma

Copy link
Contributor

@vigyasharma vigyasharma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me. Can you run ./gradlew tidy to fix formatting issues, and add a changes entry before we merge this?

@DivyanshIITB
Copy link
Contributor Author

Changes look good to me. Can you run ./gradlew tidy to fix formatting issues, and add a changes entry before we merge this?

I successfully ran ./gradlew tidy and the built was successful.
Also added the changes entry in CHANGES.txt

Comment on lines 39 to 41
- Fetches all term vectors once per reader instead of per field.
- Reduces complexity from **O(n²) to O(n)**.
- Enhances performance for documents with many fields. (Divyansh Agrawal)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally keep a single bullet per changes entry. Details are already available in the pull request that the entry points too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have modified CHANGES.txt as you said.

@vigyasharma
Copy link
Contributor

I successfully ran ./gradlew tidy and the built was successful.

Github build is still failing on spotless (formatting). tidy will change and reformat offending files for you, you need to commit and push those changes.

@DivyanshIITB
Copy link
Contributor Author

DivyanshIITB commented Mar 23, 2025

I successfully ran ./gradlew tidy and the built was successful.

Github build is still failing on spotless (formatting). tidy will change and reformat offending files for you, you need to commit and push those changes.

Thanks for the review @vigyasharma !
I have run ./gradlew tidy and pushed the formatting fixes. Let me know if there's anything else needed.

@vigyasharma vigyasharma merged commit 9272d4d into apache:main Mar 24, 2025
7 checks passed
vigyasharma added a commit that referenced this pull request Mar 24, 2025
vigyasharma added a commit that referenced this pull request Mar 24, 2025
@vigyasharma
Copy link
Contributor

Changes merged. Thanks @DivyanshIITB !

jpountz pushed a commit to jpountz/lucene that referenced this pull request Mar 24, 2025
jpountz pushed a commit to jpountz/lucene that referenced this pull request Mar 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ParallelLeafReader.getTermVectors can indirectly load TVs multiple times [LUCENE-6868]
2 participants