Skip to content

Improve Revs Performance by Avoiding Repeated Tree Checkouts and Adding Version Indexing #1577

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

suratkhan
Copy link

This PR solves #1559 by refactoring the Revs implementation to improve performance by:

  • Pre-building an index of Rev structs from commit trees.
  • Eliminating repeated checkout_tree calls.
  • Using binary search for efficient find_version lookups.
  • Supporting iteration over a cached, sorted list of versions.

@suratkhan suratkhan requested a review from smoelius as a code owner April 7, 2025 07:46
@CLAassistant
Copy link

CLAassistant commented Apr 7, 2025

CLA assistant check
All committers have signed the CLA.

@suratkhan
Copy link
Author

@smoelius, here’s a summary of the changes I made in this PR:

  • Refactored Revs to pre-index all Rev structs at initialization, reducing repeated repository traversal.
  • Replaced checkout_tree logic with direct blob access and parsing for Cargo.toml and rust-toolchain.
  • Improved version lookup speed by introducing binary search in find_version.
  • Implemented a proper RevIter for forward iteration over all versions.
  • Maintained existing functionality, output format, and test coverage.

@smoelius
Copy link
Collaborator

smoelius commented Apr 7, 2025

Thanks, @suratkhan! I'll do my best to review this this week.

Copy link
Collaborator

@smoelius smoelius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments below. I think they may require significant changes, so I'm going to pause here and let you reflect on them.

I really appreciate you working on this.

.next()
.unwrap_or_else(|| {
Err(anyhow!("Could not determine latest `clippy_utils` version"))
})?,
}
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this declaration of rev is the only thing in this file that should have to change.

Many of the other changes seem like unnecessary reformatting.

If I'm not mistaken in thinking that, could you please put everything else back the way it was?

.next()
.unwrap_or_else(|| {
Err(anyhow!("Could not determine latest `clippy_utils` version"))
})?,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would prefer to not introduce another iterator implementation.

For this case (where we cannot use binary search), could you instead just have a function that does a linear search to find the commit where the most recent Rust version became active?

}

// Sort versions for binary search
versions.sort_by(|a, b| a.version.cmp(&b.version));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably wasn't sufficiently clear in what I was imagining, but it was that the binary search would be over the commits rather than the versions.

I see that the current code visits every commit and extracts the Cargo.toml and rust-toolchain blobs. I expect that to be faster than full checkouts, but it still means the code does work at every commit, which I think we can avoid.

Would it be possible to adjust the code so that vector holds commits rather than versions?

Also, could adding to the vector be done in a lazy, doubling manner? E.g.:

  • read 1 commit and add it to the vector
  • if the oldest read commit is not sufficiently old, read 2 more commits and add them to the vector
  • if the oldest read commit is not sufficiently old, read 4 more commits and add them to the vector
  • etc.

Eventually, a commit will be found with a Rust version older than the desired Rust version. At that point, the binary search can begin over all of the commits accumulated in the vector.

Please let me know if my description is too informal.

@suratkhan
Copy link
Author

hello @smoelius, I have applied the things you mentioned in the comments. Here is a summary of what I did:

  1. Refactored revs.rs:
    • Replaced the full commit history scan and upfront version indexing with a "lazy-loading" approach. Commits are loaded in batches as needed.
    • Implemented binary search over commits (instead of versions) to efficiently find the revision corresponding to a specific --rust-version.
    • Added a dedicated find_latest_rev function that scans backwards from HEAD for the latest version (used when --rust-version is omitted).
    • Removed the old build_version_index logic and the RevIter struct.
  2. Updated mod.rs:
    • Modified upgrade_package to call the new revs.find_version() and revs.find_latest_rev() methods.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants