-
Notifications
You must be signed in to change notification settings - Fork 2.5k
1.86 cargo package
fail when workspace crate contains symlink into git submodule
#15384
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Might be related to #14981. |
Sorry, I've added steps to reproduce the issue with the project in question in the original report. |
This should help avoid issues cargo-1.86.0 has with symlink files as reported in rust-lang/cargo#15384.
I just created a duplicate of this bug at #15413 . My report has a minimal reproduction from a single attached zip file. |
cargo package
fail when workspace crate contains symlink into git submodule
Dirtiness check for symlinks is mostly informational. And changes in submodule would fail git-status as well (see rust-lang#15384). To avoid adding complicated logic to handle that, for now we ignore the status check failure.
Dirtiness check for symlinks is mostly informational. And changes in submodule would fail git-status as well (see rust-lang#15384). To avoid adding complicated logic to handle that, for now we ignore the status check failure.
### What does this PR try to resolve? If a there is a symlink into a git repository/submodule, when checking its git status with the wrong outer repo, we'll get an NotFound error, as the object doesn't belong to the outer repository. This kind of error blocked the entire `cargo package` operation. This fix additionally discovers the nearest Git repository, and then checks status with that, assuming the repo is the parent of the source file of the symlink. This is a best effort solution, so if the check fails we ignore. ### How should we test and review this PR? If we don't want the complication, we could drop the last commit, ignore the error, and forget about handling submodules fixes #15384 fixes #15413
@radhermit I noticed that you have a relatively large submodule testdata. |
The performance regression is about two orders of magnitude worse, 1.85.1 taking significantly less than a second and the version with the workaround taking almost 10 seconds before there is any visual feedback from running |
What was in the console output before and after the 10 seconds? |
Nothing was shown until the expected output from Using strace it appears that the current code is causing an extremely large amount of readlink and newfstatat calls. Without looking at the code much, it appears to try resolving a symlink of every parent dir for each directory in the submodule and then iterates back through the entire chain trying to determine if each directory is a git repo or submodule itself. I feel like there should be a much more efficient way to handle this properly. For example, couldn't the defined submodules of the main git repo be determined by reading the .gitmodules file (or wherever else they're registered in the .git directory). I'm not familiar with recursively nested submodules (if cargo wants to support them for this), but it feels like there should be a better way then recursively scanning the directory tree and performing a lot of fs-related syscalls that are often duplicated. |
Thanks. Yeah I also figured it out but do not have much time refactoring the existing O(n^2) algo. THe outcome of it is #15419, which fixes the original issue but puts off checking submodules. |
Thanks for your efforts to solve the main problem, hopefully someone (or perhaps me if I get inspired) can make it more efficient later. |
…5419) ### What does this PR try to resolve? This reverts commit 71ea2e5. `Repository::discover` and `Repository::status_file` are too expenstive to run inside a loop. And `cargo package` are doing a lot of duplicate works for checking submodule VCS status. Alternative fixes might look like * Let `status_submodules` function returns a path entry set, so Cargo can check whether a source file is dirty based on that. * When listing files in `PathSource`, attach the VCS status of a path entry assoicated with. Then subsequent operations can skip status check entirely. However, the above solutions are not trivial, and the dirtiness check is informational only based on T-cargo conclusion, so we should be good just reverting the change now. Again, the caveat of this is that we can't really detect dirty symlinks that link into a Git submodule. ### How should we test and review this PR? Should be good to merge. We still got #15384 fixed via d760263 ### Additional information See <#15384 (comment)>.
Problem
Previously it was possible to symlink a shared directory (e.g. testdata) into multiple crates from the root workspace directory where it exists as a git submodule.
Now in the cargo version that comes with rust-1.86, trying to use
cargo publish
orcargo package
returns nonexistent file errors such as the following:error: attempt to get status of nonexistent file 'testdata/toml/dep.toml'; class=Invalid (3); code=NotFound (-3)
Reverting to an earlier version of rust alleviates the issue.
Steps
For reference, I'm running into the issue with the project at https://github.com/pkgcraft/pkgcraft that symlinks its testdata directory into multiple crates. See the following steps to reproduce the issue:
Version
The text was updated successfully, but these errors were encountered: