Skip to content

1.86 cargo package fail when workspace crate contains symlink into git submodule #15384

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
radhermit opened this issue Apr 3, 2025 · 10 comments · Fixed by #15416
Closed

1.86 cargo package fail when workspace crate contains symlink into git submodule #15384

radhermit opened this issue Apr 3, 2025 · 10 comments · Fixed by #15416
Labels
C-bug Category: bug Command-package regression-from-stable-to-stable Regression in stable that worked in a previous stable release. S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted.

Comments

@radhermit
Copy link

radhermit commented Apr 3, 2025

Problem

Previously it was possible to symlink a shared directory (e.g. testdata) into multiple crates from the root workspace directory where it exists as a git submodule.

Now in the cargo version that comes with rust-1.86, trying to use cargo publish or cargo package returns nonexistent file errors such as the following:

error: attempt to get status of nonexistent file 'testdata/toml/dep.toml'; class=Invalid (3); code=NotFound (-3)

Reverting to an earlier version of rust alleviates the issue.

Steps

For reference, I'm running into the issue with the project at https://github.com/pkgcraft/pkgcraft that symlinks its testdata directory into multiple crates. See the following steps to reproduce the issue:

  1. git clone --recursive https://github.com/pkgcraft/pkgcraft.git
  2. cd pkgcraft
  3. git checkout pkgcraft-tools-0.0.25
  4. cargo package -p pkgcraft-tools

Version

cargo 1.86.0 (adf9b6ad1 2025-02-28)
release: 1.86.0
commit-hash: adf9b6ad14cfa10ff680d5806741a144f7163698
commit-date: 2025-02-28
host: x86_64-unknown-linux-gnu
libgit2: 1.9.0 (sys:0.20.0 vendored)
libcurl: 8.12.0-DEV (sys:0.4.79+curl-8.12.0 vendored ssl:OpenSSL/1.1.1w)
ssl: OpenSSL 1.1.1w  11 Sep 2023
os: Gentoo Linux 2.17.0 [64-bit]
@radhermit radhermit added C-bug Category: bug S-triage Status: This issue is waiting on initial triage. labels Apr 3, 2025
@weihanglo
Copy link
Member

Might be related to #14981.
Would you mind providing a minimal reproduction?

@weihanglo weihanglo added Command-package S-needs-info Status: Needs more info, such as a reproduction or more background for a feature request. and removed S-triage Status: This issue is waiting on initial triage. labels Apr 3, 2025
@radhermit
Copy link
Author

Might be related to #14981. Would you mind providing a minimal reproduction?

Sorry, I've added steps to reproduce the issue with the project in question in the original report.

radhermit added a commit to pkgcraft/pkgcraft that referenced this issue Apr 3, 2025
This should help avoid issues cargo-1.86.0 has with symlink files as
reported in rust-lang/cargo#15384.
@e00E
Copy link

e00E commented Apr 10, 2025

I just created a duplicate of this bug at #15413 . My report has a minimal reproduction from a single attached zip file.

@weihanglo weihanglo added S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted. regression-from-stable-to-stable Regression in stable that worked in a previous stable release. Command-package and removed Command-package S-needs-info Status: Needs more info, such as a reproduction or more background for a feature request. labels Apr 10, 2025
@weihanglo weihanglo changed the title symlinked file access is broken with cargo in rust-1.86 1.86 cargo package fail when workspace crate contains symlink into git submodule Apr 10, 2025
weihanglo added a commit to weihanglo/cargo that referenced this issue Apr 10, 2025
Dirtiness check for symlinks is mostly informational.
And changes in submodule would fail git-status as well (see rust-lang#15384).
To avoid adding complicated logic to handle that,
for now we ignore the status check failure.
weihanglo added a commit to weihanglo/cargo that referenced this issue Apr 10, 2025
Dirtiness check for symlinks is mostly informational.
And changes in submodule would fail git-status as well (see rust-lang#15384).
To avoid adding complicated logic to handle that,
for now we ignore the status check failure.
github-merge-queue bot pushed a commit that referenced this issue Apr 10, 2025
### What does this PR try to resolve?

If a there is a symlink into a git repository/submodule,
when checking its git status with the wrong outer repo,
we'll get an NotFound error,
as the object doesn't belong to the outer repository.
This kind of error blocked the entire `cargo package` operation.

This fix additionally discovers the nearest Git repository,
and then checks status with that,
assuming the repo is the parent of the source file of the symlink.
This is a best effort solution, so if the check fails we ignore.

### How should we test and review this PR?

If we don't want the complication,
we could drop the last commit, ignore the error, and forget about
handling submodules

fixes #15384
fixes #15413
@weihanglo
Copy link
Member

@radhermit I noticed that you have a relatively large submodule testdata.
Would you mind checking if there is any noticeable performance regression between 1.85.1 and f2c4849?

@radhermit
Copy link
Author

@radhermit I noticed that you have a relatively large submodule testdata. Would you mind checking if there is any noticeable performance regression between 1.85.1 and f2c4849?

The performance regression is about two orders of magnitude worse, 1.85.1 taking significantly less than a second and the version with the workaround taking almost 10 seconds before there is any visual feedback from running cargo package.

@weihanglo
Copy link
Member

before there is any visual feedback from running cargo package.

What was in the console output before and after the 10 seconds?

@radhermit
Copy link
Author

before there is any visual feedback from running cargo package.

What was in the console output before and after the 10 seconds?

Nothing was shown until the expected output from cargo package started around the 10 second mark.

Using strace it appears that the current code is causing an extremely large amount of readlink and newfstatat calls. Without looking at the code much, it appears to try resolving a symlink of every parent dir for each directory in the submodule and then iterates back through the entire chain trying to determine if each directory is a git repo or submodule itself.

I feel like there should be a much more efficient way to handle this properly. For example, couldn't the defined submodules of the main git repo be determined by reading the .gitmodules file (or wherever else they're registered in the .git directory). I'm not familiar with recursively nested submodules (if cargo wants to support them for this), but it feels like there should be a better way then recursively scanning the directory tree and performing a lot of fs-related syscalls that are often duplicated.

@weihanglo
Copy link
Member

Thanks. Yeah I also figured it out but do not have much time refactoring the existing O(n^2) algo. THe outcome of it is #15419, which fixes the original issue but puts off checking submodules.

@radhermit
Copy link
Author

Thanks for your efforts to solve the main problem, hopefully someone (or perhaps me if I get inspired) can make it more efficient later.

@weihanglo
Copy link
Member

Reopened #14967 to track that.
Also for overall dirtiness check performance, we have #14955.

github-merge-queue bot pushed a commit that referenced this issue Apr 11, 2025
…5419)

### What does this PR try to resolve?

This reverts commit 71ea2e5.

`Repository::discover` and `Repository::status_file` are too expenstive
to run inside a loop. And `cargo package` are doing a lot of duplicate
works for checking submodule VCS status.

Alternative fixes might look like

* Let `status_submodules` function returns a path entry set, so
  Cargo can check whether a source file is dirty based on that.
* When listing files in `PathSource`, attach the VCS status of a
  path entry assoicated with. Then subsequent operations can skip
  status check entirely.

However, the above solutions are not trivial, and the dirtiness check is
informational only based on T-cargo conclusion, so we should be
good just reverting the change now.

Again, the caveat of this is that we can't really detect
dirty symlinks that link into a Git submodule.

### How should we test and review this PR?

Should be good to merge. We still got #15384 fixed via
d760263

### Additional information

See
<#15384 (comment)>.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: bug Command-package regression-from-stable-to-stable Regression in stable that worked in a previous stable release. S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants