Skip to content

gh-130379: Fix incorrect zipapp logic to avoid including the target in itself #130509

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Feb 26, 2025

Conversation

pfmoore
Copy link
Member

@pfmoore pfmoore commented Feb 24, 2025

The change in #106076 was intended to ensure that zipapp would not include the target file in the list of files that get included in the target. However the logic to do that was flawed, and expensive. This PR changes the approach to generate the list of files to add before creating the target, so there is no possibility of the target being included in the first place.

@pfmoore
Copy link
Member Author

pfmoore commented Feb 24, 2025

@serhiy-storchaka you raised some concerns about the original change (unfortunately I'd merged before I saw them). Would you be willing to take a look at this revised approach?

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. This addresses also my concerns.

@pfmoore
Copy link
Member Author

pfmoore commented Feb 24, 2025

One possible improvement would be to check explicitly for target:

files_to_add = sorted(f for f in source.rglob('*') if f != target)

This would catch the case where the user does something like create_archive(".", target="out.pyz") when there's an existing out.pyz left over from a previous run. Arguably that's a user error (the user should be specifying a filter that ensures unexpected files like this are excluded, if they aren't able to control the content of the source directory) but it does seem like it's a common mistake.

The problem with this check is that it's not complete. A path equality check doesn't cover all cases (symlinks, use of .., etc.). I would prefer to argue that those cases are clearly not simple accidents, and anyone concerned about these should be writing their code more robustly, either tightly controlling the source directory, or using a filter that is tailored to do what they want. Otherwise, we would need to do something costly like check Path.samefile when target is a path. And we'd still have problems if the target was a file-like object writing to a file in the source directory 🙁

I'm inclined to add the simple path equality check, with a comment explaining that it's not intended to catch every case where the target is part of the source, just to act as a sanity check for this specific common error. Does that seem like a reasonable compromise?

@pfmoore
Copy link
Member Author

pfmoore commented Feb 24, 2025

Report an error if the target file overwrites a file in the list of source files to add. See #130379 (comment) for details.

# the zipfile module catch writing an archive to itself at a
# lower level, which could help here in cases that our check
# doesn't catch.
if target in files_to_add:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

target can be a file-like object. target in files_to_add can invoke comparison of a file-like object with the Path objects. Is it what we need?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's fine. A file-like object will simply test as not equal to all the Path objects. I guess I could add some sort of exception handling in case the user passes an object with a custom equality check that fails, but that seems like overkill to me.

I was going to add a note to that effect in the comment, but it's long enough already and I wasn't sure it was worth it. The fact that you pointed it out suggests that it would be, though.

Lib/zipapp.py Outdated
# thorough checks don't provide enough value to justify the extra
# cost.

# https://github.com/python/cpython/issues/104527 tracks making
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think that zipfile should catch this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I'll remove that part of the comment. I think the same logic applies here - if adding the check can't be justified in zipfile, it's not worth adding it here either. We can stick with the simple but incomplete check.

@pfmoore pfmoore merged commit 64ccbbb into python:main Feb 26, 2025
39 checks passed
@pfmoore
Copy link
Member Author

pfmoore commented Feb 26, 2025

@Yhg1s IMO this is a bug fix that should probably be backported to 3.13. The original bug was introduced in #106076, which isn't present in 3.12, so there's no need to backport further. It's technically a behaviour change (the original intent was to skip the offending file rather than error) but given that the intended behaviour didn't work, I think it's better to backport than leave the bug in 3.13.

Before I add the backport label, do you have any objections?

@pfmoore
Copy link
Member Author

pfmoore commented Mar 3, 2025

I'm going to request a backport. If anyone has any objections, or feels this shouldn't be backported, speak up - it's easy enough to revert if necessary.

@pfmoore pfmoore added the needs backport to 3.13 bugs and security fixes label Mar 3, 2025
@miss-islington-app
Copy link

Thanks @pfmoore for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Mar 3, 2025
…rget in itself (pythongh-130509)

(cherry picked from commit 64ccbbb)

Co-authored-by: Paul Moore <[email protected]>
@bedevere-app
Copy link

bedevere-app bot commented Mar 3, 2025

GH-130791 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Mar 3, 2025
pfmoore added a commit that referenced this pull request Mar 3, 2025
…arget in itself (gh-130509) (gh-130791)

gh-130379: Fix incorrect zipapp logic to avoid including the target in itself (gh-130509)
(cherry picked from commit 64ccbbb)

Co-authored-by: Paul Moore <[email protected]>
seehwan pushed a commit to seehwan/cpython that referenced this pull request Apr 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants