Skip to content

Zipapp archives are always empty #130379

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
swhcz opened this issue Feb 20, 2025 · 11 comments
Closed

Zipapp archives are always empty #130379

swhcz opened this issue Feb 20, 2025 · 11 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@swhcz
Copy link

swhcz commented Feb 20, 2025

Bug report

Bug description:

Good evening,

I have recently installed Python 3.13 (Windows x64) over 3.12.
Now zipapp.create_archive(..) is creating just empty archives.

After some fiddling around I found there might be an issue in these lines:

(from zipapp.py, lines 113 ff)

    with _maybe_open(target, 'wb') as fd:
        _write_file_prefix(fd, interpreter)
        compression = (zipfile.ZIP_DEFLATED if compressed else
                       zipfile.ZIP_STORED)
        with zipfile.ZipFile(fd, 'w', compression=compression) as z:
            for child in sorted(source.rglob('*')):
                arcname = child.relative_to(source)
                if filter is None or filter(arcname) and child.resolve() != arcname.resolve():   # ██ LOOK HERE ! ██
                    z.write(child, arcname.as_posix())
            if main_py:
                z.writestr('__main__.py', main_py.encode('utf-8'))

This change was introduced in Python 3.13 (have verified against 3.12 sources) and seemingly causes the trouble.

My guess is, as child is pointing to the same file as arcname, and naturally, they both resolve to the same file, no file gets added due to the != check, ever, to the archive-to-be-created.

Maybe it was intended to check against target, not arcname?

Once I monkeypatched the and child.resolve() != arcname.resolve() part away, the archive got written again.

Or am I just not doing it right?
Any help much appreciated!

Sebastian.

CPython versions tested on:

3.13

Operating systems tested on:

Windows

Linked PRs

@swhcz swhcz added the type-bug An unexpected behavior, bug, or error label Feb 20, 2025
@swhcz
Copy link
Author

swhcz commented Feb 20, 2025

maybe related to #104527 ?

@swhcz swhcz changed the title Zipapp creates empty archives Zipapp archives are always empty Feb 20, 2025
@encukou encukou added the stdlib Python modules in the Lib dir label Feb 21, 2025
@akx
Copy link
Contributor

akx commented Feb 24, 2025

Yeah, just noticed this too (after it caused an ongoing inadvertent production oopsie).

Reproducer: https://github.com/akx/cpython-zipapp-bug

I think #106076 was the PR breaking this.

@akx
Copy link
Contributor

akx commented Feb 24, 2025

@swhcz Can you show your usage of create_archive()? Is it similar to mine in https://github.com/akx/cpython-zipapp-bug/blob/0c5acb7705766cc5651b3e4d073e376746e9ffb4/makepkg.py#L14-L21 ?

@pfmoore
Copy link
Member

pfmoore commented Feb 24, 2025

Thanks for the reproducer. This bug wasn't caught by the existing test suite, unfortunately.

The specific condition needed for this to fail are:

  1. There is a filter argument specified.
  2. The source argument is the current directory.

I agree that the change in #106076 is incorrect here. But IMO, the logic behind that change was what was flawed - rather than checking if we're trying to add the target to itself, we should collect the list of files to add before
we create the target, avoiding the problem in the first place.

Please can you try the following patch to confirm if it fixes the issue for you?

diff --git a/Lib/zipapp.py b/Lib/zipapp.py
index 03a214efa10..7803e0a9021 100644
--- a/Lib/zipapp.py
+++ b/Lib/zipapp.py
@@ -131,14 +131,18 @@ def create_archive(source, target=None, interpreter=None, main=None,
     elif not hasattr(target, 'write'):
         target = pathlib.Path(target)

+    # Create the list of files to add to the archive now, in case
+    # the target is being created in the source directory - we
+    # don't want the target being added to itself
+    files_to_add = sorted(source.rglob('*'))
     with _maybe_open(target, 'wb') as fd:
         _write_file_prefix(fd, interpreter)
         compression = (zipfile.ZIP_DEFLATED if compressed else
                        zipfile.ZIP_STORED)
         with zipfile.ZipFile(fd, 'w', compression=compression) as z:
-            for child in sorted(source.rglob('*')):
+            for child in files_to_add:
                 arcname = child.relative_to(source)
-                if filter is None or filter(arcname) and child.resolve() != arcname.resolve():
+                if filter is None or filter(arcname):
                     z.write(child, arcname.as_posix())
             if main_py:
                 z.writestr('__main__.py', main_py.encode('utf-8'))

@akx
Copy link
Contributor

akx commented Feb 24, 2025

@pfmoore 👍 That fixes my reproducer's assertion (because it removes the child.resolve() != arcname.resolve() clause). However, in case the destination file exists, it will be added to the archive (unless filtered out by filter):

$ uv run --python=3.13 makepkg.py
['__main__.py', 'dist/', 'pkg/', 'pkg/__init__.py', 'pkg/foo.py']
$ uv run --python=3.13 makepkg.py && zipinfo dist/out-13
['__main__.py', 'dist/', 'dist/out-13', 'pkg/', 'pkg/__init__.py', 'pkg/foo.py']

However, I'm not sure if that issue ("don't self-zip", #104527 as referred to in #106076) in scope for this bug ("empty archives"). On that note, this also fixes one case of the other bug mentioned as a footnote in my reproducer's readme. Namely, one can now create an archive in the same source directory once, but rerunning the same command has zipapp hang forever compressing out into out...

$ uv run --python=3.13 -m zipapp -o out --main pkg.foo:hello .
(works, creates a zipapp with a whole bunch of files, as there is no filter)
$ uv run --python=3.13 -m zipapp -o out --main pkg.foo:hello .
(crunchy IO noises, hangs until KeyboardInterrupt)
^C
  File "zipfile/__init__.py", line 1242, in close
    raise RuntimeError("File size too large, try using force_zip64")
$ du -hs out
5.6G    out
$ rm out

@pfmoore
Copy link
Member

pfmoore commented Feb 24, 2025

IMO, that's correct behaviour. You have given zipapp.create_archive a source directory containing a file called out, and you didn't specify a filter that excluded that file. So in that case, zipapp should include the file in the archive.

#104527 would improve the behaviour in this area, by giving an error message rather than simply creating an ever-growing file, but that's an issue for the zipfile module, not for zipapp. I'd also be willing to accept a PR adding some form of --filter argument to the zipapp CLI, although the details of how that would work would need to be sorted out.

Generally, though, I would strongly advise not creating the target in the source directory.

@akx
Copy link
Contributor

akx commented Feb 24, 2025

Yeah, not disagreeing.

Maybe the CLI could be made smart enough to default to a filter that excludes the target file if it inadvertently is among the sources, but if you use the library by hand, you need to do that yourself?

@pfmoore
Copy link
Member

pfmoore commented Feb 24, 2025

I'm considering adding a check in the function to exclude target, but only based on a pathname check (so it will catch simple mistakes like the one you demonstrated, but it can be defeated by things like symlinks or .. segments in a path1). I'm not going to do anything with the CLI - that's meant to be simple, for the cases where the user assembles their app in a app directory and does py -m zipapp app (to create app.pyz from app/*). Using . as the source is an antipattern, especially in the CLI, and I'd rather see people avoid using it than try to make it easier.

Footnotes

  1. I'm not going to go as far as using Path.samefile, or resolving names - if someone needs this, they are better writing their own filter function and handling all the edge cases themself.

@swhcz
Copy link
Author

swhcz commented Feb 24, 2025

Hi everyone,

many thanks for looking into this!!

@akx, yes, that's pretty much how my code looks. I have a small project that is packaged (in the same directory) to be copied to a RasPi as a .pyz-Package.

While fiddling with the code I figured, leaving out the filter makes the whole packaging go bonkers, it fails with a file-too-large-exception when hitting the 4GB-mark, which can only mean using '.' as source causes something else to break, too, as there's not even remotely that many files in and below the current dir, meaning that large data comes from either a recursion or from outside.

While I may agree creating a package in sourcedir could be a bad idea, it is not explicitly outruled, and using a filter (mine is actually checking for the package name, and returning false :) has worked just fine. Maybe the above behaviour is caused by using the source dir as target, haven't tested that.

Here's a smaller version of my code for reference:

#!python3
# coding: utf-8
"""
Creates a python executable module for the RasPi to use.

Output is 'package_for_raspi.pyz', which can be copied elsewhere.
Call: 'python3 <path_to>/package_for_raspi.pyz
"""
import os
import zipapp
from pathlib import Path

ARCHIVE_FNAME = 'package_for_raspi.pyz'

NO_COPY_PREFIX = ['_', '.', 'test\\', 'pack_for_pi.py', '_repl']
NO_COPY_CONTAINS = ['__pycache__', '.gitignore']
NO_COPY_SUFFIX = ['.lnk']


def get_my_path() -> str:
    try:
        return os.path.dirname(os.path.realpath(__file__))
    except NameError:
        # script has no name, usually only happens if started from the IDE
        return os.getcwd()


def pack_to_python_archive(archive):
    our_current_dir = get_my_path()

    def my_filter(p:Path) -> bool:
        """Return False for development and unnecessary runtime files"""
        path_str = str(p)

        filename_check = ([path_str == ARCHIVE_FNAME] +
                          [path_str.startswith(s) for s in NO_COPY_PREFIX] +
                          [path_str.endswith(s) for s in NO_COPY_SUFFIX] +
                          [s in path_str for s in NO_COPY_CONTAINS]
                          )

        if any(filename_check):
            return False

        assert os.path.exists(f := os.path.join(our_current_dir, path_str)), f
        return True

    zipapp.create_archive(source=our_current_dir,
                          target=os.path.join(our_current_dir, archive),
                          filter=my_filter,
                          main='main:main')


if __name__ == "__main__":
    pack_to_python_archive(archive=ARCHIVE_FNAME)

@pfmoore
Copy link
Member

pfmoore commented Feb 24, 2025

I've been thinking further about this. Now that we have a fix so that zipapp decides what files to include in the archive before it starts writing the file, the only way it can be writing to itself is if (an older version of) the output archive is part of the list of input files as specified. That's always a user error - there's no realistic reason for wanting a zipapp to contain an obsolete version of itself.

Therefore, I'm now of the opinion that we shouldn't automatically ignore the target file. Instead, we should do the best we can to provide an informative error if the user makes this mistake. So I plan on doing a simple check - is the target file present in the list of source files (just a simple path equality check, no attempt to resolve symlinks or anything like that)? If so, then fail with an error explaining the issue. That will catch the common case, py -m zipapp . --target some_file_in_the_cwd.pyz. We'll leave more complex cases to the zipfile module, which has an open issue for this at #104527.

Anyone who wants to write the output archive to the directory containing the sources can still do so, but they will need to provide a filter argument to explicitly skip the output file (and they won't be able to use the basic CLI, as that doesn't support filters). IMO, that's a fair expectation for a situation that we'd advise against.

I will update the PR with the fix to implement this approach.

@pfmoore
Copy link
Member

pfmoore commented Feb 26, 2025

Fixed by #130509

@pfmoore pfmoore closed this as completed Feb 26, 2025
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Mar 3, 2025
…rget in itself (pythongh-130509)

(cherry picked from commit 64ccbbb)

Co-authored-by: Paul Moore <[email protected]>
pfmoore added a commit that referenced this issue Mar 3, 2025
…arget in itself (gh-130509) (gh-130791)

gh-130379: Fix incorrect zipapp logic to avoid including the target in itself (gh-130509)
(cherry picked from commit 64ccbbb)

Co-authored-by: Paul Moore <[email protected]>
seehwan pushed a commit to seehwan/cpython that referenced this issue Apr 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
Status: Done
Development

No branches or pull requests

4 participants