zipapp fails cryptically on large ZIP64-formatted archives because zipimport.py doesn't support ZIP64. #95706

Closed
thundergolfer opened this issue Aug 5, 2022 · 4 comments
Labels: stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

thundergolfer commented Aug 5, 2022

Bug report

When using zipapp (and related projects) to create standalone Python applications, my colleagues and I ran into trouble with any large but valid .zip archive.

It turns out that Lib/zipimport.py doesn't support ZIP64 and 'gets lost' when reading ZIP64 archives, so it finds 0 files in these archives and reports that a __main__.py cannot be found.
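
To check whether a given archive is affected, one option is to look for the ZIP64 end-of-central-directory locator, which the ZIP format places directly before the regular end-of-central-directory record. The sketch below is only an illustration: the function names are made up for this example, it is not how zipimport.py parses archives, and it does not detect archives that use ZIP64 extra fields on individual entries without a ZIP64 locator.

```python
import os

EOCD_SIG = b"PK\x05\x06"           # end of central directory record
ZIP64_LOCATOR_SIG = b"PK\x06\x07"  # ZIP64 end of central directory locator
ZIP64_LOCATOR_SIZE = 20            # fixed size of the locator, in bytes
# Minimal EOCD (22 bytes) + maximum comment (65535 bytes) + locator.
MAX_TAIL = 22 + 0xFFFF + ZIP64_LOCATOR_SIZE


def tail_has_zip64_locator(tail: bytes) -> bool:
    """Return True if a ZIP64 locator sits right before the last EOCD record."""
    eocd_at = tail.rfind(EOCD_SIG)
    if eocd_at < ZIP64_LOCATOR_SIZE:
        # No EOCD found (rfind returned -1) or no room for a locator before it.
        return False
    start = eocd_at - ZIP64_LOCATOR_SIZE
    return tail[start:start + 4] == ZIP64_LOCATOR_SIG


def is_zip64_archive(path) -> bool:
    """Hypothetical helper: inspect only the tail of the file at `path`."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - MAX_TAIL))
        return tail_has_zip64_locator(f.read())
```

Calling `is_zip64_archive("zip64_sized.pyz")` on the archive from the reproduction below would be one way to confirm that it really is ZIP64-formatted before running it.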

Minimal reproduction

Create a Zip64 archive file using a simple __main__.py and random binary data to pad out the size.

"""
Use the `zipapp` module to write a Zip64 archive to disk.
(Alternatively the `zipfile` module can be used directly.)
"""
import os
import pathlib
import tempfile
import zipapp
import zipfile

def main() -> int:
    num_dummy_files = 10
    dummy_file_size = int((1.5 * zipfile.ZIP64_LIMIT) // num_dummy_files)
    temp_dir = tempfile.TemporaryDirectory()
    for i in range(num_dummy_files):
        with open(pathlib.Path(temp_dir.name, f"{i}.bin"), "wb") as dummy_f:
            dummy_f.write(os.urandom(dummy_file_size))
    with open(pathlib.Path(temp_dir.name, "__main__.py"), "w") as main_f:
        main_f.write("print('Hello from the zipapp __main__py!')")

    zipapp.create_archive(temp_dir.name, "zip64_sized.pyz")
    temp_dir.cleanup()
    return 0

if __name__ == "__main__":
    raise SystemExit(main())

Attempt to execute the large zipapp.

python3.11 zip64_sized.pyz
/workspaces/cpython/python: can't find '__main__' module in '/workspaces/cpython/zip64_sized.pyz'
# or, using interpreters compiled from latest `main` (698fa8bf)
./python zip64_sized.pyz
/usr/local/bin/python3.11: can't find '__main__' module in '/workspaces/cpython/zip64_sized.pyz'

The __main__ module is of course present in the archive, which prompts head scratching until you dig into the CPython source and the ZIP file spec.

How to fix

The zipapp module will happily produce Zip64 archives because the underlying zipfile module has enabled Zip64 support by default since Python 3.4.
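
The default can be confirmed from the `zipfile.ZipFile` signature, and passing `allowZip64=False` makes zipfile raise `LargeZipFile` instead of silently switching to ZIP64. The sketch below triggers this through the entry-count limit rather than through file size, so it runs without writing gigabytes of data; the helper name is invented for this example.

```python
import inspect
import io
import zipfile

# allowZip64 defaults to True on current Python versions.
default_allow_zip64 = (
    inspect.signature(zipfile.ZipFile).parameters["allowZip64"].default
)


def needs_zip64(n_entries: int) -> bool:
    """Return True if writing n_entries empty files requires ZIP64."""
    buf = io.BytesIO()
    try:
        with zipfile.ZipFile(buf, "w", allowZip64=False) as zf:
            for i in range(n_entries):
                zf.writestr(f"{i}.txt", b"")
    except zipfile.LargeZipFile:
        # zipfile refuses to exceed the classic ZIP limits here.
        return True
    return False
```

With ZIP64 disabled, the failure happens when the archive is created rather than later, when an interpreter tries to run it.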

The 'full' fix for this issue would be to refactor Lib/zipimport.py to support Zip64 loading.

A first fix, I think, could be to provide a clearer error message when Lib/zipimport.py is given a Zip64 archive.

I'm happy to provide patches for each of these fixes in turn, if there's support for it. :)

Edit: I began attempting to raise an exception on Zip64 archives, but it seems that when an exception is raised within zipimport.py, the program doesn't exit and instead continues to start the interpreter:

Traceback (most recent call last):
  File "<frozen zipimport>", line 91, in __init__
ValueError: ZIP64 archives are unsupported
SyntaxError: Non-UTF-8 code starting with '\xff' in file /workspaces/cpython/zip64_sized.pyz on line 2, but no encoding declared; see https://peps.python.org/pep-0263/ for details

Your environment

- CPython versions tested on:

  • Python 3.12.0a0 (heads/main:698fa8bf60, Aug 5 2022, 08:59:06) [GCC 9.4.0] on linux
  • Python 3.11.0b5+ (heads/3.11:8570f6d1a0, Aug 2 2022, 07:52:11) [GCC 9.4.0] on linux
  • Python 3.10.4 (main, Apr 1 2022, 20:52:12) [GCC 9.4.0] on linux

- Operating system and architecture:

uname -a
Linux codespaces-5a1930 5.4.0-1086-azure #91~18.04.1-Ubuntu SMP Thu Jun 23 20:33:05 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Related:

@thundergolfer thundergolfer added the type-bug An unexpected behavior, bug, or error label Aug 5, 2022
@thundergolfer
Author

Perhaps there should be an option in zipapp.create_archive that disables creation of ZIP64 archives, as these cannot be consumed by current Python interpreters. This wouldn't protect archive authors who use other tools to create archives, but it's something.
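
Until such an option exists, a rough workaround is to build the archive with zipfile directly and pass `allowZip64=False`. This is a minimal sketch under assumptions: `create_pyz_without_zip64` is a hypothetical name, not part of zipapp, and it leaves out zipapp features such as `main`, `filter`, and `compressed`.

```python
import pathlib
import zipfile


def create_pyz_without_zip64(source_dir, target, interpreter=None):
    """Hypothetical helper: write a .pyz that is never ZIP64-formatted.

    Raises zipfile.LargeZipFile if the content would need ZIP64 extensions.
    """
    source = pathlib.Path(source_dir)
    if not (source / "__main__.py").is_file():
        raise ValueError("source directory has no __main__.py")
    with open(target, "wb") as fd:
        if interpreter:
            # Optional shebang line, placed before the ZIP data.
            fd.write(b"#!" + interpreter.encode("utf-8") + b"\n")
        with zipfile.ZipFile(fd, "w", allowZip64=False) as zf:
            for child in sorted(source.rglob("*")):
                zf.write(child, child.relative_to(source).as_posix())
```

An oversized application then fails at build time with `LargeZipFile` instead of producing an archive the interpreter cannot read.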

@ronaldoussoren
Contributor

See also #89739

@iritkatriel iritkatriel added the stdlib Python modules in the Lib dir label Nov 24, 2023
@danifus
Contributor

danifus commented Jul 12, 2024

Should this issue be closed now that #89739 is fixed by #94146?

@itamaro
Contributor

itamaro commented Jul 12, 2024

I confirmed the repro works with 3.13+:

$ ./python.exe zip64_sized.pyz
Hello from the zipapp __main__py!

Closing the issue.
