Skip to content

Corrupt .pyc files stay on disk after failed writes #126606

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
xavierholt opened this issue Nov 8, 2024 · 4 comments
Closed

Corrupt .pyc files stay on disk after failed writes #126606

xavierholt opened this issue Nov 8, 2024 · 4 comments
Labels
3.12 only security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes stdlib Python modules in the Lib dir topic-importlib type-bug An unexpected behavior, bug, or error

Comments

@xavierholt
Copy link

xavierholt commented Nov 8, 2024

Bug report

Bug description:

If writing a .pyc file fails (in my case due to a file size limit imposed by ulimit), it can leave corrupt data sitting on disk. This causes a crash the second time you run the program, when the interpreter tries to load the corrupt .pyc file instead of the original .py file:

root@e7138ea2e2b5:/mnt# python3 crashme.py --limit 1024 --import
Setting ulimit to 1024...
Importing a "big" library...
root@e7138ea2e2b5:/mnt# python3 crashme.py --limit 1024 --import
Setting ulimit to 1024...
Importing a "big" library...
Traceback (most recent call last):
  File "/mnt/crashme.py", line 17, in <module>
    from fakelib import bigfile
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 991, in exec_module
  File "<frozen importlib._bootstrap_external>", line 1124, in get_code
  File "<frozen importlib._bootstrap_external>", line 753, in _compile_bytecode
EOFError: marshal data too short

This was a very hard bug to track down, and it seems like there's a chance to catch this and handle it gracefully. If I do a too-large write myself, I get an OSError: [Errno 27] File too large, and I'd imagine the interpreter can see something similar. Ideally (in my opinion - feedback appreciated!), the interpreter would notice this and then:

  • Log a warning.
  • Delete the corrupt file.
  • Set sys.dont_write_bytecode = True (this is the workaround I've been using).

Here's the code I've been using to test this (bigfile can be pretty much anything, but it does seem to have to be part of a module before the interpreter will write a .pyc file for it).

import argparse
import resource
import tempfile

parser = argparse.ArgumentParser()
parser.add_argument('-l', '--limit',  type=int)
parser.add_argument('-i', '--import', action='store_true', dest='impoort')
parser.add_argument('-w', '--write',  type=int)
args = parser.parse_args()

if args.limit is not None:
    print('Setting ulimit to %d...' % args.limit)
    resource.setrlimit(resource.RLIMIT_FSIZE, (args.limit, args.limit))

if args.impoort:
    print('Importing a "big" library...')
    from fakelib import bigfile

if args.write is not None:
    print('Writing a %d byte file...' % args.write)
    with tempfile.TemporaryFile() as file:
        file.write(b'a' * args.write)

Tested in Python 3.10 on OSX and Python 3.12 in an Ubuntu 24 container.

CPython versions tested on:

3.10, 3.12

Operating systems tested on:

Linux, macOS

Linked PRs

@xavierholt xavierholt added the type-bug An unexpected behavior, bug, or error label Nov 8, 2024
@cfbolz
Copy link
Contributor

cfbolz commented Nov 9, 2024

can reproduce this on cpython 3.11, git head, and on pypy (I tried on ubuntu). the problem is likely in importlib somewhere.

@cfbolz
Copy link
Contributor

cfbolz commented Nov 9, 2024

the problem is _write_atomic in _bootstrap_external.py:

def _write_atomic(path, data, mode=0o666):
    """Best-effort function to write data to a path atomically.
    Be prepared to handle a FileExistsError if concurrent writing of the
    temporary file is attempted."""
    import os as _os
    import _io
    # id() is used to generate a pseudo-random filename.
    path_tmp = '{}.{}'.format(path, id(path))
    fd = _os.open(path_tmp,
                  _os.O_EXCL | _os.O_CREAT | _os.O_WRONLY, mode & 0o666)
    try:
        # We first write data to a temporary file, and then use os.replace() to
        # perform an atomic rename.
        with _io.FileIO(fd, 'wb') as file:
            file.write(data)
        _os.replace(path_tmp, path)
    except OSError:
        try:
            _os.unlink(path_tmp)
        except OSError:
            pass
        raise

So we should check the result of the write call, and if the length is not equal to that of data, abort the operation.

the use of FileIO.write causes the issue, since the docs of that say:

write(self, b, /)
Write buffer b to file, return number of bytes written.

Only makes one system call, so not all of the data may be written.
The number of bytes actually written is returned. In non-blocking mode,
returns None if the write would block.

@picnixz picnixz added stdlib Python modules in the Lib dir topic-importlib 3.12 only security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes labels Nov 9, 2024
brettcannon added a commit that referenced this issue Nov 13, 2024
Co-authored-by: Kirill Podoprigora <[email protected]>
Co-authored-by: Brett Cannon <[email protected]>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 13, 2024
(cherry picked from commit c695e37)

Co-authored-by: CF Bolz-Tereick <[email protected]>
Co-authored-by: Kirill Podoprigora <[email protected]>
Co-authored-by: Brett Cannon <[email protected]>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 13, 2024
(cherry picked from commit c695e37)

Co-authored-by: CF Bolz-Tereick <[email protected]>
Co-authored-by: Kirill Podoprigora <[email protected]>
Co-authored-by: Brett Cannon <[email protected]>
brettcannon added a commit that referenced this issue Nov 13, 2024
…6809)

GH-126606: don't write incomplete pyc files (GH-126627)
(cherry picked from commit c695e37)

Co-authored-by: CF Bolz-Tereick <[email protected]>
Co-authored-by: Kirill Podoprigora <[email protected]>
Co-authored-by: Brett Cannon <[email protected]>
brettcannon added a commit that referenced this issue Nov 13, 2024
…6810)

GH-126606: don't write incomplete pyc files (GH-126627)
(cherry picked from commit c695e37)

Co-authored-by: CF Bolz-Tereick <[email protected]>
Co-authored-by: Kirill Podoprigora <[email protected]>
Co-authored-by: Brett Cannon <[email protected]>
@brettcannon
Copy link
Member

Thanks, everyone! This is now fixed in 3.12 to main.

@cfbolz
Copy link
Contributor

cfbolz commented Nov 14, 2024

@brettcannon thanks a lot Brett! will backport to pypy now.

cfbolz added a commit to pypy/pypy that referenced this issue Nov 28, 2024
original commit message:

GH-126606: don't write incomplete pyc files (GH-126627)

Co-authored-by: Kirill Podoprigora <[email protected]>
Co-authored-by: Brett Cannon <[email protected]>
picnixz pushed a commit to picnixz/cpython that referenced this issue Dec 8, 2024
ebonnal pushed a commit to ebonnal/cpython that referenced this issue Jan 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 only security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes stdlib Python modules in the Lib dir topic-importlib type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

4 participants