test_interpreters: test_create_many_threaded() failed on FreeBSD: log: RuntimeError: interpreter creation failed #109700

Open
vstinner opened this issue Sep 21, 2023 · 19 comments
Labels
tests Tests in the Lib/test dir topic-subinterpreters type-bug An unexpected behavior, bug, or error

Comments

@vstinner
Member

vstinner commented Sep 21, 2023

test_create_many_threaded (test.test_interpreters.StressTests.test_create_many_threaded) ...

Warning -- Uncaught thread exception: RuntimeError
Exception in thread Thread-83 (task):
RuntimeError: error evaluating path

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/buildbot/buildarea/3.x.ware-freebsd/build/Lib/threading.py", line 1059, in _bootstrap_inner
    self.run()
  File "/buildbot/buildarea/3.x.ware-freebsd/build/Lib/threading.py", line 996, in run
    self._target(*self._args, **self._kwargs)
  File "/buildbot/buildarea/3.x.ware-freebsd/build/Lib/test/test_interpreters.py", line 483, in task
    interp = interpreters.create()
             ^^^^^^^^^^^^^^^^^^^^^
  File "/buildbot/buildarea/3.x.ware-freebsd/build/Lib/test/support/interpreters.py", line 25, in create
    id = _interpreters.create(isolated=isolated)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: interpreter creation failed

ok

build: https://buildbot.python.org/all/#/builders/1223/builds/187

@vstinner vstinner added the tests Tests in the Lib/test dir label Sep 21, 2023
@vstinner
Member Author

Bug also seen on aarch64 RHEL8 LTO + PGO 3.x: https://buildbot.python.org/all/#/builders/78/builds/5402

@serhiy-storchaka
Member

Important part in output:

0:05:34 load avg: 1.66 [424/463/1] test_interpreters failed (env changed) -- running (1): test_threading (2 min 42 sec)
Exception ignored error evaluating path:
Traceback (most recent call last):
  File "<frozen getpath>", line 356, in <module>
ValueError: embedded null byte

@vstinner
Member Author

ValueError: embedded null byte

It comes from the FreeBSD build.


The aarch64 RHEL8 LTO + PGO 3.x build has a different error: TypeError: descriptor 'close' for '_io.BufferedReader' objects doesn't apply to a '_io.FileIO' object.

Python path configuration:
  PYTHONHOME = (not set)
  PYTHONPATH = (not set)
  program name = '/home/buildbot/buildarea/3.x.cstratak-RHEL8-aarch64.lto-pgo/build/python'
  isolated = 0
  environment = 0
  user site = 1
  safe_path = 0
  import site = 1
  is in build tree = 1
  stdlib dir = '/home/buildbot/buildarea/3.x.cstratak-RHEL8-aarch64.lto-pgo/build/Lib'
  sys._base_executable = '/home/buildbot/buildarea/3.x.cstratak-RHEL8-aarch64.lto-pgo/build/python'
  sys.base_prefix = '/home/buildbot/buildarea/3.x.cstratak-RHEL8-aarch64.lto-pgo/build/target'
  sys.base_exec_prefix = '/home/buildbot/buildarea/3.x.cstratak-RHEL8-aarch64.lto-pgo/build/target'
  sys.platlibdir = 'lib'
  sys.executable = '/home/buildbot/buildarea/3.x.cstratak-RHEL8-aarch64.lto-pgo/build/python'
  sys.prefix = '/home/buildbot/buildarea/3.x.cstratak-RHEL8-aarch64.lto-pgo/build/target'
  sys.exec_prefix = '/home/buildbot/buildarea/3.x.cstratak-RHEL8-aarch64.lto-pgo/build/target'
  sys.path = [
    '/home/buildbot/buildarea/3.x.cstratak-RHEL8-aarch64.lto-pgo/build/target/lib/python313.zip',
    '/home/buildbot/buildarea/3.x.cstratak-RHEL8-aarch64.lto-pgo/build/Lib',
    '/home/buildbot/buildarea/3.x.cstratak-RHEL8-aarch64.lto-pgo/build/build/lib.linux-aarch64-3.13',
  ]
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 1354, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1325, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 929, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 1004, in exec_module
  File "<frozen importlib._bootstrap_external>", line 1100, in get_code
  File "<frozen importlib._bootstrap_external>", line 1199, in get_data
TypeError: descriptor 'close' for '_io.BufferedReader' objects doesn't apply to a '_io.FileIO' object

@vstinner
Member Author

I failed to reproduce the issue on Linux just with this command:

./python -m test test_interpreters -v --forever -j25 --fail-env-changed

@serhiy-storchaka
Member

'/home/buildbot/buildarea/3.x.cstratak-RHEL8-aarch64.lto-pgo/build/build/lib.linux-aarch64-3.13',

This looks very suspicious. Why two /build/s in a row?

@vstinner
Member Author

This looks very suspicious. Why two /build/s in a row?

That's how aarch64 RHEL8 LTO + PGO 3.x is configured.

  • Python source code: /home/buildbot/buildarea/3.x.cstratak-RHEL8-aarch64.lto-pgo/build
  • Main test process run in: /home/buildbot/buildarea/3.x.cstratak-RHEL8-aarch64.lto-pgo/build/build/test_python_worker_2806656

Yes, there are two build/build/ sub-directories, but I just think that it's a choice of the buildbot configuration, not a Python bug.

@serhiy-storchaka
Member

It may be related to #109615. What happens if the tests are run locally in such a configuration?

@serhiy-storchaka
Member

This test consumes a lot of memory. 200 threads need more than 6 GB of memory if they run simultaneously (and if they do not, then what is the point of using so many threads?). Perhaps there is a leak, because if the tests are run repeatedly with limited memory, they eventually crash.

$ (ulimit -v 7000000; ./python -m test -vuall test_interpreters -m test_create_many_threaded --forever)
...
0:00:11 load avg: 37.22 [  5] test_interpreters
test_create_many_threaded (test.test_interpreters.StressTests.test_create_many_threaded) ... MemoryErrorMemoryErrorMemoryErrorFatal Python error: Segmentation fault

Current thread 0x00007fc450032640 (most recent call first):
  <no Python frame>

200 live interpreters also consume a significant amount of memory.

All normal tests only need 600-700 MB of memory. Tests that need more are marked with the @bigmemtest decorator and either perform a dry run or are skipped by default. It seems that not all buildbots have that much memory.

@ericsnowcurrently What is the purpose of this test? Can it use fewer threads, for example 10 threads, each sequentially creating 20 interpreters? Can 5 threads be enough?
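
A rough sketch of the "10 threads, each sequentially creating 20 interpreters" idea, not the actual test code. The helper name `create_in_batches` and its parameters are hypothetical, and `object` stands in for `interpreters.create` so that the snippet runs on any Python build:

```python
import threading


def create_in_batches(create, n_threads=10, per_thread=20):
    """Call `create` per_thread times in each of n_threads threads.

    All created objects are returned, so they stay alive at the same
    time until the caller drops the list.
    """
    created = []
    errors = []
    lock = threading.Lock()

    def task():
        for _ in range(per_thread):
            try:
                obj = create()
            except Exception as exc:
                # Record the failure instead of losing it in the thread.
                with lock:
                    errors.append(exc)
                return
            with lock:
                created.append(obj)

    threads = [threading.Thread(target=task) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    if errors:
        raise errors[0]
    return created


# Stand-in factory (assumption): a real test would pass interpreters.create.
alive = create_in_batches(object)
print(len(alive))
```

This keeps the total number of live objects at 200 while capping the number of threads at 10. Whether that is still enough contention to trigger races in interpreter initialization is exactly the open question.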

@serhiy-storchaka
Member

test_create_many_sequential also crashes due to leaks.

$ (ulimit -v 700000; ./python -m test -vuall test_interpreters -m test_create_many_sequential --forever --fail-env-changed)
...
0:00:02 load avg: 0.62 [  2] test_interpreters
test_create_many_sequential (test.test_interpreters.StressTests.test_create_many_sequential) ... Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/site.py", line 73, in <module>
  File "/home/serhiy/py/cpython/Lib/os.py", line 29, in <module>
  File "<frozen importlib._bootstrap>", line 1354, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1325, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 929, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 1004, in exec_module
  File "<frozen importlib._bootstrap_external>", line 1137, in get_code
  File "<frozen importlib._bootstrap_external>", line 766, in _compile_bytecode
Fatal Python error: Segmentation fault

Current thread 0x00007f60db746740 (most recent call first):
  <no Python frame>

If you double the limit, it will crash on the 7th iteration instead of the 2nd. If you triple it, it will crash on the 11th. So it leaks 1.5-2 MB per interpreter. Run sequentially, these two tests can leak up to 6 GB of memory.
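
As a back-of-the-envelope check of that estimate, the arithmetic can be written out as below. The limits and crash iterations are the ones quoted above. Two things are assumptions: that each iteration of test_create_many_sequential creates 100 interpreters, and that `ulimit -v` counts KiB.

```python
# (ulimit -v value in KiB, iteration at which the run crashed)
observations = [(700_000, 2), (1_400_000, 7), (2_100_000, 11)]

# Assumption: interpreters created per test iteration.
INTERPRETERS_PER_ITERATION = 100


def leak_per_interpreter_kib(obs, per_iteration):
    """Estimate leaked KiB per interpreter between consecutive observations."""
    estimates = []
    for (limit_a, iter_a), (limit_b, iter_b) in zip(obs, obs[1:]):
        extra_interpreters = (iter_b - iter_a) * per_iteration
        estimates.append((limit_b - limit_a) / extra_interpreters)
    return estimates


estimates = leak_per_interpreter_kib(observations, INTERPRETERS_PER_ITERATION)
for value in estimates:
    print(f"{value:.0f} KiB per interpreter")
# prints:
# 1400 KiB per interpreter
# 1750 KiB per interpreter
```

Under those assumptions the two intervals give roughly 1.4 and 1.75 MB per interpreter, which is in the same range as the figure quoted above.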

@vstinner
Member Author

Wow, so now it's possible to leak a whole interpreter?

Maybe we need something like threading_setup() / threading_cleanup() which uses _thread._count(), to count how many interpreters we have before/after running tests?
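
A minimal sketch of what such a before/after check might look like. The name `interpreters_leak_check` is hypothetical and this is not an existing test.support helper. The default counter assumes the low-level module provides `list_all()` under one of the two module names tried below; the demo uses a fake registry so that it runs without subinterpreter support.

```python
import contextlib


def _default_count():
    # Assumption: the private module is importable under one of these
    # names and list_all() returns one entry per live interpreter.
    try:
        import _interpreters
    except ImportError:
        import _xxsubinterpreters as _interpreters
    return len(_interpreters.list_all())


@contextlib.contextmanager
def interpreters_leak_check(count=_default_count):
    """Fail if the number of live interpreters differs before and after."""
    before = count()
    yield
    after = count()
    if after != before:
        raise AssertionError(
            f"interpreter count changed: {before} -> {after}")


# Demo with a fake registry standing in for the interpreter list.
registry = ["main"]

with interpreters_leak_check(lambda: len(registry)):
    registry.append("sub")   # "create" an interpreter
    registry.pop()           # and "destroy" it again

try:
    with interpreters_leak_check(lambda: len(registry)):
        registry.append("leaked")
except AssertionError as exc:
    message = str(exc)
    print(message)
```

Note that a count-based check like this only notices interpreters that are still listed. It would not detect memory that stays allocated after an interpreter has been finalized.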

@serhiy-storchaka
Member

Run sequentially, these two tests can leak up to 6 GB of memory.

Actually, only up to 600 MB, but that is still a lot. I do not know how it is now, but several years ago some buildbots had only a few hundred MB of physical memory and often failed with a timeout after hours of swapping. I suppose the buildbots on which test_create_many_threaded fails also have very limited RAM.

@serhiy-storchaka
Member

More precisely, 1.4 MB are leaked in every subinterpreter.

@vstinner
Member Author

More precisely, 1.4 MB are leaked in every subinterpreter.

Almost a floppy disk (1.44 MB)!

@ericsnowcurrently
Member

Sorry I didn't see this sooner. I don't know why, but the GitHub notifications page isn't showing me this issue. Thankfully @serhiy-storchaka DM'ed me.

if they run simultaneously (and if they do not, then what is the point of using so many threads?)
...
What is the purpose of this test? Can it use fewer threads, for example 10 threads, each sequentially creating 20 interpreters? Can 5 threads be enough?

The purpose of the test is to make sure we don't crash if we create a bunch of subinterpreters and keep them alive at the same time. We try it both one-at-a-time and in parallel (though the GIL keeps it mostly one-at-a-time for now). The number of interpreters (hence the number of threads) needs to be high enough that we can consistently trigger possible races in interpreter initialization and other resource contention.

FWIW, the stress tests were inspired by some code that @tonybaloney shared with me earlier this year and that I was using to hunt down crashes:

import time
import _xxsubinterpreters as _interpreters
from threading import Thread
from queue import Queue
import os


def run(host: str, port: int, results: Queue):
    # Create a communication channel
    r, w = os.pipe()
    interpid = _interpreters.create()
    # Probe the port from inside the subinterpreter and send the
    # connect_ex() result back through the pipe. The module was imported
    # as _interpreters above, so use that name here.
    _interpreters.run_string(interpid, f"""if True:
        import os
        import socket
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(1)
        result = sock.connect_ex(({host!r}, {port}))
        os.write({w}, result.to_bytes())
        sock.close()
        """)
    print("completed", flush=True)
    output = os.read(r, 10)
    # connect_ex() returns 0 when the connection succeeded
    if int.from_bytes(output) == 0:
        results.put(port)


if __name__ == '__main__':
    start = time.time()
    host = "localhost"  # pick a friend
    threads = []
    results = Queue()
    for port in range(1, 100):
#    for port in range(80, 100):
        t = Thread(target=run, args=(host, port, results))
        t.start()
        threads.append(t)
#        t.join()
    for t in threads:
        t.join()
    while not results.empty():
        print("Port {0} is open".format(results.get()))
    print("Completed scan in {0} seconds".format(time.time() - start))

When I started, I got semi-reliable crashes if I bumped the number of threads from 20 to 100. So that's what I did in the test. I don't recall why I bumped it up to 200 for the threaded version. IIRC, I toyed with scaling that down if the host had limited RAM, but decided against the extra complexity. Maybe I should revisit that.
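
For what it's worth, a sketch of what scaling the count by host RAM could look like. It is illustrative only and not what the test does. The function names, the 50% budget, and the per-interpreter cost are all assumptions. The cost is derived from the "200 threads need more than 6 GB" figure earlier in this thread.

```python
import os

# Assumption: ~6 GB for 200 threaded interpreters, per the figure above.
PER_INTERPRETER_BYTES = 6 * 2**30 // 200


def total_ram_bytes():
    """Return physical RAM in bytes, or None if it cannot be determined."""
    try:
        return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        # os.sysconf or these names are not available on every platform.
        return None


def scaled_interpreter_count(ram_bytes, wanted=200, minimum=10, budget=0.5,
                             per_interpreter=PER_INTERPRETER_BYTES):
    """Cap `wanted` so the estimated usage stays within `budget` of RAM."""
    if ram_bytes is None:
        return wanted
    affordable = int(ram_bytes * budget) // per_interpreter
    return max(minimum, min(wanted, affordable))


print(scaled_interpreter_count(64 * 2**30))  # large host: keeps 200
print(scaled_interpreter_count(2 * 2**30))   # 2 GiB host: scaled down
print(scaled_interpreter_count(total_ram_bytes()))
```

The downside is the one already mentioned: the number of interpreters then varies per host, which makes failures harder to compare across buildbots.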

Do we need that many interpreters/threads in the stress tests? Maybe not. I just want to be sure we give ourselves the best chance to find crashes.

While we figure that out, at the least we could apply the @bigmemtest decorator to the stress tests.

Perhaps there is a leak, because if the tests are run repeatedly with limited memory, they eventually crash.

That would definitely be worth looking into.

So it leaks 1.5-2 MB per interpreter.
...
More precisely, 1.4 MB are leaked in every subinterpreter.

AFAICS, each interpreter uses a bit over 2 MB. Not all of that is necessarily resident. When an interpreter is finalized, the system allocator determines how much of that process memory to actually release to the system. That makes it harder to determine what leaks there may be. That said, it certainly seems like something's leaking.

@vstinner
Member Author

vstinner commented Nov 8, 2023

@ericsnowcurrently: What's the status of this issue? Should it be closed?

@ericsnowcurrently
Member

I still need to investigate this further. It's certainly still a problem. Sorry it's taking so long.

@vstinner
Member Author

vstinner commented Sep 6, 2024

I haven't seen this error recently, so I'm closing the issue.

@vstinner vstinner closed this as not planned Sep 6, 2024
@github-project-automation github-project-automation bot moved this from Todo to Done in Subinterpreters Sep 6, 2024
serhiy-storchaka added a commit that referenced this issue May 4, 2025
* Ensure that destructors are called in the test that created interpreters, not after finishing it.
* Try to create/run interpreters in threads simultaneously.
* Mark tests that requires over 6GB of memory with bigmemtest.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 4, 2025
…nGH-109946)

* Ensure that destructors are called in the test that created interpreters, not after finishing it.
* Try to create/run interpreters in threads simultaneously.
* Mark tests that requires over 6GB of memory with bigmemtest.
(cherry picked from commit 61b50a9)

Co-authored-by: Serhiy Storchaka <[email protected]>
serhiy-storchaka added a commit that referenced this issue May 4, 2025
…09946) (GH-133391)

* Ensure that destructors are called in the test that created interpreters, not after finishing it.
* Try to create/run interpreters in threads simultaneously.
* Mark tests that requires over 6GB of memory with bigmemtest.
(cherry picked from commit 61b50a9)

Co-authored-by: Serhiy Storchaka <[email protected]>
diegorusso added a commit to diegorusso/cpython that referenced this issue May 4, 2025
* origin/main: (111 commits)
  pythongh-91048: Add filename and line number to external inspection routines (pythonGH-133385)
  pythongh-131178: Add tests for `ast` command-line interface (python#133329)
  Regenerate pcbuild.sln in Visual Studio 2022 (python#133394)
  pythongh-133042: disable HACL* HMAC on Emscripten (python#133064)
  pythongh-133351: Fix remote PDB's multi-line block tab completion (python#133387)
  pythongh-109700: Improve stress tests for interpreter creation (pythonGH-109946)
  pythongh-81793: Skip tests for os.link() to symlink on Android (pythonGH-133388)
  pythongh-126835: Rename `ast_opt.c` to `ast_preprocess.c` and related stuff after moving const folding to the peephole optimizier (python#131830)
  pythongh-91048: Relax test_async_global_awaited_by to fix flakyness (python#133368)
  pythongh-132457: make staticmethod and classmethod generic (python#132460)
  pythongh-132805: annotationlib: Fix handling of non-constant values in FORWARDREF (python#132812)
  pythongh-132426: Add get_annotate_from_class_namespace replacing get_annotate_function (python#132490)
  pythongh-81793: Always call linkat() from os.link(), if available (pythonGH-132517)
  pythongh-122559: Synchronize C and Python implementation of the io module about pickling (pythonGH-122628)
  pythongh-69605: Add PyREPL import autocomplete feature to 'What's New' (python#133358)
  bpo-44172: Keep reference to original window in curses subwindow objects (pythonGH-26226)
  pythonGH-133231: Changes to executor management to support proposed `sys._jit` module (pythonGH-133287)
  pythongh-133363: Fix Cmd completion for lines beginning with `! ` (python#133364)
  pythongh-132983: Introduce `_zstd` bindings module (pythonGH-133027)
  pythonGH-91048: Add utils for printing the call stack for asyncio tasks (python#133284)
  ...
@serhiy-storchaka
Member

There have been many test_interpreters failures on some buildbots recently, as well as failures related to high memory consumption, so I applied my solution.

@serhiy-storchaka
Member

But the leak in interpreters has not been fixed.

$ (ulimit -v 300000; ./python -u -c '
from test.support import interpreters
for i in range(1000):
    print(i)
    interpreters.create()
')
...
250
251
RuntimeError: Failed to import encodings module

During handling of the above exception, another exception occurred:

interpreters.InterpreterError: sub-interpreter creation failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 5, in <module>
    interpreters.create()
    ~~~~~~~~~~~~~~~~~~~^^
  File "/home/serhiy/py/cpython/Lib/test/support/interpreters/__init__.py", line 76, in create
    id = _interpreters.create(reqrefs=True)
interpreters.InterpreterError: interpreter creation failed
python: Objects/typeobject.c:297: managed_static_type_state_clear: Assertion `!_PyRuntime.types.managed_static.types[full_index].interp_count' failed.


3 participants