test_frame does crash randomly on Linux (x86-64, ppc64le, s390x) #133261

Closed
vstinner opened this issue May 1, 2025 · 21 comments
Labels
release-blocker, tests (Tests in the Lib/test dir), type-crash (A hard crash of the interpreter, possibly with a core dump)

Comments

@vstinner
Member

vstinner commented May 1, 2025

Crash report

Example: https://buildbot.python.org/#/builders/64/builds/9301

test_repr_deep (test.test_frame.FrameLocalsProxyMappingTests.test_repr_deep) ...

Fatal Python error: Segmentation fault

Current thread 0x00007f2ed21d5400 [python] (most recent call first):
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.lto/build/Lib/unittest/case.py", line 247 in handle
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.lto/build/Lib/unittest/case.py", line 813 in assertRaises
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.lto/build/Lib/test/mapping_tests.py", line 634 in test_repr_deep
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.lto/build/Lib/unittest/case.py", line 615 in _callTestMethod
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.lto/build/Lib/unittest/case.py", line 669 in run
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.lto/build/Lib/unittest/case.py", line 725 in __call__
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.lto/build/Lib/unittest/suite.py", line 122 in run
  ...
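
For readers unfamiliar with the failing test, here is a minimal sketch of the kind of check test_repr_deep performs: it nests a mapping very deeply and expects repr() to fail cleanly with RecursionError rather than crash. The sketch assumes CPython and uses plain dicts instead of frame-locals proxies; DEPTH and both helper names are hypothetical and not taken from Lib/test/mapping_tests.py.

```python
# Hypothetical stand-in for the test: plain dicts, not FrameLocalsProxy.
# DEPTH is an assumed value chosen to exceed any recursion limit.
DEPTH = 150_000


def build_nested_mapping(depth):
    """Return a dict nested `depth` levels deep: {1: {1: {... {} ...}}}."""
    d = {}
    for _ in range(depth):
        d = {1: d}
    return d


def repr_overflows(mapping):
    """Return True if repr() reports the excessive depth as RecursionError."""
    try:
        repr(mapping)
    except RecursionError:
        return True
    return False


nested = build_nested_mapping(DEPTH)
print(repr_overflows(nested))
```

The crash reported here corresponds to the interpreter segfaulting somewhere around this RecursionError instead of propagating it to assertRaises.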

Linked PRs

@vstinner vstinner added the tests (Tests in the Lib/test dir) and type-crash (A hard crash of the interpreter, possibly with a core dump) labels May 1, 2025
@vstinner
Member Author

vstinner commented May 1, 2025

The crash might be related to the recent trashcan change: 44e4c47

@vstinner
Member Author

vstinner commented May 1, 2025

Extract of the gdb backtrace:

#0  _PyObject_GC_UNTRACK (op=0x7fffdbc073d0) at ./Include/internal/pycore_gc.h:279
#1  listiter_dealloc (self=0x7fffdbc073d0) at Objects/listobject.c:3997
#2  0x00000000004f1a2e in _PyTrash_thread_destroy_chain (tstate=<optimized out>) at Objects/object.c:2957
#3  _Py_Dealloc (op=op@entry=0x7fffdbbf7f40) at Objects/object.c:3073
#4  0x00000000004aeda8 in Py_DECREF (op=0x7fffdbbf7f40) at ./Include/refcount.h:433
#5  framelocalsproxy_repr (self=0x7fffdefaf760) at Objects/frameobject.c:532
#6  framelocalsproxy_repr (self=0x7fffdefaf760) at Objects/frameobject.c:512
#7  0x00000000004f1f4a in PyObject_Repr (v=0x7fffdefaf760) at ./Include/object.h:270
...
#310188 0x00000000004f1f4a in PyObject_Repr (v=0x7fffdc742b40) at ./Include/object.h:270
#310189 PyObject_Repr (v=v@entry=0x7fffdc742b40) at Objects/object.c:753
#310190 0x00000000004aed74 in framelocalsproxy_repr (self=0x7fffdc72cf10) at Objects/frameobject.c:531
#310191 framelocalsproxy_repr (self=0x7fffdc72cf10) at Objects/frameobject.c:512
#310192 0x00000000004f1f4a in PyObject_Repr (v=0x7fffdc72cf10) at ./Include/object.h:270
#310193 PyObject_Repr (v=0x7fffdc72cf10) at Objects/object.c:753
#310194 0x0000000000564e2b in PyUnicodeWriter_WriteRepr (writer=writer@entry=0x7fffdc742af0, obj=<optimized out>) at Objects/unicodeobject.c:13969
#310195 0x00000000004d7bd7 in dict_repr_lock_held (self=0x7fffdc742ac0) at Objects/dictobject.c:3372
#310196 dict_repr (self=0x7fffdc742ac0) at Objects/dictobject.c:3401
#310197 0x00000000004f1f4a in PyObject_Repr (v=0x7fffdc742ac0) at ./Include/object.h:270
#310198 PyObject_Repr (v=v@entry=0x7fffdc742ac0) at Objects/object.c:753
#310199 0x00000000004aed74 in framelocalsproxy_repr (self=0x7fffdc72cf40) at Objects/frameobject.c:531
#310200 framelocalsproxy_repr (self=0x7fffdc72cf40) at Objects/frameobject.c:512
#310201 0x00000000004f1f4a in PyObject_Repr (v=0x7fffdc72cf40) at ./Include/object.h:270
...
#310486 0x000000000067bb72 in Py_RunMain () at Modules/main.c:767

vstinner added a commit to vstinner/cpython that referenced this issue May 1, 2025
GCC 9 and older don't have __has_builtin(), but have
__builtin_frame_address() function.
@ambv
Contributor

ambv commented May 1, 2025

Also happened randomly on my Windows 11 buildbot:
https://buildbot.python.org/#/builders/1088/builds/722/steps/5/logs/stdio

@vstinner
Member Author

vstinner commented May 1, 2025

> Also happened randomly on my Windows 11 buildbot:
> https://buildbot.python.org/#/builders/1088/builds/722/steps/5/logs/stdio

Oh, strange. This bug looks more like #133199: an assertion failure in _PyObject_GC_UNTRACK().

test_repr_deep (test.test_frame.FrameLocalsProxyMappingTests.test_repr_deep) ...

R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Objects\listobject.c:3997: _PyObject_GC_UNTRACK: Assertion "_PyObject_GC_IS_TRACKED(((PyObject*)(op)))" failed: object not tracked by the garbage collector
Enable tracemalloc to get the memory block allocation traceback
object address  : 0000016FA81B1090
object refcount : 0
object type     : 00007FFB97E7EEB0
object type name: list_iterator
object repr     : <refcnt 0 at 0000016FA81B1090>
Fatal Python error: _PyObject_AssertFailed: _PyObject_AssertFailed
Python runtime state: initialized
Warning -- Unraisable exception
Exception ignored in the internal traceback machinery:
Traceback (most recent call last):
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\traceback.py", line 143, in _print_exception_bltin
    return print_exception(exc, limit=BUILTIN_EXCEPTION_LIMIT, file=file, colorize=colorize)
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\traceback.py", line 134, in print_exception
    te.print(file=file, chain=chain, colorize=colorize)
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\traceback.py", line 1577, in print
    for line in self.format(chain=chain, colorize=colorize):
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\traceback.py", line 1514, in format
    yield from _ctx.emit(exc.format_exception_only(colorize=colorize))
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\traceback.py", line 1001, in emit
    for text in text_gen:
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\traceback.py", line 1268, in format_exception_only
    isinstance(self.__notes__, collections.abc.Sequence)
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\abc.py", line 119, in __instancecheck__
    return _abc_instancecheck(cls, instance)
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\abc.py", line 123, in __subclasscheck__
    return _abc_subclasscheck(cls, subclass)
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\abc.py", line 123, in __subclasscheck__
    return _abc_subclasscheck(cls, subclass)
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\abc.py", line 123, in __subclasscheck__
    return _abc_subclasscheck(cls, subclass)
RecursionError: Stack overflow (used 11719 kB)
Current thread 0x00000408 (most recent call first):
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\unittest\case.py", line 247 in handle
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\unittest\case.py", line 813 in assertRaises
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\test\mapping_tests.py", line 634 in test_repr_deep
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\unittest\case.py", line 615 in _callTestMethod
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\unittest\case.py", line 669 in run
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\unittest\case.py", line 725 in __call__
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\unittest\suite.py", line 122 in run
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\unittest\suite.py", line 84 in __call__
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\unittest\suite.py", line 122 in run
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\unittest\suite.py", line 84 in __call__
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\unittest\runner.py", line 259 in run
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\test\libregrtest\single.py", line 84 in _run_suite
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\test\libregrtest\single.py", line 42 in run_unittest
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\test\libregrtest\single.py", line 162 in test_func
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\test\libregrtest\single.py", line 118 in regrtest_runner
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\test\libregrtest\single.py", line 165 in _load_run_test
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\test\libregrtest\single.py", line 210 in _runtest_env_changed_exc
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\test\libregrtest\single.py", line 319 in _runtest
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\test\libregrtest\single.py", line 348 in run_single_test
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\test\libregrtest\worker.py", line 92 in worker_process
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\test\libregrtest\worker.py", line 127 in main
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\test\libregrtest\worker.py", line 131 in <module>
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\runpy.py", line 88 in _run_code
  File "R:\buildarea\pull_request.ambv-bb-win11.bigmem\build\Lib\runpy.py", line 198 in _run_module_as_main
Extension modules: _testinternalcapi, _testcapi (total: 2)

vstinner added a commit that referenced this issue May 1, 2025
GCC 9 and older don't have __has_builtin(), but have
__builtin_frame_address() function.
@vstinner
Member Author

vstinner commented May 1, 2025

@ambv: Would you mind opening a separate issue for the Windows failure? It has a different root cause, related to _PyObject_GC_UNTRACK().

@encukou
Member

encukou commented May 2, 2025

Since the PR was merged, the s390x RHEL8 buildbot started failing: https://buildbot.python.org/#/builders/442/builds/9249

@StanFromIreland
Contributor

StanFromIreland commented May 4, 2025

Can also reproduce (~25% of the time) on my machine, x86-64 Fedora 42.

@vstinner
Member Author

vstinner commented May 4, 2025

The crash or the GC untrack assertion error?

@StanFromIreland
Contributor

StanFromIreland commented May 4, 2025

The assertion error, apologies, I should have specified.

@vstinner
Member Author

vstinner commented May 4, 2025

Can you please open a new issue for the assertion error?

@vstinner
Member Author

vstinner commented May 4, 2025

I can reproduce the crash ("Fatal Python error: Segmentation fault") on x86-64 Fedora 42:

make clean
./configure
make
./python -m test -v test_frame  # run multiple times until you get a crash

Output:

test_repr_deep (test.test_frame.FrameLocalsProxyMappingTests.test_repr_deep) ...

Fatal Python error: Segmentation fault
(...)
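
Since the crash is intermittent, the "run multiple times" step can be automated. The following is a minimal sketch, not part of the original report: run_until_failure is a hypothetical helper that reruns a command until it exits with a non-zero status. On POSIX, subprocess reports a process killed by signal N as return code -N, so a segmentation fault would typically show up as a negative value.

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs=20):
    """Run cmd up to max_runs times.

    Return (attempt, returncode) for the first non-zero exit status,
    or None if every run succeeded.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            return attempt, result.returncode
    return None


# Quick self-check with the current interpreter: a command that always succeeds.
print(run_until_failure([sys.executable, "-c", "pass"], max_runs=3))

# Intended use, assuming the current directory is a built CPython checkout:
# run_until_failure(["./python", "-m", "test", "-v", "test_frame"])
```

Output is discarded here to keep the loop quiet; drop the DEVNULL arguments to see the fatal error message from the failing run.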

@vstinner
Member Author

vstinner commented May 4, 2025

@markshannon: I don't understand why _PyTrash_thread_destroy_chain() is called by _Py_Dealloc() while the call stack is very deep (150 000 Python frames / 300 000 C frames). Is it a bug in _Py_RecursionLimit_GetMargin()? Or is the limit badly configured?
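
For context on why the trashcan matters: in CPython it exists so that tearing down a deeply nested container does not recurse without bound in C. The sketch below only illustrates that general behaviour and assumes CPython; it does not model _Py_Dealloc() or the stack-margin check discussed in this issue, and build_nested_list is a hypothetical helper.

```python
def build_nested_list(depth):
    """Return a list nested `depth` levels deep: [[[... [] ...]]]."""
    nested = []
    for _ in range(depth):
        nested = [nested]
    return nested


deep = build_nested_list(500_000)
# Dropping the only reference deallocates all levels. CPython defers the
# nested deallocations instead of recursing once per level in C.
del deep
print("deep structure released")
```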

vstinner added a commit to vstinner/cpython that referenced this issue May 4, 2025
Fix a random crash in test_frame.test_repr_deep() on x86-64.
@vstinner
Member Author

vstinner commented May 4, 2025

I wrote #133401 to fix the random test_frame crash.

@vstinner vstinner changed the title from "test_frame does crash randomly on AMD64 RHEL8 LTO 3.x" to "test_frame does crash randomly on Linux x86-64" May 4, 2025
@vstinner
Member Author

vstinner commented May 4, 2025

I also saw the crash on "PPC64LE RHEL8 LTO + PGO 3.x" buildbot: https://buildbot.python.org/#/builders/458/builds/6121

test_repr_deep (test.test_frame.FrameLocalsProxyMappingTests.test_repr_deep) ... Fatal Python error: Segmentation fault

Current thread 0x00007fffb9b845e0 [python] (most recent call first):
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-ppc64le.lto-pgo/build/Lib/unittest/case.py", line 247 in handle
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-ppc64le.lto-pgo/build/Lib/unittest/case.py", line 813 in assertRaises
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-ppc64le.lto-pgo/build/Lib/test/mapping_tests.py", line 634 in test_repr_deep

@vstinner vstinner changed the title from "test_frame does crash randomly on Linux x86-64" to "test_frame does crash randomly on Linux (x86-64 and ppc64le)" May 4, 2025
@vstinner vstinner changed the title from "test_frame does crash randomly on Linux (x86-64 and ppc64le)" to "test_frame does crash randomly on Linux (x86-64, ppc64le, s390x)" May 5, 2025
@vstinner
Member Author

vstinner commented May 5, 2025

I also saw the crash on "s390x RHEL8 LTO + PGO 3.x" buildbot: https://buildbot.python.org/#/builders/442/builds/9283

test_repr_deep (test.test_frame.FrameLocalsProxyMappingTests.test_repr_deep) ... Fatal Python error: Segmentation fault

Current thread 0x000003ffaf3f7270 [python] (most recent call first):
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x.lto-pgo/build/Lib/unittest/case.py", line 247 in handle
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x.lto-pgo/build/Lib/unittest/case.py", line 813 in assertRaises
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x.lto-pgo/build/Lib/test/mapping_tests.py", line 634 in test_repr_deep
  ...

@markshannon
Member

> @markshannon: I don't understand why _PyTrash_thread_destroy_chain() is called by _Py_Dealloc() while the call stack is very deep (150 000 Python frames / 300 000 C frames). Is it a bug in _Py_RecursionLimit_GetMargin()? Or is the limit badly configured?

That's a bit surprising.
If recursion is so deep that we are raising a recursion error, we definitely shouldn't be calling _PyTrash_thread_destroy_chain().

@markshannon
Member

It appears that _PyTrash_thread_destroy_chain() is being called correctly, during later handling of the overflow.

@vstinner
Member Author

vstinner commented May 5, 2025

Fixed by #133431. Thanks @markshannon.

@vstinner
Member Author

vstinner commented May 5, 2025

I'm reopening the issue: sadly, an assertion fails on x86 (32-bit). See #133431 (comment)

@vstinner vstinner reopened this May 5, 2025
@github-project-automation github-project-automation bot moved this from Done to In Progress in Release and Deferred blockers 🚫 May 5, 2025
ambv pushed a commit that referenced this issue May 5, 2025
Make sure trashcan pointer look mortal -- 32 bit
@hugovk
Member

hugovk commented May 6, 2025

Has the merge of #133450 fixed this?

@markshannon
Member

I believe so, yes.

@hugovk hugovk closed this as completed May 6, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in Release and Deferred blockers 🚫 May 6, 2025

6 participants