Improve traceability of assertion errors in common functions #130454
Comments
The docs say:

> In general, if NULL is returned, then usually an exception is set.

But the docs could perhaps be improved.
Normally, functions shouldn't be called with an exception set, except those that clear or handle exceptions, as they could set another exception afterwards. I don't know if we documented this or not.
OTOH, if a missing attribute shouldn't be an error, you should use https://docs.python.org/3/c-api/object.html#c.PyObject_GetOptionalAttr instead, as it silences the AttributeError.
Yeah, but maybe Python could be a bit nicer and not raise an uncatchable critical error for this?
Especially since it's something that could, in theory, be filtered out immediately.
But actually, I'll probably switch our implementation to this:

```c
if (PyObject_HasAttrString(module, "_internal_mod_quit")) {
    funcobj = PyObject_GetAttrString(module, "_internal_mod_quit");
}
else {
    funcobj = PyObject_GetAttrString(module, "quit");
}
```
Well... I'm assuming that the crash says that an exception is set. However, that's how the C API usually works (this is documented in https://docs.python.org/3/c-api/intro.html#exceptions) [emphasis mine].

Stated otherwise, if the first call fails, clear the error before falling back:

```c
funcobj = PyObject_GetAttrString(module, "_internal_mod_quit");
/* If we could not load _internal_mod_quit, load the quit function instead. */
if (!funcobj) {
    PyErr_Clear();
    funcobj = PyObject_GetAttrString(module, "quit");
}
```
This is also an alternative.
No, no it does not. It's a

That's exactly what it says. The SIGABRT is because of an assertion failure.

Yeah, it says an assertion failure. But how did we get there?
No, I mean, there are many reasons why we would get here, but the problem here is the usage. While we could change the behaviour of all functions by checking that
Well... it's an API misuse, and we can't do more than write docs to avoid that. In this case, it's mentioned that a C API function should not be called with an exception set, unless stated otherwise. Sorry, but I think we should close this one as

I'll tag some fellow C API experts (@vstinner @encukou) concerning whether we should have
That's because there's no way to recover (properly, in debug mode) from such a failure at this point, so we should kill the interpreter. In release mode, that means swallowing an exception and possibly leaking resources. In debug mode, you don't want to silently swallow an exception or leak resources without notice.
What am I supposed to do in this case, though? I get your points, but without some kind of traceability, it's almost impossible sometimes to actually figure out where the crash is coming from in order to fix it. Like this log, for example: I'm fixing things on our end, and it runs fine on my local machine, and the action even runs fine locally for me, but when GitHub runs it, now it's crashing. I can't run gdb through this, so I'm kinda SOL on fixing this... (Granted, that was a manufactured scenario. I set that up as a demo for what happens before my fixes, but the point still stands, I think.)
I see where you're coming from. One solution, as I said, is to add

cc @ZeroIntensity as well
PyObject_GetAttr throws SIGABRT on debug builds of Python on a pre-existing exception
Even with faulthandler C stacks, I'm not sure using faulthandler all the time is a good idea, especially on free-threading. It's currently prone to races in some programs, including on GILicious builds. Rather than

```c
static inline void
_Py_SanityCheck(void)
{
    _Py_AssertHoldsTstate();
    assert(!PyErr_Occurred());
}
```

And then maybe something to sanity-check object arguments too:

```c
static inline void
_Py_SanityCheckObjects(PyObject **objects, Py_ssize_t num)
{
    _Py_SanityCheck();
    for (Py_ssize_t i = 0; i < num; ++i) {
        PyObject *op = objects[i];
        assert(op != NULL);
        assert(!_PyObject_IsFreed(op));
        assert(Py_REFCNT(op) > 0);
    }
}

#define _Py_NUMARGS(...) (sizeof((PyObject*[]) { __VA_ARGS__ }) / sizeof(PyObject*))
#define _Py_SanityCheckObjects(...) \
    _Py_SanityCheckObjects((PyObject *[]){ __VA_ARGS__ }, _Py_NUMARGS(__VA_ARGS__))
```

It doesn't seem like it would be all that disruptive to add a one-liner check to stable functions, but I'm thinking out loud.
Yes, I thought of something like this that can be hidden behind an

EDIT: we likely need a macro; otherwise, the location will also fail, I think. Because the
The main thing the debug build does is enable assertions; without a debugger, those turn into fatal errors. Using the debug build without access to a debugger is indeed not very useful. For GitHub Actions, there are workarounds like gha-ssh. (I haven't tried it myself yet, though.) Adding a check of the prerequisites to each function is, I think, a neat idea, but it would be a lot of work to do it in some maintainable way. IMO it goes together with cataloging what the prerequisites are. See #127443, which is more about refcounting, but "which arguments can be NULL" or "does this function need
SIGABRT can be caught in the process or by a debugger such as gdb. You can enable faulthandler, which registers a signal handler for SIGABRT and dumps the Python traceback where the signal was raised. A C debugger is really helpful to debug C extensions, right. You should try to reproduce the issue locally.
That's a lot of work, and it's not doable in all cases. In practice, some C functions are called with an exception set in order to handle the current exception.

My point was that some of this code has run perfectly fine for well over a decade as-is, unsupported as it may be. Yes, it's violating the guidance that you shouldn't use the C API with an exception set, and I'll work on cleaning that part up, but if it's going to throw SIGABRT when debug is present, then I would at least expect some kind of exception when compiled for release. I don't think it's reasonable for switching from release to debug to just throw SIGABRT out of left field.
The only reason I was using debug symbols in the first place was because I was working on free threading, and gdb wasn't being super helpful because it couldn't see all the internal symbols in the release Python build |
What do you expect the debug build to do? To only get debug symbols, you can use compiler options (on GCC/clang, add |
I still do think that a sanity-check macro would be worthwhile; I do know people that test locally on debug builds, but it would be especially convenient for debugging on our end. For this specific case: I sometimes have a workflow that runs my test suite through Valgrind to catch silent memory errors or UB, but that can double as a debugging workflow for things like this. Would you consider using something like that?

Well, my expectation generally when I switch from release to debug mode is that behavior remains unchanged overall: extra logged info, fewer optimizations, and sanity checks for impossible situations, perhaps. Usually, I'm after the debug symbols. What I don't expect is sudden fatal errors that never happen in the release build. IMO, assert should be reserved for checking impossible situations as a sanity check, not as a gotcha to catch users not using the API correctly.

Not sure this works with pyenv builds. We use meson-python as our build backend, and I've never had success getting it to pick up my self-built versions of Python, so it's either install a prebuilt or use something like pyenv.

I'll look into it. I'm kinda terrified of the attempt, though, because I'm afraid of what it'll find in all the legacy code that's as old as I am.
Feature request
(By @picnizx)
The C API docs explicitly state that functions should usually not be called with an exception set, or else assertion errors may occur and/or the thread state may be invalid. This means that `assert(!PyErr_Occurred())` in common macros/functions such as `_PyType_LookupRef` will just crash. However, in a CI scenario, without a debugger, it's extremely hard to pinpoint the actual cause.

Related: #128159 would give the possibility to show the C stack trace with `-X faulthandler`.

A crude idea: add `assert(!PyErr_Occurred())` inside public common C API functions to be sure that the assertion is checked as well, making debugging a bit easier.

Toy Scenario (original request):
I was working on free-threading support for pygame-ce, and I finally needed to use a Python build with debug symbols, and our `atexit`-registered `quit` function was causing SIGABRT to be thrown. This took me a while to figure out. Here's all the code you need to run (or just run `python -c "import pygame"`, since `pygame.quit()` is `atexit`-registered).

Here's the guilty block of code in our codebase:

In the 3.12 docs for `PyObject_GetAttrString` and `PyObject_GetAttr`, there's no mention at all that there shouldn't be any exceptions set before calling these functions, and it works on release versions of Python, so nobody has ever questioned this structure. I'll be making a pull request on our end to clear any exceptions between the first and second calls if needed (or use `PyObject_GetOptionalAttrString` in 3.13+).

The reason for this issue is that I'm not sure throwing a SIGABRT is the best option here.

gdb backtrace on a debug python 3.13t
[gdb.txt](https://github.com/user-attachments/files/18922524/gdb.txt)
CPython versions tested on:
3.12, 3.13, 3.14
Operating systems tested on:
Linux