-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
Stack overflow collecting PGO data on Windows #113655
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Trying to repro, I can't even manage to complete a PGO build (first step above). |
It sounds like you are repro'ing (I probably wasn't clear enough). It fails during the PGO build while running the tests that generate PGO data.
|
It looks like 45e09f9 introduced the failing test, so it could just be that the test is incorrect (rather than a test going from passing to failing). |
Ah, you're right, the output similar to yours scrolled off my screen. :-( You should be able to experiment with some of the numbers changed by the PR to see which one affects the Windows PGO build. The stack limit is definitely an ongoing discussion. |
Windows release buildbots are similarly failing: https://buildbot.python.org/all/#/builders/914/builds/3274/steps/4/logs/stdio |
So it is crashing in this addition to
My conclusion is that the increased default C recursion limit is too large for the platform and we hit a stack overflow. Maybe try setting Py_C_RECURSION_LIMIT to something smaller than 8000? (I am trying now with 4000.) |
Yes -- my experimentation arrived at that too. 5,000 is too large, but 4,000 seems to work. See #113668. |
Also, this docstring (deleted by that PR) suggests a possible cause:
Maybe we are running into the inlining problem even in non-debug mode? |
Let's put this on tomorrow's agenda. |
Since Windows 8, we've had If we have any references to stack variables that are passed between frames (e.g. a pointer to a local in EvalFrame that's available in the next EvalFrame) then we can get the size of each recursion from the current build. Of course, that'll vary based on how it recurses (and how many of our own native APIs it goes through), but it's likely better than guessing. At the very least, it might be helpful for some assertions to detect when the guesses are invalidated. A true solution would be to use structured exception handling and catch the stack overflow in EvalFrame. That's going to leave things in a pretty messy state though (leaked references at a minimum, and likely worse), so I think it's better to just crash. |
We could also just check this against the address of a local in a function and raise if we're within some distance from the extent of the stack. It won't be a predictable number of recursions, and it may still be possible to use native recursion to cause a crash, but it should be fairly simple to implement and isn't really any worse than the alternatives. |
@markshannon It seems that even with #113944 merged, this bug still persists -- PGO build log. It's this line:
|
…xtraStackReserve=true (pythonGH-114263)
Lower the C recursion limit for PPC64 and SPARC, as they use relatively large stack frames that cause e.g. `test_descr` to hit a stack overflow. According to quick testing, it seems that values around 8000 are max for PPC64 (ELFv1 ABI) and 7000 for SPARC64. To keep things safe, let's use 5000 for PPC64 and 4000 for SPARC. Co-authored-by: Sam James <[email protected]>
Lower the C recursion limit for HPPA, PPC64 and SPARC, as they use relatively large stack frames that cause e.g. `test_descr` to hit a stack overflow. According to quick testing, it seems that values around 8000 are max for HPPA and PPC64 (ELFv1 ABI) and 7000 for SPARC64. To keep things safe, let's use 5000 for PPC64 and 4000 for SPARC. Co-authored-by: Sam James <[email protected]>
…4264) Lower the C recursion limit for HPPA, PPC64 and SPARC, as they use relatively large stack frames that cause e.g. `test_descr` to hit a stack overflow. According to quick testing, it seems that values around 8000 are max for HPPA and PPC64 (ELFv1 ABI) and 7000 for SPARC64. To keep things safe, let's use 5000 for PPC64 and 4000 for SPARC. Co-authored-by: Michał Górny <[email protected]>
Should I look into this? |
I think this is resolved and should have been closed. Are you still able to reproduce it, @encukou? |
Where's the fix? There's a workaround in the current build which significantly overallocates memory, otherwise we wouldn't be able to release. Build with |
The fix is to reduce Py_C_RECURSION_LIMIT and increase the stack size (
Is it a real issue to allocate 8 MB stack? |
Depends how many threads you create. IIRC, all the physical memory is reserved to ensure it will be contiguous, so you will run out of RAM 4x quicker than a 2MB stack. |
I'm still not sure I see how this is a blocker bug. What's being blocked? |
The final release build requires more stack memory than usual because PGO was enabled, yes (or no, if you were thinking something different). It's a C stack optimization issue. For whatever reason, due to how the stack ends up looking after PGO is applied (presumably to the main interpreter loop), we need to allocate more memory for every thread to account for the expected number of C recursions. Rather than revert the entire complex change that led to that regression, I chose to temporarily expand the memory used and left it marked as a deferred blocker. Without that workaround, we would've reverted a significant change (improvement, on the whole) to recursion tracking so we could release. It would be great to handle the recursion count better so that we don't have to experimentally reduce the limit each time something changes in the interpreter loop structure. But there's a whole paid team working on that, so I was hoping that by leaving it assigned to them, they'd do it. I'm not happy with saying "let's charge every user 1MB/thread because our team couldn't fix this". |
The stack size can be reduced on Windows release PGO build if Py_C_RECURSION_LIMIT is also reduced. |
The issue with reducing |
Please don't make more build-time changes to pyconfig. That's already proven to be a mistake, and as soon as we figure out how free-threaded extension builds are going to figure things out I'm ripping it out (they currently work about 50% of the time, I've proposed they just make it an explicit option for now and detect the current running interpreter if unspecified). If anything, we should just reduce the public macro and then optionally use a higher limit ourselves if we can. But I'd rather have some kind of approach that doesn't rely on a compile-time constant to figure out when the stack is becoming exhausted (catching the exception is not safe in our context because we don't control every point of the stack). |
Agreed. I think the best way to fix this is to push forwards implementing #91079 for Windows. I'd like to implement #91079 for all tier1 (maybe tier 2) platforms, but it is quite tricky for linux.
@zooba can you elaborate. |
@markshannon, do you expect to implement #91079 for Windows in 3.14? |
Unless there is a technical obstacle, yes. We might be able to do it for linux and mac for 3.14 as well, but no promises. |
OK. Thanks! I'll mark that as a blocker then. Please let me know if your plans change so I can try reducing |
To be clear, I mean the native SEH exception. Though I imagine that's what you would've assumed I meant? The stack overflow exception could come from any code, not just ceval.c, and that code likely won't be expecting it and won't clean up (release locks, etc.). In that circumstance, letting the OS terminate the process is the only safe thing (this will get worse when freethreading is using locks everywhere, though right now provided nobody catches the Python exception it's likely that the stack will just unwind straight up the current thread). e.g. the user might call into OpenSSL after |
Maybe /* Issue #33720: Disable inlining for reducing the C stack consumption
on PGO builds. */
Py_NO_INLINE static double
r_float_str(RFILE *p) |
Is there a plan to backport the fix to 3.12 and 3.13? Or do we need to keep the resource overallocation in those versions? I'd prefer to backport. Or at least trim the hardcoded limit to avoid crashing. |
The fix uncovered unrelated issues. I don't think backporting it would be appropriate; definitely not for all platforms. It needs the full alpha/beta period. |
Bug report
Bug description:
During a PGO build on Windows on main (471aa75), a test in
test_functools
fails with a stack overflow.Full log of build
CPython versions tested on:
CPython main branch
Operating systems tested on:
Windows
Linked PRs
The text was updated successfully, but these errors were encountered: