-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
gh-117174: Add a new route in linecache to fetch interactive source code #117500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f9c7ec5
to
90de6a6
Compare
This is a breaking change right? Maybe we should do the same thing as the interactive input - to secretly convert the file name back to |
Breaking how? This doesn't affect any visible thing that's guaranteed unless I am missing something.
Not sure I follow, there is no stable API that guarantees this name anywhere in a way that you can use it. This is just the name of the code object that's get created when you use Python with
Again, not sure how you are arriving to. this conclusion. There is absolutely no guarantee on how the interactive code objects are named |
I have added a translation layer similarly as we do for the REPL so it looks a bit better in tracebacks |
First of all, the tests are failing. I don't think "backward-compatibility" is only about the things we "guarantee", it's also about the behavior that has been there for a long time, even without documentation. This change breaks the CPython test suite and I bet it will break many other library tests. Making sure the I'm not saying that this is the wrong path, simply suggesting that this might break other people's code because it changes a behavior that has been there for a long time. Yes I'm aware that this basically applies to any code change we do for CPython, but maybe we could give it some extra thought and compensate as we can (and like to). The translation in |
I understand, but we cannot just lift every observable behaviour to something we must preserve. Just take into account when we balance our options. In general I agree with you, and you have an excellent point that if the test suite is breaking in more than one place, that's a reason for concern. But in any case our approach here is quite limited: we can basically rename the string and fix whatever we can (but there will be places like |
@lysnikolaou thoughts? |
I agree that we need to balance the new changes and the backward compatibility issue, otherwise we won't be able to move forward. First of all, I think the current solution, where we change the name of the code name string + a translator in traceback is acceptable. That's simple and straightfoward. I do have an alternative proposal which could solve this issue in a larger scope. We can somehow expand the capability for When we register a code object with So, instead of hacking the existing "filename" based And, we unlocked the potential to give source to all dynamically generated code, not only the ones in REPL and This will add some overhead when we register the code with |
This is a tricky one. In principle, I agree that not every behavior can be promoted to something we must preserve. This appears to be one of those cases though, where this is something we should not be preserving, but enough test suites will fail that we'll get complaints about it. And, since this is the case, it might make sense for us to put in the effort to preserve the filename. I don't understand enough about the |
In gist, we ask In this way, we are fully backward-compatible, all the existing user code should work because the behavior does not change at all. We get source code in the traceback for anything we'd like (REPL, If @pablogsal is convinced that this approach has a chance, I can do the draft implementation. |
The problem is that it's not just the traceback: it's the code object. Most test here fail because either pdb or tracemalloc are getting the wrong name because of the different name and those use different traceback machinery or directly check the code object. You approach will still break people that don't use the standard one or that inspect the code objects and expect a different thing, so it's not fully backwards compatible. |
Also, just for me to understand correctly, isn't this a breaking change? If someone is registering by hand something starting with |
I agree that most of the cases are like this but we don't know if this is always true to the point that it doesn't matter. This is python we are talking about and users generate a lot of edge cases |
With my proposal, we do not change the name of the code object - which is exactly why the method is more backward compatible compared to the current one. The filename
In my mind, a breaking change is something that breaks the existing code. There's no public way for the users to "register" anything now. If the user opt-in the new feature, then the behavior will be different - that's not breaking change, that's just new. If the user programs as before, everything will be the same.
Yes, but the time to search through the code objects will always be significantly shorter than compiling them. If they chose to compile a huge file and create a lot of the code objects dynamically, the time cost is already significant. The worst case complexity for searching code objects is not worse than compiling them. |
Ok, I think I misunderstood your proposal because I clearly had the wrong idea of how it would work. But then, what's the difference between this and having an "alternate" storage for line cache that it's only internal and only the interpreter can use and it's only retrieved from traceback machinery? Let's say the traceback machinery first checks in the normal line cache and if that fails it checks in the "secret storage" and with |
Yes this is basically having an alternate storage in If we have a reliable source for dynamically generated code, why keep it a secret? What's the advantage to combine it with existing interface? Well, first of all, they are doing the exact same thing from high level - to get source code. They can share all the existing code like cache mechanism. More importantly, the funcion call can simply be modifed from linecache.getline(filename, lineno) to linecache.getline(filename, lineno, code=code) When filename does not start with line = linecache.getline(filename, lineno)
if not line:
line = linecache.secret_getline(filename, lineno, code=code) The existing user code won't be interfered as they will behave as before. The user can opt-in the feature to get dynamically generated code - I believe that was asked for before. |
Because that's a new feature and we are considering a bugfix. (I'm on my phone now, I can answer in more detail later but this is basically why my intent is to keep this as simple as possible) |
Okay that's reasonable, but can we also make it extensible for the future? I work on So strictly for the bug fix:
I prefer the latter. With this change, I don't think this solution is much more complicated than changing the code file name and translate it back in |
How do you retrieve this? The public APIs to the line cache module work on filename, not on code object. Specially if you want Also, the traceback module in most places where we call |
…tion in linecache
…tion in linecache
bde58a6
to
d4090e9
Compare
Tests are still failing, is it related? |
HummMm something happened in the rebase with the new code added since the last update . Will look into it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I approve for the mechanism of the linecache part. I don't know if the id
and _idfunc
replacements are proper.
Can you elaborate? |
Ah you mean on the tests? Then yes, because those tests were flaky and they were relying on |
Ah, okay, that's some interesting testing mechanism. But yeah, I don't know about the repl part and if anyone other than me should give it another review, but the overall logic to make this happen looks good to me. It's been a long journey :) Thanks for the effort! |
Thanks @pablogsal for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13. |
…urce code (pythonGH-117500) (cherry picked from commit a931a8b) Co-authored-by: Pablo Galindo Salgado <[email protected]>
GH-131060 is a backport of this pull request to the 3.13 branch. |
How did this even happen though? It was fine before? |
I think the problem is that GH actions was happy when the PR was created but the merge broke it. merge queue would have watched this :S |
I think this introduces a refleak. The leak is in pyrepl, not linecache. Edit: the test didn't pass, I'd commented some of out them out when modifying Lib/_pyrepl/console.py |
Hummm, I will take a look today |
Fix for the leak here: #131095 |
Using leads to confusion when linecache it's populated with
entries as the inspect module heavily relies on it. An example:
./python -c "import inspect;x=eval('lambda x: x');print(inspect.getsource(x))"
The code above should trigger an exception because we should not be able
to get the source of x (it's dynamically generated) but if we use
as name for interactive code it will return gives the full string.
linecache.cache
sometimes has an entry for<string>
under Python 3.13.0a5 #117174