gh-117174: Add a new route in linecache to fetch interactive source code #117500

pablogsal · 2024-04-03T10:51:32Z

Using leads to confusion when linecache it's populated with
entries as the inspect module heavily relies on it. An example:

./python -c "import inspect;x=eval('lambda x: x');print(inspect.getsource(x))"

The code above should trigger an exception because we should not be able
to get the source of x (it's dynamically generated) but if we use
as name for interactive code it will return gives the full string.

Issue: linecache.cache sometimes has an entry for <string> under Python 3.13.0a5 #117174

gaogaotiantian · 2024-04-03T17:05:45Z

This is a breaking change right? Maybe we should do the same thing as the interactive input - to secretly convert the file name back to <string> in the traceback. We should also be more careful about the name because that would be used in the future and we need to keep it for a long time. Unlike the REPL commands, this is something the users are more likely to see. This also introduces some backward-incompatibility in pdb. Basically the file name of the code object changed and that's public.

pablogsal · 2024-04-03T17:16:16Z

This is a breaking change right?

Breaking how? This doesn't affect any visible thing that's guaranteed unless I am missing something.

We should also be more careful about the name because that would be used in the future and we need to keep it for a long time.

Not sure I follow, there is no stable API that guarantees this name anywhere in a way that you can use it. This is just the name of the code object that's get created when you use Python with -c. That's absolutely not a public API and it's not stable.

Basically the file name of the code object changed and that's public.

Again, not sure how you are arriving to. this conclusion. There is absolutely no guarantee on how the interactive code objects are named

pablogsal · 2024-04-03T17:21:00Z

I have added a translation layer similarly as we do for the REPL so it looks a bit better in tracebacks

gaogaotiantian · 2024-04-03T18:11:42Z

Breaking how? This doesn't affect any visible thing that's guaranteed unless I am missing something.

First of all, the tests are failing.

I don't think "backward-compatibility" is only about the things we "guarantee", it's also about the behavior that has been there for a long time, even without documentation.

This change breaks the CPython test suite and I bet it will break many other library tests.

Making sure the traceback behaves as before is a very helpful step, but there might be code in the market that relies on the name of the code object. pdb will show a different stack I believe.

I'm not saying that this is the wrong path, simply suggesting that this might break other people's code because it changes a behavior that has been there for a long time. Yes I'm aware that this basically applies to any code change we do for CPython, but maybe we could give it some extra thought and compensate as we can (and like to). The translation in traceback for example, is very useful in my opinion.

pablogsal · 2024-04-03T18:31:44Z

I don't think "backward-compatibility" is only about the things we "guarantee", it's also about the behavior that has been there for a long time, even without documentation.

I understand, but we cannot just lift every observable behaviour to something we must preserve. Just take into account when we balance our options.

In general I agree with you, and you have an excellent point that if the test suite is breaking in more than one place, that's a reason for concern.

But in any case our approach here is quite limited: we can basically rename the string and fix whatever we can (but there will be places like pdb that we cannot "fix") but we won't be able to change the code object, so thing may break in other ways. Another way it's to close the issue as won't fix or to deactivate the feature for -c (which is in any case not trivial as we don't distinguish down the line between -c and REPL).

pablogsal · 2024-04-03T18:33:51Z

@lysnikolaou thoughts?

gaogaotiantian · 2024-04-03T19:56:42Z

I agree that we need to balance the new changes and the backward compatibility issue, otherwise we won't be able to move forward.

First of all, I think the current solution, where we change the name of the code name string + a translator in traceback is acceptable. That's simple and straightfoward.

I do have an alternative proposal which could solve this issue in a larger scope.

We can somehow expand the capability for linecache, so that when the file name starts with <, it requires a code object to confirm the corresponding text lines. This will not break any existing code using linecache, and only adds some extra logic for filename starting with < - it will basically do another layer of mapping for each code object, which is hashable and can be the key of the dict.

When we register a code object with linecache._register_code, it iterates through all the children code objects (using co_consts) and register them each. When we need to extract the source code of a dynamically generated code object, we need to explicity pass the code object itself - which is easily accessible in traceback.

So, instead of hacking the existing "filename" based linecache, we create a competely new path for dynamically generated code, which will introduce 0 backward compatibility issue.

And, we unlocked the potential to give source to all dynamically generated code, not only the ones in REPL and -c.

This will add some overhead when we register the code with linecache, but the majority of dynamically generated code is small and I don't think it'll make a big difference to the users.

lysnikolaou · 2024-04-04T15:49:03Z

This is a tricky one. In principle, I agree that not every behavior can be promoted to something we must preserve. This appears to be one of those cases though, where this is something we should not be preserving, but enough test suites will fail that we'll get complaints about it. And, since this is the case, it might make sense for us to put in the effort to preserve the filename.

I don't understand enough about the linecache module to be able to immediately see if @gaogaotiantian's approach will work, and what the trade-offs will be, but I'd be happy to review a PR where that approach is implemented and I can dive in a bit deeper to figure it out.

gaogaotiantian · 2024-04-04T16:02:33Z

In gist, we ask linecache for the source code when we are printing traceback, and the argument is the filename and the line number. Because we have fully control over the traceback and linecache, we can make traceback to pass filename, line number and the code object if the filename starts with <. In linecache, when the filename starts with <, a code object is required otherwise it will return '', which is the existing behavior for all files starting with <.

In this way, we are fully backward-compatible, all the existing user code should work because the behavior does not change at all. We get source code in the traceback for anything we'd like (REPL, -c and even more in the future), because we can explicitly pass the code object. In the future, we can make the registration public and the user can register the source code by themselves.

If @pablogsal is convinced that this approach has a chance, I can do the draft implementation.

pablogsal · 2024-04-04T16:55:50Z

The problem is that it's not just the traceback: it's the code object. Most test here fail because either pdb or tracemalloc are getting the wrong name because of the different name and those use different traceback machinery or directly check the code object. You approach will still break people that don't use the standard one or that inspect the code objects and expect a different thing, so it's not fully backwards compatible.

pablogsal · 2024-04-04T16:56:51Z

In linecache, when the filename starts with <, a code object is required otherwise it will return ''

Also, just for me to understand correctly, isn't this a breaking change? If someone is registering by hand something starting with < now suddenly we will do something different, no? What am I missing?

pablogsal · 2024-04-04T16:57:47Z

This will add some overhead when we register the code with linecache, but the majority of dynamically generated code is small and I don't think it'll make a big difference to the users.

I agree that most of the cases are like this but we don't know if this is always true to the point that it doesn't matter. This is python we are talking about and users generate a lot of edge cases

gaogaotiantian · 2024-04-04T17:15:17Z

The problem is that it's not just the traceback: it's the code object. Most test here fail because either pdb or tracemalloc are getting the wrong name because of the different name and those use different traceback machinery or directly check the code object. You approach will still break people that don't use the standard one or that inspect the code objects and expect a different thing, so it's not fully backwards compatible.

With my proposal, we do not change the name of the code object - which is exactly why the method is more backward compatible compared to the current one. The filename <string> is preserved for -c (it could even be preserved for REPL inputs if that matters). What I'm proposing is to use the actual code object as a key to search for source files, and that's unique.

Also, just for me to understand correctly, isn't this a breaking change? If someone is registering by hand something starting with < now suddenly we will do something different, no? What am I missing?

In my mind, a breaking change is something that breaks the existing code. There's no public way for the users to "register" anything now. If the user opt-in the new feature, then the behavior will be different - that's not breaking change, that's just new. If the user programs as before, everything will be the same.

I agree that most of the cases are like this but we don't know if this is always true to the point that it doesn't matter. This is python we are talking about and users generate a lot of edge cases

Yes, but the time to search through the code objects will always be significantly shorter than compiling them. If they chose to compile a huge file and create a lot of the code objects dynamically, the time cost is already significant. The worst case complexity for searching code objects is not worse than compiling them.

pablogsal · 2024-04-05T09:52:43Z

Ok, I think I misunderstood your proposal because I clearly had the wrong idea of how it would work.

But then, what's the difference between this and having an "alternate" storage for line cache that it's only internal and only the interpreter can use and it's only retrieved from traceback machinery? Let's say the traceback machinery first checks in the normal line cache and if that fails it checks in the "secret storage" and with -c and interactive mode we put it there. In this way we don't change code objects, this doesn't interfere with regular users and we get what we need.

gaogaotiantian · 2024-04-05T17:38:39Z

Yes this is basically having an alternate storage in linecache. We don't need to change the code objects in any case, they are only used as keys.

If we have a reliable source for dynamically generated code, why keep it a secret? pdb definitely wants it, other debuggers might as well. This is a useful feature for the users. I even think we should make the source code registeration public, but I'm okay for only making it internally available for now.

What's the advantage to combine it with existing interface? Well, first of all, they are doing the exact same thing from high level - to get source code. They can share all the existing code like cache mechanism. More importantly, the funcion call can simply be modifed from

linecache.getline(filename, lineno)

to

linecache.getline(filename, lineno, code=code)

When filename does not start with <, we ignore the code argument. The change on the calling side is minimal, they don't need to do something like

line = linecache.getline(filename, lineno)
if not line:
    line = linecache.secret_getline(filename, lineno, code=code)

The existing user code won't be interfered as they will behave as before. The user can opt-in the feature to get dynamically generated code - I believe that was asked for before.

pablogsal · 2024-04-05T18:26:40Z

If we have a reliable source for dynamically generated code, why keep it a secret?

Because that's a new feature and we are considering a bugfix. (I'm on my phone now, I can answer in more detail later but this is basically why my intent is to keep this as simple as possible)

gaogaotiantian · 2024-04-05T18:51:37Z

Because that's a new feature and we are considering a bugfix. (I'm on my phone now, I can answer in more detail later but this is basically why my intent is to keep this as simple as possible)

Okay that's reasonable, but can we also make it extensible for the future? I work on pdb a lot and it would be nice to have the source code for dymanically generated objects.

So strictly for the bug fix:

We keep the file name <string> for -c to avoid hassles for test cases (and that's our target for the future as well)
Keep the current code register interface so we do not need to touch C code at all. Also for now make it private and internal only.
When we register code, put it in a different container than linecache.cache with the id of the code object as the key(different source code can build to the code objects that equal to each other).
Do one of the following:
- In traceback, try to get from that container if source is not available (or code starts with <?). this requires a new private interface for linecache
- Change linecache.getline so that linecache handles this internally - it checks the file name, then look for it in the alternate container if applicable

I prefer the latter.

With this change, pdb can also use it. We may or may not promote this to public feature in the future, but with this pdb is already happy.

I don't think this solution is much more complicated than changing the code file name and translate it back in traceback, and I think this solution is helpful in long term.

pablogsal · 2024-04-10T17:40:38Z

3. When we register code, put it in a different container than linecache.cache with the id of the code object as the key(different source code can build to the code objects that equal to each other).

How do you retrieve this? The public APIs to the line cache module work on filename, not on code object. Specially if you want linecache.getline to do this internally, we cannot change the interface.

Also, the traceback module in most places where we call line cache we don't have the code object at hand and I don't think we want to push down the code object itself or some secret ID everywhere.

…tion in linecache

gaogaotiantian · 2025-02-13T22:51:38Z

Tests are still failing, is it related?

pablogsal · 2025-02-16T16:11:32Z

HummMm something happened in the rebase with the new code added since the last update . Will look into it

gaogaotiantian

I approve for the mechanism of the linecache part. I don't know if the id and _idfunc replacements are proper.

pablogsal · 2025-02-20T16:51:28Z

I don't know if the id and _idfunc replacements are proper.

Can you elaborate?

pablogsal · 2025-02-20T16:53:09Z

Ah you mean on the tests? Then yes, because those tests were flaky and they were relying on id to be called at a specific point but our id() is called before

gaogaotiantian · 2025-02-20T16:55:59Z

Ah, okay, that's some interesting testing mechanism. But yeah, I don't know about the repl part and if anyone other than me should give it another review, but the overall logic to make this happen looks good to me. It's been a long journey :) Thanks for the effort!

miss-islington-app · 2025-03-10T21:54:21Z

Thanks @pablogsal for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

…urce code (pythonGH-117500) (cherry picked from commit a931a8b) Co-authored-by: Pablo Galindo Salgado <[email protected]>

bedevere-app · 2025-03-10T21:54:32Z

GH-131060 is a backport of this pull request to the 3.13 branch.

StanFromIreland · 2025-03-10T22:43:42Z

How did this even happen though? It was fine before?

pablogsal · 2025-03-10T22:47:19Z

I think the problem is that GH actions was happy when the PR was created but the merge broke it. merge queue would have watched this :S

…ource code (GH-117500) (#131060) gh-117174: Add a new route in linecache to fetch interactive source code (GH-117500) (cherry picked from commit a931a8b) Co-authored-by: Pablo Galindo Salgado <[email protected]>

markshannon · 2025-03-11T11:36:28Z

@pablogsal

I think this introduces a refleak. The leak is in pyrepl, not linecache.
If I revert the changes to Lib/_pyrepl/console.py the leak goes away ~~and, oddly, the tests still pass~~

Edit: the test didn't pass, I'd commented some of out them out when modifying Lib/_pyrepl/console.py

pablogsal · 2025-03-11T12:51:20Z

Hummm, I will take a look today

pablogsal · 2025-03-11T14:50:09Z

Fix for the leak here: #131095

…urce code (python#117500)

pablogsal requested a review from ericsnowcurrently as a code owner April 3, 2024 10:51

bedevere-app bot added the awaiting core review label Apr 3, 2024

pablogsal changed the title ~~bpo-117174: Rename <string> to <__interactive_string__> for interactive code objects~~ gh-117174: Rename <string> to <__interactive_string__> for interactive code objects Apr 3, 2024

bedevere-app bot mentioned this pull request Apr 3, 2024

linecache.cache sometimes has an entry for <string> under Python 3.13.0a5 #117174

Closed

pablogsal force-pushed the bpo-117174 branch 2 times, most recently from f9c7ec5 to 90de6a6 Compare April 3, 2024 10:53

pablogsal force-pushed the bpo-117174 branch from 90de6a6 to 80b9509 Compare April 3, 2024 17:19

pablogsal requested a review from iritkatriel as a code owner April 3, 2024 17:19

pablogsal added a commit to pablogsal/cpython that referenced this pull request Apr 10, 2024

pythongh-117500: Store interactive source code in an alternative loca…

63e1632

…tion in linecache

pablogsal force-pushed the bpo-117174 branch from 80b9509 to 63e1632 Compare April 10, 2024 17:52

pablogsal changed the title ~~gh-117174: Rename <string> to <__interactive_string__> for interactive code objects~~ gh-117174: Store interactive source code in an alternative location in linecache Apr 10, 2024

pablogsal added a commit to pablogsal/cpython that referenced this pull request Apr 10, 2024

pythongh-117500: Store interactive source code in an alternative loca…

4fc4202

…tion in linecache

fixup! Fix repl

d4090e9

pablogsal force-pushed the bpo-117174 branch from bde58a6 to d4090e9 Compare February 13, 2025 01:57

fixup! fixup! Fix repl

89191d7

pablogsal requested review from lysnikolaou and ambv as code owners February 16, 2025 23:37

fixup! fixup! fixup! Fix repl

426215f

gaogaotiantian approved these changes Feb 20, 2025

View reviewed changes

bedevere-app bot added awaiting merge and removed awaiting core review labels Feb 20, 2025

pablogsal merged commit a931a8b into python:main Mar 10, 2025
53 checks passed

pablogsal deleted the bpo-117174 branch March 10, 2025 21:54

bedevere-app bot removed the awaiting merge label Mar 10, 2025

pablogsal added the needs backport to 3.13 bugs and security fixes label Mar 10, 2025

bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Mar 10, 2025

markshannon mentioned this pull request Mar 11, 2025

GH-127705: Check for immortality in refcount accounting #131072

Merged

seehwan pushed a commit to seehwan/cpython that referenced this pull request Apr 16, 2025

pythongh-117174: Add a new route in linecache to fetch interactive so…

46b2e69

…urce code (python#117500)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gh-117174: Add a new route in linecache to fetch interactive source code #117500

gh-117174: Add a new route in linecache to fetch interactive source code #117500

pablogsal commented Apr 3, 2024 •

edited by bedevere-app bot

Loading

gaogaotiantian commented Apr 3, 2024

pablogsal commented Apr 3, 2024 •

edited

Loading

pablogsal commented Apr 3, 2024

gaogaotiantian commented Apr 3, 2024

pablogsal commented Apr 3, 2024 •

edited

Loading

pablogsal commented Apr 3, 2024

gaogaotiantian commented Apr 3, 2024 •

edited

Loading

lysnikolaou commented Apr 4, 2024

gaogaotiantian commented Apr 4, 2024 •

edited

Loading

pablogsal commented Apr 4, 2024

pablogsal commented Apr 4, 2024

pablogsal commented Apr 4, 2024

gaogaotiantian commented Apr 4, 2024

pablogsal commented Apr 5, 2024

gaogaotiantian commented Apr 5, 2024

pablogsal commented Apr 5, 2024 •

edited

Loading

gaogaotiantian commented Apr 5, 2024

pablogsal commented Apr 10, 2024 •

edited

Loading

gaogaotiantian commented Feb 13, 2025

pablogsal commented Feb 16, 2025

gaogaotiantian left a comment

pablogsal commented Feb 20, 2025

pablogsal commented Feb 20, 2025

gaogaotiantian commented Feb 20, 2025

miss-islington-app bot commented Mar 10, 2025

bedevere-app bot commented Mar 10, 2025

StanFromIreland commented Mar 10, 2025 •

edited

Loading

pablogsal commented Mar 10, 2025

markshannon commented Mar 11, 2025 •

edited

Loading

pablogsal commented Mar 11, 2025

pablogsal commented Mar 11, 2025

gh-117174: Add a new route in linecache to fetch interactive source code #117500

gh-117174: Add a new route in linecache to fetch interactive source code #117500

Conversation

pablogsal commented Apr 3, 2024 • edited by bedevere-app bot Loading

gaogaotiantian commented Apr 3, 2024

pablogsal commented Apr 3, 2024 • edited Loading

pablogsal commented Apr 3, 2024

gaogaotiantian commented Apr 3, 2024

pablogsal commented Apr 3, 2024 • edited Loading

pablogsal commented Apr 3, 2024

gaogaotiantian commented Apr 3, 2024 • edited Loading

lysnikolaou commented Apr 4, 2024

gaogaotiantian commented Apr 4, 2024 • edited Loading

pablogsal commented Apr 4, 2024

pablogsal commented Apr 4, 2024

pablogsal commented Apr 4, 2024

gaogaotiantian commented Apr 4, 2024

pablogsal commented Apr 5, 2024

gaogaotiantian commented Apr 5, 2024

pablogsal commented Apr 5, 2024 • edited Loading

gaogaotiantian commented Apr 5, 2024

pablogsal commented Apr 10, 2024 • edited Loading

gaogaotiantian commented Feb 13, 2025

pablogsal commented Feb 16, 2025

gaogaotiantian left a comment

Choose a reason for hiding this comment

pablogsal commented Feb 20, 2025

pablogsal commented Feb 20, 2025

gaogaotiantian commented Feb 20, 2025

miss-islington-app bot commented Mar 10, 2025

bedevere-app bot commented Mar 10, 2025

StanFromIreland commented Mar 10, 2025 • edited Loading

pablogsal commented Mar 10, 2025

markshannon commented Mar 11, 2025 • edited Loading

pablogsal commented Mar 11, 2025

pablogsal commented Mar 11, 2025

pablogsal commented Apr 3, 2024 •

edited by bedevere-app bot

Loading

pablogsal commented Apr 3, 2024 •

edited

Loading

pablogsal commented Apr 3, 2024 •

edited

Loading

gaogaotiantian commented Apr 3, 2024 •

edited

Loading

gaogaotiantian commented Apr 4, 2024 •

edited

Loading

pablogsal commented Apr 5, 2024 •

edited

Loading

pablogsal commented Apr 10, 2024 •

edited

Loading

StanFromIreland commented Mar 10, 2025 •

edited

Loading

markshannon commented Mar 11, 2025 •

edited

Loading