PEP 675: Arbitrary Literal Strings #2167

pradeep90 · 2021-11-30T21:04:11Z

Move in draft from the Google Doc.
Turn unicode to ASCII.

cc @gbleaney @JelleZijlstra

gvanrossum · 2021-11-30T21:15:22Z

@JelleZijlstra When you land this, make sure the commit title is more descriptive than "Initial commit." :-)

JelleZijlstra · 2021-11-30T21:24:30Z

I'll request some changes and then merge it as "subsequent commit".

Thanks for sending the PR! I'll review it later today and hopefully merge it.

pradeep90 · 2021-11-30T21:25:26Z

> @JelleZijlstra When you land this, make sure the commit title is more descriptive than "Initial commit." :-)

Hehe, changed the commit title.

pep-0675.rst

JelleZijlstra · 2021-11-30T22:17:53Z


+doesn't change inference for other ``str`` methods such as
+``literal_string.upper()``. If this PEP is accepted, we could also
+overload the typeshed stubs for ``str`` to preserve literal-ness in a
+broader set of scenarios where it makes sense. For example,


Could there be more of a specification of "where it makes sense"? How should typeshed decide when to use Literal[str]?

Good question. We faced a bit of a dilemma here. We wanted to limit the specification, which is why we enumerated the four most frequently used ways of composing literal strings. Listing all of the str methods seemed like overkill, and we didn't want the PEP to be rejected because of too many changes to typeshed.

But users might ask for more convenient changes in the future - such as my_literal_str.upper(). So, we wanted to leave this open.

I've replaced "where it makes sense" with "where all inputs are literals" and clarified that this is merely for convenience and is not required by the PEP.

What do you suggest?

I don't have a concrete suggestion, but as a typeshed maintainer I'd have to come up with some rules, so I'd appreciate it if the PEP offered more guidance on when exactly a function should use Literal[str] annotations.

Here's my attempt to make it more concrete. I've sorted all the str functions from typeshed into categories:

Overload Literal[str] output if Literal[str] inputs

    def __new__(cls, object: Literal[str] = ...) -> Literal[str]: ...
    def capitalize(self: Literal[str]) -> Literal[str]: ...
    def casefold(self: Literal[str]) -> Literal[str]: ...
    def center(self: Literal[str], __width: SupportsIndex, __fillchar: Literal[str] = ...) -> Literal[str]: ...
    if sys.version_info >= (3, 8):
        def expandtabs(self: Literal[str], tabsize: SupportsIndex = ...) -> Literal[str]: ...
    else:
        def expandtabs(self: Literal[str], tabsize: int = ...) -> Literal[str]: ...
    def format(self: Literal[str], *args: Literal[str], **kwargs: Literal[str]) -> Literal[str]: ...
    def format_map(self: Literal[str], map: Mapping[str, Literal[str]]) -> Literal[str]: ...
    def join(self: Literal[str], __iterable: Iterable[Literal[str]]) -> Literal[str]: ...
    def ljust(self: Literal[str], __width: SupportsIndex, __fillchar: Literal[str] = ...) -> Literal[str]: ...
    def lower(self: Literal[str]) -> Literal[str]: ...
    def lstrip(self: Literal[str], __chars: Literal[str] | None = ...) -> Literal[str]: ...
    def partition(self: Literal[str], __sep: Literal[str]) -> tuple[Literal[str], Literal[str], Literal[str]]: ...  # Biasing to overly restrictive: '__sep'
    def replace(self: Literal[str], __old: Literal[str], __new: Literal[str], __count: SupportsIndex = ...) -> Literal[str]: ...  # Biasing to overly restrictive: '__old'
    if sys.version_info >= (3, 9):
        def removeprefix(self: Literal[str], __prefix: Literal[str]) -> Literal[str]: ...  # Biasing to overly restrictive: '__prefix'
        def removesuffix(self: Literal[str], __suffix: Literal[str]) -> Literal[str]: ...  # Biasing to overly restrictive: '__suffix'
    def rjust(self: Literal[str], __width: SupportsIndex, __fillchar: Literal[str] = ...) -> Literal[str]: ...
    def rpartition(self: Literal[str], __sep: Literal[str]) -> tuple[Literal[str], Literal[str], Literal[str]]: ...  # Biasing to overly restrictive: '__sep'
    def rsplit(self: Literal[str], sep: Literal[str] | None = ..., maxsplit: SupportsIndex = ...) -> list[Literal[str]]: ...  # Biasing to overly restrictive: 'sep'
    def rstrip(self: Literal[str], __chars: Literal[str] | None = ...) -> Literal[str]: ...  # Biasing to overly restrictive: '__chars'
    def split(self: Literal[str], sep: Literal[str] | None = ..., maxsplit: SupportsIndex = ...) -> list[Literal[str]]: ...  #
    def splitlines(self: Literal[str], keepends: bool = ...) -> list[Literal[str]]: ...
    def strip(self: Literal[str], __chars: Literal[str] | None = ...) -> Literal[str]: ...  #
    def swapcase(self: Literal[str]) -> Literal[str]: ...
    def title(self: Literal[str]) -> Literal[str]: ...
    def upper(self: Literal[str]) -> Literal[str]: ...
    def zfill(self: Literal[str], __width: SupportsIndex) -> Literal[str]: ...
    def __add__(self: Literal[str], __s: Literal[str]) -> Literal[str]: ...
    def __iter__(self: Literal[str]) -> Iterator[str]: ...
    def __mod__(self: Literal[str], __x: Union[Literal[str], Tuple[Literal[str], ...]]) -> str: ...
    def __mul__(self: Literal[str], __n: SupportsIndex) -> Literal[str]: ...
    def __repr__(self: Literal[str]) -> Literal[str]: ...
    def __rmul__(self: Literal[str], n: SupportsIndex) -> Literal[str]: ...
    def __str__(self: Literal[str]) -> Literal[str]: ...
    def __getnewargs__(self: Literal[str]) -> tuple[Literal[str]]: ...

Explicitly ruling out override

    # Can only be made safe if `Literal[int]` were supported for table
    def translate(self, __table: Mapping[int, int | str | None] | Sequence[int | str | None]) -> str: ...
    # Allows selecting arbitrary strs (constrained by the contents of self) if given arbitrary index / slice
    def __getitem__(self: Literal[str], __i: SupportsIndex | slice) -> str: ...

Override doesn't make sense (i.e. doesn't return str). Note that some of these could make sense for Literal[int], Literal[bytes], etc. if we choose to support those.

    def count(self, x: str, __start: SupportsIndex | None = ..., __end: SupportsIndex | None = ...) -> int: ...
    def encode(self, encoding: str = ..., errors: str = ...) -> bytes: ...
    def endswith(
        self, __suffix: str | Tuple[str, ...], __start: SupportsIndex | None = ..., __end: SupportsIndex | None = ...
    ) -> bool: ...
    def find(self, __sub: str, __start: SupportsIndex | None = ..., __end: SupportsIndex | None = ...) -> int: ...
    def index(self, __sub: str, __start: SupportsIndex | None = ..., __end: SupportsIndex | None = ...) -> int: ...
    def isalnum(self) -> bool: ...
    def isalpha(self) -> bool: ...
    if sys.version_info >= (3, 7):
        def isascii(self) -> bool: ...
    def isdecimal(self) -> bool: ...
    def isdigit(self) -> bool: ...
    def isidentifier(self) -> bool: ...
    def islower(self) -> bool: ...
    def isnumeric(self) -> bool: ...
    def isprintable(self) -> bool: ...
    def isspace(self) -> bool: ...
    def istitle(self) -> bool: ...
    def isupper(self) -> bool: ...
    def rfind(self, __sub: str, __start: SupportsIndex | None = ..., __end: SupportsIndex | None = ...) -> int: ...
    def rindex(self, __sub: str, __start: SupportsIndex | None = ..., __end: SupportsIndex | None = ...) -> int: ...
    def startswith(
        self, __prefix: str | Tuple[str, ...], __start: SupportsIndex | None = ..., __end: SupportsIndex | None = ...
    ) -> bool: ...
    @staticmethod
    @overload
    def maketrans(__x: dict[int, _T] | dict[str, _T] | dict[str | int, _T]) -> dict[int, _T]: ...
    @staticmethod
    @overload
    def maketrans(__x: str, __y: str, __z: str | None = ...) -> dict[int, int | None]: ...
    def __contains__(self, __o: str) -> bool: ...  # type: ignore[override]
    def __eq__(self, __x: object) -> bool: ...
    def __ge__(self, __x: str) -> bool: ...
    def __gt__(self, __x: str) -> bool: ...
    def __hash__(self) -> int: ...
    def __le__(self, __x: str) -> bool: ...
    def __len__(self) -> int: ...
    def __lt__(self, __x: str) -> bool: ...
    def __ne__(self, __x: object) -> bool: ...

@pradeep90 perhaps we incorporate the suggested overrides from the first section as an appendix?
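As a rough illustration of the first category above (return a literal type when every input is a literal type), here is a minimal sketch of the overload pattern applied to a free function rather than to `str` itself. It assumes the spelling `typing.LiteralString` (available from Python 3.11) in place of the draft's `Literal[str]`, and `shout` is a hypothetical helper that appears nowhere in the PEP or in typeshed.

```python
from typing import overload

try:
    from typing import LiteralString  # Python 3.11+
except ImportError:
    # Assumption for older interpreters: fall back to plain str at runtime.
    LiteralString = str  # type: ignore[misc,assignment]


@overload
def shout(s: LiteralString) -> LiteralString: ...
@overload
def shout(s: str) -> str: ...
def shout(s: str) -> str:
    """Hypothetical helper: upper-case the input and append '!'."""
    return s.upper() + "!"


# A checker that implements the proposal would be expected to pick the
# first overload here, because the argument is a literal.
greeting = shout("hello")

# For an arbitrary str only the second overload applies.
user_text = str(len("abc"))  # stands in for a non-literal value
echoed = shout(user_text)
```

The overloads change nothing at runtime; they only tell a type checker which return type to infer for each kind of argument.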

> I don't have a concrete suggestion, but as a typeshed maintainer I'd have to come up with some rules, so I'd appreciate it if the PEP offered more guidance on when exactly a function should use Literal[str] annotations.

@gbleaney Thanks for classifying the str methods.

@JelleZijlstra For now, I've removed the paragraph about typeshed and other str methods. We need to discuss some tradeoffs for those scenarios. Basically, adding a Literal[str] overload to a method in str would affect any class that subclasses str: users could see spurious override errors (snippet), and a subclass could violate type safety by returning a value of non-literal type. I'll make that a separate PR since we might indeed decide not to change str in typeshed.

Does that unblock this initial PR?

Sounds good, let's just land the PEP first. I have some concerns about how this would interact with typeshed but that's probably better discussed elsewhere.
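The subclassing concern raised in this thread can be sketched with a small stand-in hierarchy. Everything here is hypothetical: `Formatter` plays the role of a base class that gains a literal-preserving overload, `TaggingFormatter` is a subclass written against the older `str -> str` signature, and `LiteralString` (Python 3.11+) is assumed as the spelling of the draft's `Literal[str]`. How a particular type checker reports the override may differ.

```python
from typing import overload

try:
    from typing import LiteralString  # Python 3.11+
except ImportError:
    # Assumption for older interpreters: fall back to plain str at runtime.
    LiteralString = str  # type: ignore[misc,assignment]


class Formatter:
    """Hypothetical base class whose method now preserves literal-ness."""

    @overload
    def wrap(self, text: LiteralString) -> LiteralString: ...
    @overload
    def wrap(self, text: str) -> str: ...
    def wrap(self, text: str) -> str:
        return "[" + text + "]"


class TaggingFormatter(Formatter):
    """Hypothetical subclass written before the overloads existed."""

    def __init__(self, tag: str) -> None:
        self.tag = tag  # arbitrary runtime data, not a literal

    # The plain signature no longer matches the first overload of the base
    # method, so a checker may flag this override.
    def wrap(self, text: str) -> str:
        return self.tag + ":" + text


def render(fmt: Formatter) -> str:
    # Statically, the base overload says a literal argument gives a literal
    # result, but a subclass instance can mix in non-literal data.
    return fmt.wrap("SELECT 1")


base_result = render(Formatter())
sub_result = render(TaggingFormatter("x"))
```

Both halves of the concern show up here: the subclass author sees an error for code that used to be fine, and a caller holding a `Formatter` can be handed a value that the annotations describe as literal although it contains runtime data.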

pep-0675.rst

JelleZijlstra · 2021-11-30T22:36:20Z


+
+::
+
+    def execute(self, sql: Literal[str], parameters: Iterable[str] = ...) -> Cursor: ...


Where would you make this change? Would typeshed's stubs for sqlite change?

Yes, we would have to update its typeshed stub. Linked to it.
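To make the effect of such a signature concrete, here is a minimal sketch that uses a wrapper around the standard-library `sqlite3` module rather than the stub itself. `run_query` and `find_user` are hypothetical helpers, the table and data are made up for the example, and `LiteralString` (Python 3.11+) is assumed as the spelling of the draft's `Literal[str]`.

```python
import sqlite3
from typing import Any, Iterable, List

try:
    from typing import LiteralString  # Python 3.11+
except ImportError:
    # Assumption for older interpreters: fall back to plain str at runtime.
    LiteralString = str  # type: ignore[misc,assignment]


def run_query(
    conn: sqlite3.Connection, sql: LiteralString, parameters: Iterable[Any] = ()
) -> List[Any]:
    """Hypothetical wrapper: only literal SQL text is accepted statically."""
    return conn.execute(sql, tuple(parameters)).fetchall()


def find_user(conn: sqlite3.Connection, name: str) -> List[Any]:
    # `name` is an arbitrary str, so it travels as a bound parameter.
    return run_query(conn, "SELECT name FROM users WHERE name = ?", (name,))
    # A checker implementing the proposal would be expected to reject:
    #   run_query(conn, f"SELECT name FROM users WHERE name = '{name}'")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES (?)", ("alice",))

rows = find_user(conn, "alice")
injected = find_user(conn, "' OR 1=1 --")
```

The annotation has no runtime effect; the protection comes from a type checker refusing SQL text that was built from non-literal input.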


NotWearingPants · 2022-05-20T11:52:51Z

> perfectly distinguishing good and bad queries reduces to the halting problem

@pradeep90 I'd love to see a proof of this, can you provide the reference?

gbleaney · 2022-05-20T17:21:49Z

@NotWearingPants We don't have a formalized proof for you, only the intuitive comparison: just as a pathological program can decide whether or not to halt based on what the analyzing program predicts, a pathological program could decide whether or not to perform a malicious operation based on whether the analyzing program predicted it would. Perhaps we should have just invoked Rice's Theorem, or made a less strong statement. The key point is that, for some programs, it's impossible to know whether they are going to do something malicious.
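The intuition in the comment above can be sketched in a few lines. This is an illustration only, not a proof: `make_adversary`, `always_flags`, and `never_flags` are hypothetical names, and "malicious" is just a string label.

```python
from typing import Callable, List, Tuple

Program = Callable[[], str]           # returns "benign" or "malicious"
Analyzer = Callable[[Program], bool]  # True means "predicted malicious"


def make_adversary(analyzer: Analyzer) -> Program:
    """Build a program that does the opposite of what the analyzer predicts."""

    def adversary() -> str:
        predicted_malicious = analyzer(adversary)
        return "benign" if predicted_malicious else "malicious"

    return adversary


def always_flags(program: Program) -> bool:
    return True


def never_flags(program: Program) -> bool:
    return False


# Each pair holds (prediction, actual behaviour) for an analyzer run
# against its own adversary.
results: List[Tuple[bool, bool]] = []
for analyzer in (always_flags, never_flags):
    program = make_adversary(analyzer)
    prediction = analyzer(program)
    actually_malicious = program() == "malicious"
    results.append((prediction, actually_malicious))
```

Both toy analyzers are wrong about their own adversary. An analyzer that tried to decide by simply running the program would recurse without end on this construction, which is where the comparison with the halting problem comes from.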

pradeep90 requested a review from a team as a code owner November 30, 2021 21:04

the-knights-who-say-ni added the CLA signed label Nov 30, 2021

gvanrossum changed the title ~~PEP 675: Initial commit.~~ PEP 675: Arbitrary Literal Strings Nov 30, 2021

pradeep90 force-pushed the literal-strings branch from 6353961 to 8ca76b4 Compare November 30, 2021 21:24

JelleZijlstra self-assigned this Nov 30, 2021

JelleZijlstra requested changes Nov 30, 2021


pradeep90 force-pushed the literal-strings branch from 8ca76b4 to 4d32411 Compare November 30, 2021 23:23

pradeep90 requested a review from JelleZijlstra November 30, 2021 23:24

pradeep90 force-pushed the literal-strings branch from 4d32411 to 13025d5 Compare December 1, 2021 00:50

PEP 675: Arbitrary literal strings.

643ee18

+ Add the draft PEP.
+ Turn unicode quotes to ASCII.

pradeep90 force-pushed the literal-strings branch from 13025d5 to 643ee18 Compare December 1, 2021 06:53

JelleZijlstra approved these changes Dec 1, 2021


JelleZijlstra merged commit 21f6993 into python:master Dec 1, 2021

erlend-aasland mentioned this pull request May 13, 2022

pep8/greppable exception messages erlend-aasland/peps#1

Closed

erlend-aasland mentioned this pull request Jun 27, 2022

pep 687/mark as accepted erlend-aasland/peps#2

Closed

