PEP 675: Arbitrary Literal Strings #2167

Merged (1 commit) on Dec 1, 2021

pradeep90
Contributor

  • Move in draft from the Google Doc.
  • Turn unicode to ASCII.

cc @gbleaney @JelleZijlstra

@pradeep90 pradeep90 requested a review from a team as a code owner November 30, 2021 21:04
@gvanrossum gvanrossum changed the title PEP 675: Initial commit. PEP 675: Arbitrary Literal Strings Nov 30, 2021
@gvanrossum
Member

@JelleZijlstra When you land this, make sure the commit title is more descriptive than "Initial commit." :-)

@JelleZijlstra
Member

I'll request some changes and then merge it as "subsequent commit".

Thanks for sending the PR! I'll review it later today and hopefully merge it.

@pradeep90
Contributor Author

@JelleZijlstra When you land this, make sure the commit title is more descriptive than "Initial commit." :-)

Hehe, changed the commit title.

@JelleZijlstra JelleZijlstra self-assigned this Nov 30, 2021
pep-0675.rst Outdated
doesn't change inference for other ``str`` methods such as
``literal_string.upper()``. If this PEP is accepted, we could also
overload the typeshed stubs for ``str`` to preserve literal-ness in a
broader set of scenarios where it makes sense. For example,
Member

Could there be more of a specification of "where it makes sense"? How should typeshed decide when to use Literal[str]?

Contributor Author

Good question. We faced a bit of a dilemma here. We wanted to limit the specification, which is why we enumerated the four most frequently used means of composing literal strings. Listing all of the str methods seemed like overkill, and we didn't want the PEP to be rejected because of too many changes to typeshed.

But users might ask for more convenient changes in the future - such as my_literal_str.upper(). So, we wanted to leave this open.

I've replaced "where it makes sense" with "where all inputs are literals" and clarified that this is merely for convenience and is not required by the PEP.

What do you suggest?
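
To make "where all inputs are literals" concrete, here is a minimal sketch of an overload that keeps literal-ness only when the input is literal. `shout` and the alias `LiteralStr` are hypothetical names, not part of the PEP or typeshed, and `Literal[str]` is the draft spelling used in this thread (it evaluates at runtime, but a type checker may not accept it as written).

```python
from typing import Literal, overload

# Draft spelling from this thread, aliased once so it is easy to swap out.
LiteralStr = Literal[str]


@overload
def shout(text: LiteralStr) -> LiteralStr: ...
@overload
def shout(text: str) -> str: ...
def shout(text):
    """Hypothetical helper: upper-case the text and append '!'."""
    # Runtime behavior is the same for both overloads; only the
    # statically inferred return type would differ.
    return text.upper() + "!"


print(shout("select"))  # prints SELECT!
```

The second overload is the fallback: an arbitrary `str` argument still works, but the result would no longer be treated as a literal string.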

Member

I don't have a concrete suggestion, but as a typeshed maintainer I'd have to come up with some rules, so I'd appreciate if the PEP offered more guidance on when exactly a function should use Literal[str] annotations.

Contributor

@gbleaney gbleaney Dec 1, 2021

Here's my attempt to make it more concrete. I've sorted all the str functions from typeshed into categories:

Overload to return Literal[str] when the inputs are Literal[str]

def __new__(cls, object: Literal[str] = ...) -> Literal[str]: ...
def capitalize(self: Literal[str]) -> Literal[str]: ...
def casefold(self: Literal[str]) -> Literal[str]: ...
def center(self: Literal[str], __width: SupportsIndex, __fillchar: Literal[str] = ...) -> Literal[str]: ...
if sys.version_info >= (3, 8):
    def expandtabs(self: Literal[str], tabsize: SupportsIndex = ...) -> Literal[str]: ...
else:
    def expandtabs(self: Literal[str], tabsize: int = ...) -> Literal[str]: ...
def format(self: Literal[str], *args: Literal[str], **kwargs: Literal[str]) -> Literal[str]: ...
def format_map(self: Literal[str], map: Mapping[str, Literal[str]]) -> Literal[str]: ...
def join(self: Literal[str], __iterable: Iterable[Literal[str]]) -> Literal[str]: ...
def ljust(self: Literal[str], __width: SupportsIndex, __fillchar: Literal[str] = ...) -> Literal[str]: ...
def lower(self: Literal[str]) -> Literal[str]: ...
def lstrip(self: Literal[str], __chars: Literal[str] | None = ...) -> Literal[str]: ...
def partition(self: Literal[str], __sep: Literal[str]) -> tuple[Literal[str], Literal[str], Literal[str]]: ...  # Biasing to overly restrictive: '__sep'
def replace(self: Literal[str], __old: Literal[str], __new: Literal[str], __count: SupportsIndex = ...) -> Literal[str]: ...  # Biasing to overly restrictive: '__old'
if sys.version_info >= (3, 9):
    def removeprefix(self: Literal[str], __prefix: Literal[str]) -> Literal[str]: ...  # Biasing to overly restrictive: '__prefix'
    def removesuffix(self: Literal[str], __suffix: Literal[str]) -> Literal[str]: ...  # Biasing to overly restrictive: '__suffix'
def rjust(self: Literal[str], __width: SupportsIndex, __fillchar: Literal[str] = ...) -> Literal[str]: ...
def rpartition(self: Literal[str], __sep: Literal[str]) -> tuple[Literal[str], Literal[str], Literal[str]]: ...  # Biasing to overly restrictive: '__sep'
def rsplit(self: Literal[str], sep: Literal[str] | None = ..., maxsplit: SupportsIndex = ...) -> list[Literal[str]]: ...  # Biasing to overly restrictive: 'sep'
def rstrip(self: Literal[str], __chars: Literal[str] | None = ...) -> Literal[str]: ...  # Biasing to overly restrictive: '__chars'
def split(self: Literal[str], sep: Literal[str] | None = ..., maxsplit: SupportsIndex = ...) -> list[Literal[str]]: ...  #
def splitlines(self: Literal[str], keepends: bool = ...) -> list[Literal[str]]: ...
def strip(self: Literal[str], __chars: Literal[str] | None = ...) -> Literal[str]: ...  #
def swapcase(self: Literal[str]) -> Literal[str]: ...
def title(self: Literal[str]) -> Literal[str]: ...
def upper(self: Literal[str]) -> Literal[str]: ...
def zfill(self: Literal[str], __width: SupportsIndex) -> Literal[str]: ...
def __add__(self: Literal[str], __s: Literal[str]) -> Literal[str]: ...
def __iter__(self: Literal[str]) -> Iterator[str]: ...
def __mod__(self: Literal[str], __x: Union[Literal[str], Tuple[Literal[str], ...]]) -> str: ...
def __mul__(self: Literal[str], __n: SupportsIndex) -> Literal[str]: ...
def __repr__(self: Literal[str]) -> Literal[str]: ...
def __rmul__(self: Literal[str], n: SupportsIndex) -> Literal[str]: ...
def __str__(self: Literal[str]) -> Literal[str]: ...
def __getnewargs__(self: Literal[str]) -> tuple[Literal[str]]: ...

Explicitly ruling out override

# Can only be made safe if `Literal[int]` were supported for table
def translate(self, __table: Mapping[int, int | str | None] | Sequence[int | str | None]) -> str: ...
# Allows selecting arbitrary strs (constrained by the contents of self) if given an arbitrary index / slice
def __getitem__(self: Literal[str], __i: SupportsIndex | slice) -> str: ...

Override doesn't make sense (i.e., doesn't return str). Note that some of these could make sense for Literal[int], Literal[bytes], etc. if we choose to support those.

def count(self, x: str, __start: SupportsIndex | None = ..., __end: SupportsIndex | None = ...) -> int: ...
def encode(self, encoding: str = ..., errors: str = ...) -> bytes: ...
def endswith(
    self, __suffix: str | Tuple[str, ...], __start: SupportsIndex | None = ..., __end: SupportsIndex | None = ...
) -> bool: ...
def find(self, __sub: str, __start: SupportsIndex | None = ..., __end: SupportsIndex | None = ...) -> int: ...
def index(self, __sub: str, __start: SupportsIndex | None = ..., __end: SupportsIndex | None = ...) -> int: ...
def isalnum(self) -> bool: ...
def isalpha(self) -> bool: ...
if sys.version_info >= (3, 7):
    def isascii(self) -> bool: ...
def isdecimal(self) -> bool: ...
def isdigit(self) -> bool: ...
def isidentifier(self) -> bool: ...
def islower(self) -> bool: ...
def isnumeric(self) -> bool: ...
def isprintable(self) -> bool: ...
def isspace(self) -> bool: ...
def istitle(self) -> bool: ...
def isupper(self) -> bool: ...
def rfind(self, __sub: str, __start: SupportsIndex | None = ..., __end: SupportsIndex | None = ...) -> int: ...
def rindex(self, __sub: str, __start: SupportsIndex | None = ..., __end: SupportsIndex | None = ...) -> int: ...
def startswith(
    self, __prefix: str | Tuple[str, ...], __start: SupportsIndex | None = ..., __end: SupportsIndex | None = ...
) -> bool: ...
@staticmethod
@overload
def maketrans(__x: dict[int, _T] | dict[str, _T] | dict[str | int, _T]) -> dict[int, _T]: ...
@staticmethod
@overload
def maketrans(__x: str, __y: str, __z: str | None = ...) -> dict[int, int | None]: ...
def __contains__(self, __o: str) -> bool: ...  # type: ignore[override]
def __eq__(self, __x: object) -> bool: ...
def __ge__(self, __x: str) -> bool: ...
def __gt__(self, __x: str) -> bool: ...
def __hash__(self) -> int: ...
def __le__(self, __x: str) -> bool: ...
def __len__(self) -> int: ...
def __lt__(self, __x: str) -> bool: ...
def __ne__(self, __x: object) -> bool: ...

@pradeep90 perhaps we incorporate the suggested overrides from the first section as an appendix?
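
The reason for ruling out `__getitem__` can be sketched as follows. This is only an illustration of the concern stated in the comment, with hypothetical names (`ALPHABET`, `build_from_indices`): every character comes from a literal, yet the indices decide what string is assembled.

```python
# A literal that appears in the source code.
ALPHABET = "abcdefghijklmnopqrstuvwxyz '=;-"


def build_from_indices(indices):
    """Assemble a string by indexing into a literal (hypothetical helper)."""
    # Each ALPHABET[i] is drawn from a literal, but the caller controls
    # which characters are picked and in what order.
    return "".join(ALPHABET[i] for i in indices)


# Stand-in for attacker-controlled data arriving as integers.
user_text = "drop table users"
indices = [ALPHABET.index(ch) for ch in user_text]

rebuilt = build_from_indices(indices)
print(rebuilt == user_text)  # prints True
```

If indexing preserved literal-ness, `rebuilt` would be typed as a literal string even though its content is chosen entirely by the indices.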

Contributor Author

I don't have a concrete suggestion, but as a typeshed maintainer I'd have to come up with some rules, so I'd appreciate if the PEP offered more guidance on when exactly a function should use Literal[str] annotations.

@gbleaney Thanks for classifying the str methods.

@JelleZijlstra For now, I've removed the paragraph about typeshed and other str methods. We need to discuss some tradeoffs for those scenarios. Basically, adding a Literal[str] overload to a method in str would affect any class that subclasses str: users could see spurious override errors (snippet), and a subclass could violate type safety by returning a value of non-literal type. I'll make that a separate PR since we might indeed decide not to change str in typeshed.
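
A rough sketch of the subclass concern, using a hypothetical `Tagged` class (not from the PEP or typeshed). Whether a given type checker would actually report the override depends on how the overload is written, so treat this as an illustration only.

```python
class Tagged(str):
    """Hypothetical str subclass that carries a runtime-provided tag."""

    tag: str = ""

    def upper(self) -> str:
        # Declared to return plain str. If str.upper gained a Literal[str]
        # overload, this override might be reported as incompatible, and the
        # value returned here includes non-literal data either way.
        return str.upper(self) + self.tag


def make_tagged(value: str, tag: str) -> Tagged:
    """Build a Tagged string whose tag comes from runtime input."""
    tagged = Tagged(value)
    tagged.tag = tag
    return tagged


result = make_tagged("select", " -- from user input").upper()
print(result)  # prints SELECT -- from user input
```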

Does that unblock this initial PR?

Member

Sounds good, let's just land the PEP first. I have some concerns about how this would interact with typeshed but that's probably better discussed elsewhere.


::

def execute(self, sql: Literal[str], parameters: Iterable[str] = ...) -> Cursor: ...
Member

Where would you make this change? Would typeshed's stubs for sqlite change?

Contributor Author

Yes, we would have to update its typeshed stub. Linked to it.
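
For illustration, here is a minimal sketch of how such an annotation would be used, written as a standalone wrapper rather than a change to the real sqlite3 stub. The wrapper `execute` and the `users` table are assumptions for this example, and `Literal[str]` is the draft spelling from this thread.

```python
import sqlite3
from typing import Iterable, Literal


def execute(
    conn: sqlite3.Connection,
    sql: Literal[str],
    parameters: Iterable[str] = (),
) -> sqlite3.Cursor:
    """Hypothetical wrapper: the query text must be a literal string."""
    return conn.execute(sql, tuple(parameters))


conn = sqlite3.connect(":memory:")
execute(conn, "CREATE TABLE users (name TEXT)")
execute(conn, "INSERT INTO users (name) VALUES (?)", ["alice"])

# User data goes through parameters, so the query text stays a literal.
user_input = "alice' OR '1'='1"
rows = execute(conn, "SELECT name FROM users WHERE name = ?", [user_input]).fetchall()
print(rows)  # prints []

# The pattern the annotation is meant to flag: query text built from user data.
# execute(conn, f"SELECT name FROM users WHERE name = '{user_input}'")
```

The annotation has no effect at runtime; the intent is that a type checker rejects the commented-out call while accepting the parameterized one.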

+ Add the draft PEP.
+ Turn unicode quotes to ASCII.
@NotWearingPants

perfectly distinguishing good and bad queries reduces to the halting problem

@pradeep90 I'd love to see a proof of this, can you provide the reference?

@gbleaney
Contributor

@NotWearingPants we don't have a formalized proof for you, only the intuitive comparison: just as a pathological program can decide whether or not to halt based on what the analyzing program predicts, a pathological program could decide whether or not to perform a malicious operation based on what the analyzing program predicted it would do. Perhaps we should have just invoked Rice's Theorem, or made a weaker statement. The key point is that, for some programs, it's impossible to know whether they are going to do something malicious.
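
The intuition above can be sketched in a few lines. This is a toy illustration, not a proof: the analyzers here are trivial stand-ins that never inspect the program, and all names (`make_pathological`, `always_safe`, `always_malicious`) are hypothetical.

```python
from typing import Callable

Program = Callable[[], str]
Analyzer = Callable[[Program], bool]  # True means "predicted safe"


def make_pathological(analyzer: Analyzer) -> Program:
    """Build a program that does the opposite of the analyzer's prediction."""

    def program() -> str:
        if analyzer(program):
            return "malicious"  # predicted safe, so misbehave
        return "benign"  # predicted malicious, so behave

    return program


def always_safe(_program: Program) -> bool:
    return True


def always_malicious(_program: Program) -> bool:
    return False


for analyzer in (always_safe, always_malicious):
    program = make_pathological(analyzer)
    predicted_safe = analyzer(program)
    actually_safe = program() == "benign"
    print(analyzer.__name__, predicted_safe, actually_safe)
```

In both runs the prediction and the actual behavior disagree, which mirrors the diagonal argument the comment alludes to.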
