Skip to content

gh-131421: fix ASDL grammar for Dict to have an expr?* keys field #131419

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 4, 2025
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion Parser/Python.asdl
Original file line number Diff line number Diff line change
Expand Up @@ -63,7 +63,7 @@ module Python
| UnaryOp(unaryop op, expr operand)
| Lambda(arguments args, expr body)
| IfExp(expr test, expr body, expr orelse)
| Dict(expr* keys, expr* values)
| Dict(expr?* keys, expr* values)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of change makes me wonder whether we should also add '+' to indicate 1 or more items. For instance nonlocal requires at least one name, same for import, global and others.

Similarly, there is no explanation of what * and ? actually mean. While they can be inferred, I would expect a caption for them. WDYT @JelleZijlstra?

| Set(expr* elts)
| ListComp(expr elt, comprehension* generators)
| SetComp(expr elt, comprehension* generators)
Expand Down
65 changes: 38 additions & 27 deletions Parser/asdl.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@
# type ::= product | sum
# product ::= fields ["attributes" fields]
# fields ::= "(" { field, "," } field ")"
# field ::= TypeId ["?" | "*"] [Id]
# field ::= TypeId { "?" | "*" } [Id]
# sum ::= constructor { "|" constructor } ["attributes" fields]
# constructor ::= ConstructorId [fields]
#
Expand Down Expand Up @@ -64,34 +64,45 @@ def __init__(self, name, fields=None):
def __repr__(self):
return 'Constructor({0.name}, {0.fields})'.format(self)

class Quantifier:
class _Optional: pass
class _Sequence: pass
OPTIONAL = _Optional()
SEQUENCE = _Sequence()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this just be an enum?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I just prefer these to have unique/singleton values (as opposed to ints or whatnot).

I can change it if its outside the expected coding style.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you wanted singletons you could also just do OPTIONAL = object() but to be honest I think an Enum/StrEnum is a better choice here since it's more explicit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(one reason to do the class trick is to have a bit of a nicer repr(), but OTOH, I don't think we need to worry about importing enum)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth importing enum? If you want to add your suggested change I can commit it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes let's use an enum here, it's the right tool.


class Field(AST):
def __init__(self, type, name=None, seq=False, opt=False):
def __init__(self, type, name=None, quantifiers=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need to support more than two qualifiers? and/or shouldn't we check the consistency of those qualifiers as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently there are no more than two quantifiers on any type, as per my change to Python.asdl.

I'm just implementing the EBNF grammar described at the top of the asdl.py file. And I don't think this files does any amount of extra validation so far anyway.

self.type = type
self.name = name
self.seq = seq
self.opt = opt
self.seq = False
self.opt = False
self.quantifiers = quantifiers or []
if len(self.quantifiers) > 0:
self.seq = self.quantifiers[-1] is Quantifier.SEQUENCE
self.opt = self.quantifiers[-1] is Quantifier.OPTIONAL

def __str__(self):
if self.seq:
extra = "*"
elif self.opt:
extra = "?"
else:
extra = ""
extra = ""
for mod in self.quantifiers:
if mod is Quantifier.SEQUENCE:
extra += "*"
elif mod is Quantifier.OPTIONAL:
extra += "?"

return "{}{} {}".format(self.type, extra, self.name)

def __repr__(self):
if self.seq:
extra = ", seq=True"
elif self.opt:
extra = ", opt=True"
else:
extra = ""
extra = ""
for mod in self.quantifiers:
if mod is Quantifier.SEQUENCE:
extra += ", SEQUENCE"
elif mod is Quantifier.OPTIONAL:
extra += ", OPTIONAL"

if self.name is None:
return 'Field({0.type}{1})'.format(self, extra)
return 'Field({0.type}, quantifiers=[{1}])'.format(self, extra)
else:
return 'Field({0.type}, {0.name}{1})'.format(self, extra)
return 'Field({0.type}, {0.name}, quantifiers=[{1}])'.format(self, extra)

class Sum(AST):
def __init__(self, types, attributes=None):
Expand Down Expand Up @@ -314,10 +325,10 @@ def _parse_fields(self):
self._match(TokenKind.LParen)
while self.cur_token.kind == TokenKind.TypeId:
typename = self._advance()
is_seq, is_opt = self._parse_optional_field_quantifier()
quantifiers = self._parse_optional_field_quantifier()
id = (self._advance() if self.cur_token.kind in self._id_kinds
else None)
fields.append(Field(typename, id, seq=is_seq, opt=is_opt))
fields.append(Field(typename, id, quantifiers=quantifiers))
if self.cur_token.kind == TokenKind.RParen:
break
elif self.cur_token.kind == TokenKind.Comma:
Expand All @@ -339,14 +350,14 @@ def _parse_optional_attributes(self):
return None

def _parse_optional_field_quantifier(self):
is_seq, is_opt = False, False
if self.cur_token.kind == TokenKind.Asterisk:
is_seq = True
self._advance()
elif self.cur_token.kind == TokenKind.Question:
is_opt = True
quantifiers = []
while self.cur_token.kind in (TokenKind.Asterisk, TokenKind.Question):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we check that we don't have ** for instance or ???

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. But both ?? and ** are legitimate types, ** perhaps more legitimate than ?? is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, actually the current parser doesn't care. I thought that it would have warned the user. And should T** actually mean a list of list of T? is the obtained C code designed to handle this kind of grammar?

I have no strong opinion now on whether we should warn ??.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is essentially no change to the C code at all at this point. I don't know how this was handled before, since expr?* was always there implicitly, but this didn't seem to matter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think this is fine as is

if self.cur_token.kind == TokenKind.Asterisk:
quantifiers.append(Quantifier.SEQUENCE)
elif self.cur_token.kind == TokenKind.Question:
quantifiers.append(Quantifier.OPTIONAL)
self._advance()
return is_seq, is_opt
return quantifiers

def _advance(self):
""" Return the value of the current token and read the next one into
Expand Down
4 changes: 2 additions & 2 deletions Python/Python-ast.c

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading