Skip to content

Add reStructuredText parsing functions to SphinxDirective #12492

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jul 2, 2024

Conversation

AA-Turner
Copy link
Member

As was colourfully espoused in #8039, it is harder than it ought to be to parse reStructuredText into nodes in a sphinx directive.

This PR adds three new functions:

  • SphinxDirective.parse_content_to_nodes(), to parse the entirety of SphinxDirective.content
  • SphinxDirective.parse_text_to_nodes(), to parse a given string
  • SphinxDirective.parse_inline(), to parse a text that is inline-only

Yet to finish is documentation and tests, but I am opening now for early feedback.

A

@chrisjsewell
Copy link
Member

cheers @AA-Turner love it ❤️

Haven't looked through in detail yet, but one request... can we link this to #12361, and have the new SphinxDirective methods "ask" the parser how it wants to do the parsing.

This would allow for e.g. MyST parser, or a different implementation of the rST parser to implement their own parsing logic

@chrisjsewell
Copy link
Member

chrisjsewell commented Jun 29, 2024

Oh and also, I don't know if if it was in your planning, but we could also add an inline parse method for SphinxRole

@AA-Turner AA-Turner force-pushed the directive-parse-rst/all branch from f430745 to d66ac71 Compare July 2, 2024 20:31
@AA-Turner
Copy link
Member Author

inline parse method for SphinxRole

This isn't easily feasible, having had a quick go. It would be nice, but it in effect represents nested inline parsing.

can we link this to #12361, and have the new SphinxDirective methods "ask" the parser how it wants to do the parsing.

I think this is automatic, as we use the current parser (by using the state machine directly). Very happy for someone to take this on in a follow-up though, if the current implementation is found wanting.

A

@AA-Turner AA-Turner merged commit 1887df0 into sphinx-doc:master Jul 2, 2024
23 checks passed
@AA-Turner AA-Turner deleted the directive-parse-rst/all branch July 2, 2024 21:14
@chrisjsewell
Copy link
Member

chrisjsewell commented Jul 2, 2024

I think this is automatic, as we use the current parser (by using the state machine directly)

This assumes that the parser has a state machine, which is actually just an implementation detail of docutils,
especially things like memo in _fresh_title_style_context

myst-parser has to go through absolutely hurdles to "pretend" it has one: https://github.com/executablebooks/MyST-Parser/blob/master/myst_parser/mocking.py

Very happy for someone to take this on in a follow-up though, if the current implementation is found wanting.

but yeh no problem I can do this

AA-Turner added a commit that referenced this pull request Jul 2, 2024
@chrisjsewell
Copy link
Member

@AA-Turner correct me if I'm wrong, but in this PR you have essentially changed all occurrences of self.state.nested_parse to self.parse_content_to_nodes?

But nested_parse defaulted to match_titles=False, whereas now parse_content_to_nodes uses match_titles=True.
Is this intentional? because it is a pretty big change 😬

Comment on lines +84 to +87
inliner: Inliner = state.inliner
memo: Struct = state.memo
parent: nodes.Element = state.parent
return inliner.parse(text, lineno, memo, parent)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just use state.inline_text(text, lineno) here, as per the code you are replacing?

Its not a hard fix, but this definitely breaks the current implementation of myst-parser

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I didn't realise -- reverted to a simpler implementation in #12504.

A

@AA-Turner
Copy link
Member Author

@AA-Turner correct me if I'm wrong, but in this PR you have essentially changed all occurrences of self.state.nested_parse to self.parse_content_to_nodes?

But nested_parse defaulted to match_titles=False, whereas now parse_content_to_nodes uses match_titles=True. Is this intentional? because it is a pretty big change 😬

See #12503, which restores the status quo ante. I do think in general that titles ought be allowed where possible, and that previously it may have more been a case of forgetting to allow them, but you make a good point that such a change should be made more deliberately.

A

@chrisjsewell
Copy link
Member

I do think in general that titles ought be allowed where possible

where possible maybe, but I do want to emphasise that you cannot nest sections inside other nodes (like admonitions), without breaking the structure of the doctree: https://gist.github.com/chrisjsewell/0c5827add50074fef0937e2543e955b4

this is also the case in other text formats like: jgm/djot#213

what you could allow is headings that are not sections; and in-fact in myst-parser, rather than just omit them, they are actually changed to rubrics, i.e. not structural headings

@AA-Turner
Copy link
Member Author

In my opinion, nested_parse_with_tiles should probably be a private function in sphinx, as it is not really intended to be used by extensions or users, unless they really know what they are doing.

Given this, do you think that the new nested_parse_to_nodes ought also be private? I'm indifferent, but having a better API for parsing arbitrary content to nodes has long been a request (see #8039 and numerous others).

A

@chrisjsewell
Copy link
Member

chrisjsewell commented Jul 2, 2024

Given this, do you think that the new nested_parse_to_nodes ought also be private?
parsing arbitrary content to nodes has long been a request

In general, when the content contains no section headings, I think its absolutely fine, and this PR makes it that bit easier 👍

Its just when it comes to trying to nested parse section headings, thats when we really need to make sure people know what they are doing, because it can get quite nuanced; since sections are tightly-coupled to the structure of the document

@AA-Turner AA-Turner added this to the 7.4.0 milestone Jul 13, 2024
2bndy5 added a commit to jbms/sphinx-immaterial that referenced this pull request Jul 16, 2024
2bndy5 added a commit to jbms/sphinx-immaterial that referenced this pull request Jul 16, 2024
See sphinx-doc/sphinx#12492

The mypy warning only applies to `sphinx>=v7.4`, but the code is only executed for `sphinx<v6.2.0`.
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 14, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants