Skip to content

linkcheck: add HyperlinkCollector.find_url() #12213

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 24, 2024
Merged

linkcheck: add HyperlinkCollector.find_url() #12213

merged 4 commits into from
Apr 24, 2024

Conversation

jdknight
Copy link
Contributor

Feature or Bugfix

  • Refactoring

Purpose

The following tweaks the linkcheck builder's HyperlinkCollector transform to process links with an add_uri method on the transform (over an internal function inside the implementation). This can allow extensions to derive and register additional transforms using HyperlinkCollector to help inject hyperlinks into CheckExternalLinksBuilder for custom node types that may have a hyperlink value to check.

Details

While extensions can already override HyperlinkCollector to inject custom links, it needs to replicate some of the logic used to add a href value as a Hyperlink into a builder. This causes some duplication with emitting an event, line number finding and duplicate link checking. If the add_uri is available on the transform, extensions only need to care about finding the application refnode and href value.

Relates

@picnixz
Copy link
Member

picnixz commented Apr 4, 2024

I'm fine the idea but I would suggest the following:

  • Have a dedicated method for retrieving the URI from the node (i.e., find_uri(self, node) -> str).
  • Make add_uri a protected method (add a comment for that) and call it.

That way, you do not need to reimplement the whole method if you only need to find the URI from the node.

@jdknight
Copy link
Contributor Author

jdknight commented Apr 6, 2024

@picnixz, just to make sure I understand your suggestions -- are you looking for something more like this?

class HyperlinkCollector(SphinxPostTransform):
    ...

    def run(self, **kwargs: Any) -> None:
        for node in self.document.findall():
            uri = self.find_uri(node)
            if uri:
                self._add_uri(uri, node)

    def find_uri(self, node: nodes.Element) -> Optional[str]:
        # reference nodes
        if isinstance(node, nodes.reference):
            if 'refuri' in node:
                return refnode['refuri']

        # image nodes
        if isinstance(node, nodes.image):
            uri = node['candidates'].get('?')
            if uri and '://' in uri:
                return uri

        # raw nodes
        if isinstance(node, nodes.raw):
            uri = node.get('source')
            if uri and '://' in uri:
                return uri

        return None

    def _add_uri(self, uri: str, node: nodes.Element) -> None:
        ...

@picnixz
Copy link
Member

picnixz commented Apr 6, 2024

Yes, because it decouples the logic of what we already do and for extension developers, they can either subclass this method to extend the behaviour, possibly handling the images/raw nodes before and returning before as well (I think that's what you originally wanted, right?). Now, I won't be available for the next 2-3 weeks (or very sparsly), so I won't be able to review your PR until then.

The following tweaks the linkcheck builder's `HyperlinkCollector`
transform to process links with an `add_uri` method on the transform
(over an internal function inside the implementation). This can allow
extensions to derive and register additional transforms using
`HyperlinkCollector` to help inject hyperlinks into
`CheckExternalLinksBuilder` for custom node types that may have a
hyperlink value to check.

Signed-off-by: James Knight <[email protected]>
@AA-Turner AA-Turner added type:enhancement enhance or introduce a new feature builder:linkcheck labels Apr 24, 2024
@AA-Turner AA-Turner changed the title linkcheck: expose hyperlink collection add-url call linkcheck: add HyperlinkCollector.find_url() Apr 24, 2024
@AA-Turner AA-Turner merged commit 04b229f into sphinx-doc:master Apr 24, 2024
@jdknight jdknight deleted the extendable-linkcheck-transform branch May 11, 2024 23:59
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 11, 2024
@AA-Turner AA-Turner added this to the 7.4.0 milestone Jul 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
builder:linkcheck type:enhancement enhance or introduce a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants