Skip to content

Fix MDQ Endpoint Behavior for EntityIDs with .xml or Trailing Slash #301

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

CzNorbi
Copy link

@CzNorbi CzNorbi commented Jun 20, 2025

Problem

In PyFF 2.x, the MDQ handler attempts to parse the URL path and remove the extensions (like .xml or .json) under the assumption that these are used to indicate the desired response format.

However, in some cases, clients request metadata using fully encoded entityIDs like:

/entities/https%3A%2F%2Fidp.example.org.xml

In this case, the .xml is part of the actual entityID. PyFF would remove this suffix and attempt to resolve https://idp.example.org, which does not exist in the metadata. The result is an empty EntitiesDescriptor in XML responses or an empty list in JSON.

Solution

This patch modifies the _d() function to:

Only strip .xml or .json suffixes if the remaining path does not appear to be a percent-encoded entityID or a hash-based entityID ({sha1}, {sha256}, {md5}).
If the entityID appears to be encoded or hashed and ends in .xml or .json, it is treated as part of the true entityID and preserved during lookup.

Additional fix: Preserves a trailing / if present in the request path.

@andrewbcoyle
Copy link

andrewbcoyle commented Jun 24, 2025

I really appreciate everyone taking a look at this.

FWIW, I do not consider in-line comments in source code "easy to find documentation". I don't consider in-line comments documentation at all. One of the major challenges my team has found troubleshooting this is the only thorough documentation that appears to exist for e.g. "How to customize a pipeline", only exist as in-line comments for each available function. Compare that to, say, Kubernetes documentation.

I can see how that would be enough for the folks who wrote/maintain the code, but for an outsider who is needing to discover how all this works in order to fix something, it is effectively useless. Just finding the relevant documentation is half the battle.

@theseal
Copy link
Contributor

theseal commented Jun 25, 2025

Will not the content_negotiation_policy be enough for solving the content-type issues?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants