[editorial] Rephrase encoding note to make the implications clearer. #804
Open: tabatkins wants to merge 2 commits into whatwg:main from tabatkins:make-note-clearer
Apologies for not responding to this more quickly, but I think I never ended up merging it because I'm not sure this is correct. I suspect one could convert at serialization time instead. It's just not how the specification is written.
Sure, the spec could be written another way (potentially), but it's currently not written that way, and the specifics of how the data is encoded/represented at this point in the spec are important, so I know that the URL structure only includes ASCII code points. If we changed to a "convert at serialization" model, that would also be important to note, so it was clear that the URL structure includes non-ASCII code points.
As I said, the nature of this note actively confused me - the spec talks about "URL code points" including non-ASCII code points, but URLs themselves do not contain these code points, and that wasn't clear to me from how the note was written.
Hmm. 1) It's not for "historical reasons". 2) This section is really about writing URLs, it isn't really about their internal representation at all. That's section 4.1 and that already makes it clear most components are ASCII strings.
The "for historical reasons" was me assuming and editorializing. (It seemed like a weird thing to do! It's not usually good practice to encode into the byte format immediately; usually you hold it in the good data model and only encode at the edges, when you have to hit the wire.) If that's not true, and it really is just a quirk of the model, I can rephrase that bit.
And this section is about writing URLs, sure, but there was already a note about how those code points you write will be encoded. I was just rewriting the note for (imo) better clarity. If there's a better place to make this note, I can move it there, but this section does seem relatively germane to what the note is saying (since URLs can "contain" high code points, but the actual internal representation is ASCII-only).
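For readers following the thread, here is a minimal sketch of the behavior under discussion, using the `URL` class available in browsers and Node.js. The input URL and the `isAscii` helper are assumptions made up for illustration; they do not come from the spec or from this PR. The sketch shows that non-ASCII code points written in a URL string are already percent-encoded in the parsed components, rather than being converted only at serialization time.

```javascript
// Hypothetical input: a URL string written with non-ASCII code points
// in the path, query, and fragment.
const input = "https://example.com/path/é?q=ü#ñ";
const url = new URL(input);

// Illustrative helper (not part of any API): true if the string
// contains only ASCII code points.
function isAscii(s) {
  return /^[\x00-\x7F]*$/.test(s);
}

// The parsed components already hold the percent-encoded UTF-8 bytes.
console.log(url.pathname);       // "/path/%C3%A9"
console.log(url.search);         // "?q=%C3%BC"
console.log(url.hash);           // "#%C3%B1"

// The input was not ASCII-only, but the serialized URL is.
console.log(isAscii(input));     // false
console.log(isAscii(url.href));  // true
```

This only demonstrates what is observable through the API; it does not settle whether an implementation could store the original code points internally and convert later, which is the open question in the comments above.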