This is a proposal to include a table, either as a clarification
or (my current preference) as a full replacement for the descriptions of:
- Percent-encode sets,
- Valid vs. invalid individual code points per component, and
- Error-correction behaviour for the above,
all within a single small-ish table.
For each component of a URL that contains a percent-encoded string,
we can describe, per code point, its validity, error correction and encoding.
A single code point is one of:
- v: Valid and included verbatim in the output URL.
- E: (Escape) Valid, but nonetheless percent encoded.
- T: (Tolerate) Invalid, but nonetheless left untouched by the parser, resulting in an invalid URL as output.
- F: (Fixed) Invalid, and fixed by the parser (and setters) by percent encoding the occurrence.
- R: (Reject) Invalid, and causing a hard error, so that it does not end up in output URLs.
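To make the five classifications concrete, here is a minimal Python sketch of what each one means for a single code point. The names `Action`, `Result`, `percent_encode` and `apply_action` are hypothetical and not taken from the URL standard. The sketch deliberately does not say which code point gets which classification in which component, since that is what the table itself is for.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """The five per-code-point classifications (hypothetical names)."""
    VALID = "v"     # valid, copied verbatim
    ESCAPE = "E"    # valid, but percent encoded
    TOLERATE = "T"  # invalid, left untouched
    FIXED = "F"     # invalid, percent encoded by the parser
    REJECT = "R"    # invalid, hard error


@dataclass
class Result:
    output: str        # text emitted into the output URL
    input_valid: bool  # whether the input code point was valid here


def percent_encode(code_point: str) -> str:
    """Percent encode the UTF-8 bytes of a single code point.

    Assumption: the code point is not a lone surrogate, which Python's
    UTF-8 encoder refuses to encode.
    """
    return "".join(f"%{byte:02X}" for byte in code_point.encode("utf-8"))


def apply_action(action: Action, code_point: str) -> Result:
    """Apply one classification to one code point."""
    if action is Action.REJECT:
        raise ValueError(f"code point U+{ord(code_point):04X} is rejected")
    encode = action in (Action.ESCAPE, Action.FIXED)
    valid = action in (Action.VALID, Action.ESCAPE)
    return Result(percent_encode(code_point) if encode else code_point, valid)
```

Note that E and F produce the same output and differ only in whether the input counts as valid, and the same holds for v and T. That is why the table needs both a validity axis and an encoding axis.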

Notes:
- 'Other control' here is control-c0 ∪ del-c1 ∪ surrogate ∪ non-char
- The apostrophe in the query is special-cased: for 'non-special' URLs it is left untouched (i.e. v: Valid), hence the superscript. The special query could also be broken out into a separate column.
As of #607, the image above is slightly out of date; see my 'living' version here.
I would like to thank @LEW21 for the idea to chart things out in this way in #379.