Description
TL;DR: Locating schemas by their $id does not support common offline/local usage. Schema URIs (as in $id, $ref, and $dynamicRef) don't always represent the actual location of the schema, and they don't have to (although they often do). Adding schema locators, which take precedence over identifiers, would implement the json-schema spec more thoroughly and better support offline/local usage.
The Problem
URIs are identifiers, not necessarily locators, per 8.2.3. Schema References:
The resolved URI produced by [$ref and $dynamicRef] is not necessarily a network locator, only an identifier. A schema need not be downloadable from the address if it is a network-addressable URL, [...]
Relative URIs resolve to absolute URIs (identifiers, not necessarily locators), per 8.2.3.1. Direct References with "$ref":
The value of the "$ref" keyword MUST be a string which is a URI-Reference. Resolved against the current URI base, it produces the URI of the schema to apply.
This means relative and absolute URIs alike might not locate the schema to load! This occurs frequently when locally developing a schema to later publish to its $id URI, or when downloading a schema from its $id URI to use offline. In both cases, the $id is an identifier that is NOT the location of the schema(s), and may not point to any location at all! I've found both of these cases to be very common. Granted, identifiers are very often their own locators, and this should continue to be tried when no explicit locator is available, but generally URIs can't be assumed to be locators.
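To make the mismatch concrete, here is a minimal sketch of how a relative $ref resolves against an $id versus where the file sits on disk. The https://example.com/... URL, the v1.0.0 tag, and the /home/coder/project path are hypothetical stand-ins for the redacted values in this issue, not real locations.

```python
import posixpath
from urllib.parse import urljoin

# Hypothetical stand-ins for the schema's published $id and its local path.
schema_id = "https://example.com/project/-/raw/v1.0.0/schema/interface.json"
local_schema = "/home/coder/project/schema/interface.json"
ref = "protocol/ethernet.json"

# Per the spec, $ref is resolved against the current base URI (here, the $id).
# The result is an identifier and may not be downloadable.
resolved_identifier = urljoin(schema_id, ref)

# The file that actually exists during local development lives next to
# the referring schema on disk.
local_file = posixpath.join(posixpath.dirname(local_schema), ref)

print(resolved_identifier)
print(local_file)
```

The two results name the same schema but only the second one can be opened offline.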
Another useful assumption is that relative identifiers are equivalent to relative locators. This seems to almost always be true, but a relative $ref URI might not be a relative filename if the file structure isn't consistent with the relative URIs. Such a mismatch is generally a bad idea, but not actually forbidden, so relative identifier URIs can't be assumed to be relative locators, though they should still be tried if no more explicit locator is available.
This issue means many valid schema registries either have broken references or require rewriting all $ids and/or $refs to be locators for the local file system (and incorrect locators for their published locations). My team manages ~200 schemas for online and offline use across ~20 registries, so neither option is viable.
Some Solutions
📓 Note: I use angle brackets here to substitute controlled information
Example Setup
Using the example interface.json schema:
{
  "$id": "<gitlab-project-url>/-/raw/<tag>/schema/interface.json",
  "properties": {
    "protocol": {"$ref": "protocol/ethernet.json"}
  }
}
... located in:
/home/coder/<path-to-project>/schema/
├── interface.json
└── protocol
    └── ethernet.json
... and referenced in the VSCode config:
{
  "json.schemas": [
    {
      "url": "/home/coder/<path-to-project>/schema/interface.json"
    }
  ]
}
Good: Let Locators and Identifiers be Distinct
A quick and easy partial solution would be to try resolving the relative URI against the schema's actual location (locator) in addition to its $id base URI (identifier). For the example, that means we would check both:
/home/coder/<path-to-project>/schema/protocol/ethernet.json exists! We can use it.
<gitlab-project-url>/-/raw/<tag>/schema/protocol/ethernet.json might be the wrong reference during local development, might not exist yet if at all, or might be inaccessible from an offline or unauthenticated environment.
❓ If both exist, I would suggest resolving the URI against the actual location, not the URI identifier.
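The lookup order suggested here could be sketched roughly as follows. This is only an illustration of the idea, not how the JSON language service works today; candidate_locations is a hypothetical helper, and the example.com URL and /home/coder/project path are stand-ins for the redacted values.

```python
import posixpath
from urllib.parse import urljoin, urlsplit

def candidate_locations(ref, schema_location, schema_id):
    """Return places to look for `ref`, most preferred first."""
    if urlsplit(ref).scheme:
        # An absolute $ref has no relative part to resolve locally.
        return [ref]
    # 1. Resolve against where the referring schema was actually loaded from.
    local = posixpath.normpath(
        posixpath.join(posixpath.dirname(schema_location), ref)
    )
    # 2. Fall back to the identifier produced by resolving against $id.
    by_id = urljoin(schema_id, ref)
    return [local, by_id]

candidates = candidate_locations(
    "protocol/ethernet.json",
    "/home/coder/project/schema/interface.json",
    "https://example.com/project/-/raw/v1.0.0/schema/interface.json",
)
for candidate in candidates:
    print(candidate)
```

A loader would try each candidate in order and use the first one that can be read, so the local file wins whenever it exists.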
Better: Also Support the Locator/Identifier Distinction in VSCode settings.json
A more thorough solution might fully support the distinction between an identifier and a locator for all schemas, and allow mappings between the two. Since the json.schemas[*].url is already the locator, a keyword could be added for the identifier, like:
{
  "json.schemas": [
    {
      "url": "/home/coder/<path-to-project>/schema/",
      "id": "<gitlab-project-url>/-/raw/<tag>/schema/"
    }
  ]
}
This example means that the folder's literal location is represented by the gitlab URI. For example, a reference to <gitlab-project-url>/-/raw/<tag>/schema/protocol/ethernet.json could be located at /home/coder/<path-to-project>/schema/protocol/ethernet.json.
As above, if the locator does not resolve, the full identifier might still be tried per the existing logic.
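One way to read this mapping is as a prefix substitution from identifier to locator. The sketch below assumes that reading; the "id" key is the keyword proposed in this issue (it is not an existing VS Code setting), locate is a hypothetical helper, and the example.com URL and local path are stand-ins.

```python
def locate(identifier, mappings):
    """Map an identifier URI to a locator using proposed {"url", "id"} entries."""
    for entry in mappings:
        id_prefix = entry.get("id")
        if id_prefix and identifier.startswith(id_prefix):
            # Swap the identifier prefix for the locator prefix.
            return entry["url"] + identifier[len(id_prefix):]
    # No mapping matched: fall back to treating the identifier as a locator.
    return identifier

mappings = [
    {
        "url": "/home/coder/project/schema/",
        "id": "https://example.com/project/-/raw/v1.0.0/schema/",
    }
]
located = locate(
    "https://example.com/project/-/raw/v1.0.0/schema/protocol/ethernet.json",
    mappings,
)
print(located)
```

Identifiers outside every mapped prefix pass through unchanged, which keeps the existing behavior as the fallback.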
Best: Also Let the Locator be in settings.json and Identifier be in the Schema
Since schema files usually (or often, or should) contain their own $id, the config could even infer some identifiers (at the cost of loading schema files to search their content). Each json.schemas[*] entry could even be just a string url, vs. an object that names a url. For example, all of the json.schemas entries below contain enough information to locate and identify the schemas they name.
{
  "json.schemas": [
    "url to schema with its own `$id`",
    "url to directory of schemas with their own `$id`s",
    {"url": "url to schema with its own `$id`"},
    {"url": "url to directory of schemas with their own `$id`s"}
  ]
}
⚠ This also assumes that all the absolute $id URIs are actually unique, and could have confusing side effects for schemas with a relative $id. However, an $id that does not resolve to an absolute URI violates the json-schema spec, 8.2.1. The "$id" Keyword:
If present, the value for [$id ...] MUST resolve to an absolute-URI [RFC3986] (without a fragment), or to a URI with an empty fragment.
As above, if the locator does not resolve, the full identifier might still be tried per the existing logic.
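Inferring identifiers from a directory of schemas might look something like the sketch below. It is an assumption about one possible approach, not existing behavior: index_by_id is a hypothetical helper, it only considers *.json files, and it reads only the top-level $id of each file.

```python
import json
from pathlib import Path

def index_by_id(directory):
    """Map each schema's own top-level $id to the file it was found in."""
    index = {}
    for path in sorted(Path(directory).rglob("*.json")):
        try:
            schema = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # skip unreadable, badly encoded, or non-JSON files
        schema_id = schema.get("$id") if isinstance(schema, dict) else None
        if schema_id is None:
            continue  # nothing to infer for this file
        if schema_id in index:
            # The approach relies on $id values being unique.
            raise ValueError(
                f"duplicate $id {schema_id!r}: {index[schema_id]} and {path}"
            )
        index[schema_id] = path
    return index
```

Raising on a duplicate $id surfaces the uniqueness assumption early instead of silently picking one file.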