JSON-Schema-Test-Suite: $vocabulary tests are incorrectly required
The vocabulary tests here are not strictly correct according to the spec, unless I’m missing something. They appear to assert that if the validation vocabulary isn’t present, an implementation must mark instances as valid even if the validation vocabulary says they are not, but that’s not the specified (or intended) behavior of $vocabulary as far as I know. Quoting §6.5:
Additional schema keywords and schema vocabularies MAY be defined by any entity. Save for explicit agreement, schema authors SHALL NOT expect these additional keywords and vocabularies to be supported by implementations that do not explicitly document such support.
I.e. a schema author may not depend on support for a keyword or vocabulary they use but do not place in $vocabulary, but an implementation may indeed offer support for it and enable it, either by always enabling the vocabulary or because it has chosen to add a keyword called “minimum” whose behavior is precisely the same as the validation vocabulary’s, and then enable it by default regardless of what’s in $vocabulary.
Where the $vocabulary keyword does have a mandatory effect is in the converse case: where an implementation lacks support for a vocabulary and a schema author requires its use, the implementation may not ignore those keywords:
The values of the object properties MUST be booleans. If the value is true, then implementations that do not recognize the vocabulary MUST refuse to process any schemas that declare this meta-schema with “$schema”. If the value is false, implementations that do not recognize the vocabulary SHOULD proceed with processing such schemas. The value has no impact if the implementation understands the vocabulary.
from §8.1.2.
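To make the quoted true/false rules concrete, here is a minimal Python sketch of how an implementation might apply them when reading a meta-schema's $vocabulary. The names KNOWN_VOCABULARIES, UnsupportedVocabularyError and check_vocabularies are hypothetical, chosen for illustration. The sketch enables only vocabularies that are both listed and understood. Whether an implementation may also enable vocabularies that are not listed is exactly what the rest of this thread debates, and the quoted text alone does not settle it.

```python
CORE = "https://json-schema.org/draft/2020-12/vocab/core"
APPLICATOR = "https://json-schema.org/draft/2020-12/vocab/applicator"
VALIDATION = "https://json-schema.org/draft/2020-12/vocab/validation"

# Hypothetical: the vocabularies this sketch's implementation understands.
KNOWN_VOCABULARIES = frozenset({CORE, APPLICATOR, VALIDATION})


class UnsupportedVocabularyError(Exception):
    """Raised when a meta-schema requires a vocabulary we do not recognize."""


def check_vocabularies(vocabulary, known=KNOWN_VOCABULARIES):
    """Apply the §8.1.2 true/false rules to a $vocabulary object.

    Returns the set of vocabulary URIs to enable for schemas using this meta-schema.
    """
    enabled = set()
    for uri, required in vocabulary.items():
        if not isinstance(required, bool):
            raise ValueError(f"$vocabulary value for {uri!r} must be a boolean")
        if uri in known:
            # "The value has no impact if the implementation understands the vocabulary."
            enabled.add(uri)
        elif required:
            # Unrecognized and true: MUST refuse to process the schema.
            raise UnsupportedVocabularyError(uri)
        # Unrecognized and false: SHOULD proceed, just without that vocabulary.
    return enabled
```

For example, check_vocabularies({CORE: True, APPLICATOR: True}) enables just those two vocabularies. Listing an unrecognized URI with true raises instead.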
TL;DR, an implementation given this schema:
  {
    "$id": "https://schema/using/no/validation",
    "$schema": "http://localhost:1234/draft2020-12/metaschema-no-validation.json",
    "properties": {
      "badProperty": false,
      "numberProperty": {
        "minimum": 10
      }
    }
  }
(with the metaschema here) is indeed free to still apply the validation vocabulary, or to similarly define some behavior for the minimum keyword which makes an instance whose numberProperty is below 10 (e.g. 1) invalid.
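The disagreement comes down to two evaluation policies. The Python sketch below is a deliberately tiny, hypothetical evaluator: it understands only boolean schemas, properties and minimum. It takes the set of enabled vocabularies plus an optional set of always-on keyword names. It assumes that the linked meta-schema declares only the core and applicator vocabularies. Under that assumption, {"numberProperty": 1} is valid when minimum is treated as an unknown keyword, and invalid once an always-on minimum is added. That second case is the behavior in question.

```python
CORE = "https://json-schema.org/draft/2020-12/vocab/core"
APPLICATOR = "https://json-schema.org/draft/2020-12/vocab/applicator"
VALIDATION = "https://json-schema.org/draft/2020-12/vocab/validation"

# Assumption: metaschema-no-validation.json declares only these two vocabularies.
NO_VALIDATION_VOCABS = frozenset({CORE, APPLICATOR})

SCHEMA = {
    "$id": "https://schema/using/no/validation",
    "$schema": "http://localhost:1234/draft2020-12/metaschema-no-validation.json",
    "properties": {
        "badProperty": False,
        "numberProperty": {"minimum": 10},
    },
}


def is_number(value):
    # JSON numbers; bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_valid(schema, instance, vocabularies, always_on=frozenset()):
    """Evaluate a tiny subset of JSON Schema (boolean schemas, properties, minimum).

    vocabularies: enabled vocabulary URIs; always_on: keyword names a hypothetical
    implementation enables no matter what $vocabulary says.
    """
    if isinstance(schema, bool):
        return schema
    if APPLICATOR in vocabularies and isinstance(instance, dict):
        for name, subschema in schema.get("properties", {}).items():
            if name in instance and not is_valid(subschema, instance[name], vocabularies, always_on):
                return False
    minimum_enabled = VALIDATION in vocabularies or "minimum" in always_on
    if minimum_enabled and "minimum" in schema and is_number(instance):
        if instance < schema["minimum"]:
            return False
    # Any other keyword (including a disabled minimum) is unknown and ignored.
    return True


instance = {"numberProperty": 1}
print(is_valid(SCHEMA, instance, NO_VALIDATION_VOCABS))                        # True
print(is_valid(SCHEMA, instance, NO_VALIDATION_VOCABS, always_on={"minimum"}))  # False
print(is_valid(SCHEMA, {"badProperty": "x"}, NO_VALIDATION_VOCABS))             # False
```

The last call shows that the applicator vocabulary keeps working either way. Only minimum depends on the policy.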
In “today’s test layout”, the above means that these tests belong in optional, though given we have #561 on hold pending restructuring the optional/ directory, perhaps we should instead remove them and do the same with these?
CC @handrews (since I believe you confirmed the above interpretation previously, but just making sure) and @karenetheridge (since it looks like you added these, in case you disagree).
About this issue
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 23 (23 by maintainers)
Okay… I think I see what’s going on here.
We have a (currently required) test:
The expected outcome from this test is that since the validation vocab is explicitly excluded, minimum is rendered an unknown keyword and thus ignored, meaning that this instance passes (perhaps counterintuitively).
@Julian is saying that the specification allows (by not explicitly disallowing) a JSON Schema implementation to have an internal, always-on implementation of minimum that functions identically to what the omitted vocab defines. Such an implementation would fail this instance, thus failing the test and being labelled non-conformant. However, because there is no language that explicitly forbids an implementation from doing this, the inclusion of an always-on minimum should be permitted, meaning that this test should be optional (if not removed entirely), and we should allow such implementations to declare conformance.
I would say that writing a meta-schema that explicitly excludes a vocab is a very intentional act, and an implementation that doesn’t allow an author to do this (as its default behavior) should be considered non-conformant.
The test suite should cover only what is required by the spec. So to show that this is in fact a requirement…
From the opening statement for section 8.1.2, it seems clear there exists an implication that keywords defined in a vocab that is not listed in $vocabulary are not considered “available for use.” This MUST render keywords in excluded vocabs as unknown or unrecognized. This, to me, is sufficient to mean that omitting a vocabulary implies that its keywords are to be ignored. This is an implicit requirement of the specification.
Consider my data or unique-keys vocabs. If I use the 2020-12 meta-schema, which doesn’t list either of these vocabs, it is expected that an implementation ignore these keywords, even if it understands them.
Similarly, if a meta-schema excludes the validation vocab, it must be expected that an implementation ignore those keywords.
The explicit exclusion of a vocabulary implies that its keywords MUST be ignored.
This is not to say that an implementation can’t be configured so that a keyword from an excluded vocab is “always-on.” But it does mean that this cannot be the default behavior.
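The distinction drawn here can be sketched in Python. Keywords from excluded vocabularies are ignored by default, but an implementation may offer an opt-in setting that keeps some of them on. The KEYWORDS_BY_VOCABULARY table is abridged. The data-vocabulary URI and its data keyword are hypothetical placeholders, not the real identifiers of the vocabularies mentioned above.

```python
VALIDATION = "https://json-schema.org/draft/2020-12/vocab/validation"
APPLICATOR = "https://json-schema.org/draft/2020-12/vocab/applicator"
DATA = "https://example.com/vocab/data"  # hypothetical URI for illustration

# Abridged, hypothetical mapping from vocabulary URI to the keywords it defines.
KEYWORDS_BY_VOCABULARY = {
    APPLICATOR: frozenset({"properties", "items", "allOf"}),
    VALIDATION: frozenset({"type", "minimum", "maximum"}),
    DATA: frozenset({"data"}),
}


def active_keywords(declared_vocabularies, opt_in_always_on=frozenset()):
    """Keywords an implementation would evaluate for a given meta-schema.

    By default only keywords from declared vocabularies are active; anything in
    opt_in_always_on is an explicit, non-default configuration choice.
    """
    active = set(opt_in_always_on)
    for uri in declared_vocabularies:
        active |= KEYWORDS_BY_VOCABULARY.get(uri, frozenset())
    return active
```

Because the opt-in set defaults to empty, a meta-schema that omits the validation vocabulary gets no minimum unless the user explicitly asks for it.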
There’s a lot going on in this issue, and I’ve completely lost track of who’s arguing for what.
That said, if someone created a meta-schema that didn’t include the validation vocab and then included type in a schema that uses that meta-schema, my implementation would ignore type.
If that meta-schema also required (value of true) a custom vocabulary that defined type, my implementation would need a type keyword implementation that is defined for that custom vocabulary.
In short, $vocabulary defines the keywords that are usable by the schema. If a vocab is absent, then those keywords aren’t usable. The caveat is the core vocab, which (per 8.1.2.1) is implied when absent (unless that entire section is subject to the intro “if $vocabulary is absent…”).
I think the takeaway is that the language could be better. Personally I like the idea that vocabs declare keywords that are available for use, which implies keywords defined by missing vocabs are unavailable.
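One way to get the behavior described here is to register keyword implementations per vocabulary rather than globally. That way, type means whatever the declared vocabulary says it means. This is a hypothetical Python sketch. The custom vocabulary URI and its kind-matching semantics for type are invented for illustration, and the standard type check is abridged to three type names.

```python
from typing import Callable, Optional

VALIDATION = "https://json-schema.org/draft/2020-12/vocab/validation"
APPLICATOR = "https://json-schema.org/draft/2020-12/vocab/applicator"
CUSTOM = "https://example.com/vocab/custom-type"  # hypothetical custom vocabulary

KeywordFn = Callable[[object, object], bool]  # (keyword value, instance) -> valid?


def standard_type(value, instance):
    # Abridged: only a few of the standard type names are handled.
    checks = {
        "string": lambda i: isinstance(i, str),
        "object": lambda i: isinstance(i, dict),
        "number": lambda i: isinstance(i, (int, float)) and not isinstance(i, bool),
    }
    return checks[value](instance)


def custom_type(value, instance):
    # Hypothetical semantics: match an object's "kind" member.
    return isinstance(instance, dict) and instance.get("kind") == value


# Keyword implementations are registered per vocabulary, not globally.
REGISTRY: dict[str, dict[str, KeywordFn]] = {
    VALIDATION: {"type": standard_type},
    CUSTOM: {"type": custom_type},
}


def resolve_keyword(name, declared_vocabularies) -> Optional[KeywordFn]:
    """Find the implementation of `name` among the declared vocabularies only."""
    matches = [
        REGISTRY[uri][name]
        for uri in declared_vocabularies
        if name in REGISTRY.get(uri, {})
    ]
    if not matches:
        return None  # unknown keyword for this meta-schema: ignored
    if len(matches) > 1:
        raise ValueError(f"{name!r} is defined by more than one declared vocabulary")
    return matches[0]
```

With only the applicator vocabulary declared, type resolves to nothing and is ignored. With the custom vocabulary declared instead, it resolves to the custom implementation.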
Determining where these test cases go involves determining two things:
$vocabulary not be used?
What is $vocabulary actually specified to do?
This is stated most clearly in two places:
and then:
The section on default vocabularies provides some additional clues about expected behavior:
What’s important here is that while there are conditions under which an implementation can (or even SHOULD) assume vocabularies, they are only relevant if:
- $vocabulary is absent,
- $schema is absent or not recognized (meaning that the implementation does not know what vocabularies the meta-schema was intended to imply), and
There is nothing about assuming vocabularies when $vocabulary is present.
Another important set of requirements comes from the beginning of §8:
It doesn’t make sense to mandate the presence of the core vocabulary in $vocabulary unless leaving it out means that it would not be available. If it is set to false or left out, the behavior is undefined, which implies that for any other vocabulary, the behavior is defined. That is explicitly true for the false value, and this suggests that it was assumed to be true for omitted vocabularies as well.
The RECOMMENDED approach is to raise an error, because the meta-schema is otherwise saying that you can’t use the core vocabulary, which makes no sense. (Sadly, the individual vocabulary meta-schemas omit the core vocabulary, which… wtf was I thinking? It ends up being OK because they’re not really intended to be used on their own.) So there’s no reason, given this, to assume that an implementation can use the validation vocabulary if it’s omitted.
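The §8 and §8.1.2.1 rules discussed in this comment can be summarized as a small decision procedure. This is a hedged Python sketch, not spec text. It makes three assumptions:
- the caller has already loaded the meta-schema, or passes None if it could not be loaded;
- when $vocabulary is absent and the meta-schema is not recognized, the sketch assumes all standard 2020-12 vocabularies, which is our reading of what §8.1.2.1 says a validator SHOULD do;
- a missing or false core vocabulary raises an error, following the RECOMMENDED handling.

The function and constant names are hypothetical.

```python
CORE = "https://json-schema.org/draft/2020-12/vocab/core"
APPLICATOR = "https://json-schema.org/draft/2020-12/vocab/applicator"
UNEVALUATED = "https://json-schema.org/draft/2020-12/vocab/unevaluated"
VALIDATION = "https://json-schema.org/draft/2020-12/vocab/validation"
META_DATA = "https://json-schema.org/draft/2020-12/vocab/meta-data"
FORMAT_ANNOTATION = "https://json-schema.org/draft/2020-12/vocab/format-annotation"
CONTENT = "https://json-schema.org/draft/2020-12/vocab/content"

ALL_2020_12 = frozenset(
    {CORE, APPLICATOR, UNEVALUATED, VALIDATION, META_DATA, FORMAT_ANNOTATION, CONTENT}
)

# Meta-schemas this hypothetical implementation recognizes by URI.
KNOWN_META_SCHEMAS = {"https://json-schema.org/draft/2020-12/schema": ALL_2020_12}


def vocabularies_for(dialect_uri, meta_schema):
    """Decide which vocabularies apply to a schema whose $schema is dialect_uri.

    meta_schema is the loaded meta-schema document, or None if unavailable.
    """
    if meta_schema is not None and "$vocabulary" in meta_schema:
        declared = meta_schema["$vocabulary"]
        if declared.get(CORE) is not True:
            # Undefined by the spec; raising an error is the RECOMMENDED handling.
            raise ValueError("$vocabulary must list the core vocabulary with a value of true")
        # $vocabulary is present: nothing beyond what it lists is assumed.
        return frozenset(declared)
    if dialect_uri in KNOWN_META_SCHEMAS:
        # $vocabulary absent, but the meta-schema is recognized from its URI.
        return KNOWN_META_SCHEMAS[dialect_uri]
    # Unrecognized or missing: implementation-defined; core MUST be assumed, and
    # this sketch, acting as a validator, assumes all standard vocabularies.
    return ALL_2020_12
```

The important branch for these tests is the first one. Once $vocabulary is present, the validation vocabulary is only enabled if it is listed.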
Looking over the above text, it’s unquestionable that the spec does not offer clear normative text requiring vocabularies omitted from $vocabulary to not be used. However, the phrase “identify the vocabularies available for use” hints at that by implying that other vocabularies are not available (otherwise why would we need to list the set at all?). I accept that it does not say so explicitly, but I argue that between that phrasing and the explicit statements about a few cases where vocabularies can be used without appearing in $vocabulary, there is at least some ambiguity here, so how do we resolve that?
Do old issue and PR comments support my reading of $vocabulary?
Test suite issue #439 “Document the test inclusion guidance/criteria” includes the following guidance:
Part of “consulting the specification team” includes what I personally intended for $vocabulary, but what is more important is whether other members of the team involved in adding the feature understood and agreed with that intent.
I think it can be solidly established that the set of people most involved in adding and reviewing this feature (myself, Greg, Ben, and jgonzalesdr, with others chiming in here and there) all understood $vocabulary to be the complete, rather than minimum, set of usable vocabularies in the context of a given meta-schema, with the ability to turn off the standard validation feature seen as a valuable use case:
- json-schema-org/json-schema-spec#513 has an extensive (and exhausting) debate regarding using JSON Schema without the standard validation vocabulary.
  - $vocabulary had not been specified yet, but we had the concept of modular vocabularies as a subject of discussion.
  - type (the relevance of this will be clear later).
- Issue json-schema-org/json-schema-spec#567 is all about whether you can or have to put the core vocabulary in $vocabulary. Most of the discussion is elsewhere (see the $vocabulary PR below), but we decided against letting implementations just use it without it being present. This decision only makes sense if omitting a vocabulary from $vocabulary means it wouldn’t be used.
- In a PR 671 comment, Greg observes “It sounds like $vocabulary is kind of an equivalent of the JS imports statement for schema keywords. Unless a keyword is defined within one of the “imported” meta-schemas, a validator will ignore it.” There’s also some discussion of special handling if the core vocabulary is omitted.
- Here, in another PR 671 discussion with Greg, I discuss what happens if the core vocabulary is omitted, saying “You can’t not use the core vocabulary- you need it to bootstrap processing (assuming you don’t just hardcode it, which you can do because it’s mandatory and I expect most people will). Basically, I expect implementations to ignore whether core is present in $vocabulary but some people will like having it there.” This hopefully establishes that I believed that omitting a vocabulary from $vocabulary would mean actually not using it, which is why the core vocab required special discussion.
- In yet another PR 671 comment Ben states “Assuming that validation is its own vocabulary, this means that any schema document that is for validation, MUST define it is using the validation vocabulary, right?”
  - $vocabulary is absent, but it reflects the assumption that failing to declare a vocabulary makes it unavailable.
- $vocabulary, in what is now §8.1.2.1 “Default vocabularies”. The section as published establishes what assumptions are allowed under what circumstances, while this comment clarifies that other assumptions were intended to be forbidden.
- There were various PR 671 discussions, including this one about the wording that, by the end of the PR, became the “identify the vocabularies available for use” language. This particular thread was started by jgonzalesdr several days after one of the above threads, in which he participated, so folks had the context of omitted vocabularies being turned off, and none of us spotted the problem with the wording despite agreement on that aspect.
Non-schema extension keywords replicating the standard vocabularies
I think it should be pretty clear at this point that despite the lack of normative text, the active contributors on this feature all agreed that $vocabulary defines the complete set of available vocabularies, not the minimum.
So the question now is whether an implementation can define minimum (the keyword used in these test cases) as a non-vocabulary extension, with syntax and semantics identical to that of the validation vocabulary, and always have that keyword enabled regardless of what $vocabulary contains.
@Julian asserts that implementations can have always-on non-vocabulary extension keywords. This sort of extension long predates my involvement with the project, and Julian has a much deeper background with it, so I will defer to him on this point.
The remaining question is whether, in 2019-09 and 2020-12, it is sufficiently acceptable to do that with minimum (or any non-core standard vocabulary keyword). Let’s walk through this.
$vocabulary means that the implementation cannot use it.
§6.5 says “Additional schema keywords and schema vocabularies MAY be defined by any entity.” The key word here is “additional.” There are two plausible readings of “additional”:
$vocabulary lists
While the core specification does not depend on the validation specification, it references its vocabularies when discussing default vocabularies in §8.1.2.1, and we’ve already established that any implementation running the test suite supports the validation vocabulary anyway.
I do not see a plausible argument that an implementation that supports the standard validation vocabulary can claim that a minimum keyword that effectively duplicates the same keyword from that vocabulary is an “additional” keyword in such a way that it would invalidate these test cases.
$vocabulary indicating that it is turned off.
$vocabulary, are we really saying that that is a reasonable test configuration that should pass a required test?
There is no plausible reason that someone would do that other than to confound this test case. I don’t think that invalidates the test case. There are other things in the required test suite that a truly dedicated person could find a way to break and still claim compliance. I don’t think those cases should be moved to optional, and I don’t think these should either.
Fair enough, we can leave this here. Thanks for the input, folks.
This is because I (and I assume @gregsdennis and @Relequestual) never contemplated the possibility of someone substituting non-vocabulary keywords for standard keywords in a way that can’t be turned off. In my case, I had no idea that always-on extension keywords were considered valid when it came to assessing conformance in the first place (Greg, Ben, were either of you aware of this?).
Don’t get me wrong: it’s excellent QA thinking! From the perspective of my QA career I respect the thoroughness and reasoning. And I would have loved to have heard it 3 years and 8 months ago when we could have easily fixed the wording to make it beyond a doubt that this was not allowed, whether by exempting the standard vocabularies, or by forbidding always-on extension keywords, or whatever else.
But it’s difficult to prove now, because how do you prove that you intended to forbid something that you didn’t think was possible in the first place? (By which I mean “I did not (and do not) think it is possible to have such a minimum or type always on and considered a viable candidate for test suite conformance”, not “it’s not possible to redefine such keywords outside of vocabularies at all”.)
@Julian what I don’t understand is why you are so dead-set on insisting that this particular configuration (an implementation that deliberately circumvents the intent of $vocabulary as its default setting, and with that setting expects to pass the test suite) is something the test suite should accommodate.
Is the test suite designed to test reasonable configurations or pathological ones? I’m defining a “pathological” implementation as one that exploits non-interoperable behavior to circumvent more clearly-defined behavior. The disabling of the validation vocabulary by omitting it from $vocabulary is clearly-defined behavior (the text is not as clear as it should have been, but as demonstrated, everyone who worked on getting that PR in had the same understanding of what that text means, and it is well within the common-language meaning of that text).
That is a choice that an implementation makes to go beyond the spec, into areas that are not interoperable. §6.5 makes it quite clear that this is outside of what is “normal” JSON Schema behavior: “Save for explicit agreement…”, etc.
The test suite is correctly happy to rely on this elsewhere. The (again, correctly) required test that ensures that $id is not recognized in a location that is not known to be a schema relies on an unimplemented unknown keyword. But by the logic you’re using here, the test suite MUST assume that any unknown keyword might be implemented and permanently enabled, and done so in such a way as to make involving one in a test impossible. I hesitate to bring this up because I will be just as vehemently against moving tests/draft2020-12/unknownKeyword.json out of the required suite as I am about these $vocabulary tests, but in terms of reading what the spec allows around additional keywords in the most aggressive possible way, it’s the exact same problem.
There are quite a few configurations or implementation choices that are not explicitly forbidden by the spec but that the test suite. The spec’s wording technically allows ignoring certain uses of $ref because the language around noticing an $id and “automatically” resolving a $ref to it is at most a SHOULD. Are you going to exile all of those tests to “optional”? I would certainly hope not.
Because if we go that far, then (as you and I have discussed), the core spec starts falling apart entirely. The normative language just isn’t clear enough. Tons of things in the core spec lack normative language entirely. There is no normative language around what it means to “identify” a schema with a URI.
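The unknownKeyword.json point above can be illustrated with a sketch of how an implementation might index embedded $id values. It descends only into keywords it knows take subschemas, so an $id nested under an unrecognized keyword is never registered. This is a hypothetical, abridged Python sketch. The keyword lists cover only part of 2020-12, and the example schema is invented rather than copied from the test file. An implementation that treated such a keyword as a known, always-on applicator would descend into it and register the $id, which is the parallel being drawn.

```python
# Abridged lists of 2020-12 keywords whose values are, or contain, subschemas.
SINGLE_SUBSCHEMA = {"not", "if", "then", "else", "items", "contains",
                    "additionalProperties", "propertyNames"}
SUBSCHEMA_MAP = {"properties", "patternProperties", "$defs", "dependentSchemas"}
SUBSCHEMA_ARRAY = {"allOf", "anyOf", "oneOf", "prefixItems"}


def collect_ids(schema, found=None):
    """Return the $id values found in schema and its known subschema locations."""
    if found is None:
        found = []
    if not isinstance(schema, dict):
        return found  # boolean schemas carry no $id
    if isinstance(schema.get("$id"), str):
        found.append(schema["$id"])
    for keyword, value in schema.items():
        if keyword in SINGLE_SUBSCHEMA:
            collect_ids(value, found)
        elif keyword in SUBSCHEMA_MAP and isinstance(value, dict):
            for subschema in value.values():
                collect_ids(subschema, found)
        elif keyword in SUBSCHEMA_ARRAY and isinstance(value, list):
            for subschema in value:
                collect_ids(subschema, found)
        # Any other keyword, known or unknown, is not scanned for $id.
    return found


schema = {
    "$id": "https://example.com/root.json",
    "$defs": {"known": {"$id": "https://example.com/known.json"}},
    "x-unknown": {"$id": "https://example.com/hidden.json"},
}
print(collect_ids(schema))
# ['https://example.com/root.json', 'https://example.com/known.json']
```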
None of this has to do with whether only vocabularies are allowed to add keywords. It has to do with how far implementations can go beyond what the spec states into murky, explicitly non-interoperable areas, and still expect to pass the test suite.
You keep saying that you don’t want the test suite to be making subjective decisions, but there is unquestionably a subjective decision going on here, where you are willing to allow the test suite to assume a reasonable configuration and set of implementation choices for some tests in the required suite, but seem dead-set on forbidding such an assumption for $vocabulary. And I cannot understand why $vocabulary is being singled out for this treatment.
@Julian thanks, it might be a few days before I have time to sort it out and explain (and make sure I’m actually right first).
Ha. Quoting someone named @handrews from here 😄
But will wait for your more thorough response to clarify 😃