JSON-Schema-Test-Suite: $vocabulary tests are incorrectly required

The vocabulary tests here are not strictly correct according to the spec, unless I’m missing something. They appear to assert that if the validation vocabulary isn’t present, an implementation must mark instances as valid even if the validation vocabulary says they are not – but that’s not the specified (or intended) behavior of $vocabulary as far as I know. Quoting §6.5:

Additional schema keywords and schema vocabularies MAY be defined by any entity. Save for explicit agreement, schema authors SHALL NOT expect these additional keywords and vocabularies to be supported by implementations that do not explicitly document such support.

I.e. a schema author may not depend on support for a keyword or vocabulary they use but do not place in $vocabulary, but an implementation may indeed offer support for it and enable it, either by always enabling the vocabulary or because it has chosen to add a keyword called “minimum” whose behavior is precisely the same as the validation vocabulary’s, and then enable it by default regardless of what’s in $vocabulary.

When the $vocabulary keyword does have mandatory effect is in the converse – where an implementation lacks support for a vocabulary and a schema author requires its use, the implementation may not ignore those keywords:

The values of the object properties MUST be booleans. If the value is true, then implementations that do not recognize the vocabulary MUST refuse to process any schemas that declare this meta-schema with “$schema”. If the value is false, implementations that do not recognize the vocabulary SHOULD proceed with processing such schemas. The value has no impact if the implementation understands the vocabulary.

from §8.1.2.
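A minimal sketch of the rule quoted above, in Python. The function and exception names are hypothetical and not taken from any real implementation; only the vocabulary URIs are the published 2020-12 ones. It covers only what §8.1.2 says about declared vocabularies, and takes no position on vocabularies that are omitted from $vocabulary.

```python
CORE = "https://json-schema.org/draft/2020-12/vocab/core"
APPLICATOR = "https://json-schema.org/draft/2020-12/vocab/applicator"
VALIDATION = "https://json-schema.org/draft/2020-12/vocab/validation"


class UnsupportedVocabulary(Exception):
    """Hypothetical error: a required vocabulary is not recognized."""


def check_vocabularies(declared, recognized):
    """Return the declared vocabularies that this implementation will apply.

    declared:   the "$vocabulary" object of the meta-schema (URI -> bool)
    recognized: set of vocabulary URIs the implementation understands
    """
    active = []
    for uri, required in declared.items():
        if not isinstance(required, bool):
            raise TypeError(f"$vocabulary value for {uri!r} must be a boolean")
        if uri in recognized:
            # The boolean has no impact when the vocabulary is understood.
            active.append(uri)
        elif required:
            # true + unrecognized: MUST refuse to process the schema.
            raise UnsupportedVocabulary(uri)
        # false + unrecognized: SHOULD proceed, skipping that vocabulary.
    return active
```

With `recognized = {CORE}`, a meta-schema that marks some custom vocabulary as `false` is processed with core only, while marking the same vocabulary `true` raises.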

TL;DR, an implementation given this schema:

{
  "$id": "https://schema/using/no/validation",
  "$schema": "http://localhost:1234/draft2020-12/metaschema-no-validation.json",
  "properties": {
    "badProperty": false,
    "numberProperty": {
      "minimum": 10
    }
  }
}

(with metaschema here) is indeed free to still apply the validation vocabulary, or to similarly define some behavior for the minimum keyword which makes instances like 20 be invalid.

In “today’s test layout”, the above means that these tests belong in optional, though given we have #561 on hold pending restructuring the optional/ directory, perhaps we instead should remove them and do the same with these?

CC @handrews (since I believe you confirmed the above interpretation previously, but just making sure) and @karenetheridge (since it looks like you added these, in case you disagree).

About this issue

State: closed
Created 2 years ago
Reactions: 1
Comments: 23 (23 by maintainers)

Most upvoted comments

Okay… I think I see what’s going on here.

We have a (currently required) test:

// schema
{
  "$id": "https://schema/using/no/validation",
  "$schema": "http://localhost:1234/draft2020-12/metaschema-no-validation.json",
  "properties": {
    "badProperty": false,
    "numberProperty": {
      "minimum": 10
    }
  }
}

// instance
{
  "numberProperty": 1
}

The expected outcome from this test is that since the validation vocab is explicitly excluded, minimum is rendered an unknown keyword and thus ignored, meaning that this instance passes (perhaps counterintuitively).
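To make the expected outcome concrete, here is a toy evaluator in Python. It is only a sketch: `is_valid` is a hypothetical function that knows two keywords, and it assumes the no-validation meta-schema declares just the core and applicator vocabularies (the meta-schema itself is not reproduced in this thread).

```python
CORE = "https://json-schema.org/draft/2020-12/vocab/core"
APPLICATOR = "https://json-schema.org/draft/2020-12/vocab/applicator"
VALIDATION = "https://json-schema.org/draft/2020-12/vocab/validation"


def is_valid(schema, instance, vocabularies):
    """Evaluate a tiny subset of JSON Schema, gated by the active vocabularies."""
    if isinstance(schema, bool):
        return schema  # boolean schemas: true accepts, false rejects
    for keyword, value in schema.items():
        if keyword == "properties" and APPLICATOR in vocabularies:
            if isinstance(instance, dict):
                for name, subschema in value.items():
                    if name in instance and not is_valid(
                        subschema, instance[name], vocabularies
                    ):
                        return False
        elif keyword == "minimum" and VALIDATION in vocabularies:
            is_number = isinstance(instance, (int, float)) and not isinstance(
                instance, bool
            )
            if is_number and instance < value:
                return False
        # Every other keyword is unknown to this sketch and is ignored.
    return True


schema = {
    "$id": "https://schema/using/no/validation",
    "$schema": "http://localhost:1234/draft2020-12/metaschema-no-validation.json",
    "properties": {"badProperty": False, "numberProperty": {"minimum": 10}},
}
instance = {"numberProperty": 1}

without_validation = is_valid(schema, instance, {CORE, APPLICATOR})
with_validation = is_valid(schema, instance, {CORE, APPLICATOR, VALIDATION})
```

Under these assumptions the instance passes when the validation vocabulary is not active and fails when it is, which is the difference the required test checks for.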

@Julian is saying that the specification allows (via not explicitly disallowing) that a JSON Schema implementation may have an internal, always-on implementation of minimum that functions identically to what the omitted vocab defines, and that such an implementation would fail this instance, thus failing the test, labelling it as non-conformant. However because there is no language that explicitly forbids an implementation from doing this, the inclusion of an always-on minimum should be permitted, meaning that this test should be optional (if not removed entirely), and we should allow such implementations to declare conformance.

I would say that writing a meta-schema that explicitly excludes a vocab is a very intentional act, and an implementation that doesn’t allow an author to do this (as its default behavior) should be considered non-conformant.

The test suite should cover only what is required by the spec. So to show that this is in fact a requirement…

From the opening statement for section 8.1.2,

The “$vocabulary” keyword is used in meta-schemas to identify the vocabularies available for use in schemas described by that meta-schema.

it seems clear there exists an implication that keywords defined in a vocab that is not listed in $vocabulary are not considered “available for use.” This MUST render keywords in excluded vocabs as unknown or unrecognized. This, to me, is sufficient to mean that omitting a vocabulary implies that its keywords are to be ignored. This is an implicit requirement of the specification.

Consider my data or unique-keys vocabs. If I use the 2020-12 meta-schema, which doesn’t list either of these vocabs, it is expected that an implementation ignore these keywords, even if it understands them.

Similarly, if a meta-schema excludes the validation vocab, it must be expected that an implementation ignore those keywords.

The explicit exclusion of a vocabulary implies that its keywords MUST be ignored.

This is not to say that an implementation can’t be configured so that a keyword from an excluded vocab is “always-on.” But it does mean that this cannot be the default behavior.

gregsdennis on Aug 15, 2022

There’s a lot going on in this issue, and I’ve completely lost track of who’s arguing for what.

That said, if someone created a meta-schema that didn’t include the validation vocab and then included type in a schema that uses that meta-schema, my implementation would ignore type.

If that meta-schema also required (value of true) a custom vocabulary that defined type, my implementation would need a type keyword implementation that is defined for that custom vocabulary.

In short, $vocabulary defines the keywords that are usable by the schema. If a vocab is absent, then those keywords aren’t usable. The caveat is the core vocab, which (per 8.1.2.1) is implied when absent (unless that entire section is subject to the intro “if $vocabulary is absent…”).

I think the takeaway is that the language could be better. Personally I like the idea that vocabs declare keywords that are available for use, which implies keywords defined by missing vocabs are unavailable.

gregsdennis on Aug 15, 2022

Determining where these test cases go involves determining two things:

Does the specification require that vocabularies not present in $vocabulary not be used?
Does the specification allow an always-on non-vocabulary extension keyword to take the place of a keyword from a vocabulary that is not in use?

What is `$vocabulary` actually specified to do?

This is stated most clearly in two places:

8.1. Meta-Schemas and Vocabularies

Two concepts, meta-schemas and vocabularies, are used to inform an implementation how to interpret a schema. Every schema has a meta-schema, which can be declared using the “$schema” keyword.

The meta-schema serves two purposes:

Declaring the vocabularies in use: The “$vocabulary” keyword, when it appears in a meta-schema, declares which vocabularies are available to be used in schemas that refer to that meta-schema. Vocabularies define keyword semantics, as well as their general syntax.

and then:

8.1.2. The “$vocabulary” Keyword

The “$vocabulary” keyword is used in meta-schemas to identify the vocabularies available for use in schemas described by that meta-schema. It is also used to indicate whether each vocabulary is required or optional, in the sense that an implementation MUST understand the required vocabularies in order to successfully process the schema. Together, this information forms a dialect. Any vocabulary that is understood by the implementation MUST be processed in a manner consistent with the semantic definitions contained within the vocabulary.

The section on default vocabularies provides some additional clues about expected behavior:

8.1.2.1. Default vocabularies

If “$vocabulary” is absent, an implementation MAY determine behavior based on the meta-schema if it is recognized from the URI value of the referring schema’s “$schema” keyword. This is how behavior (such as Hyper-Schema usage) has been recognized prior to the existence of vocabularies.

If the meta-schema, as referenced by the schema, is not recognized, or is missing, then the behavior is implementation-defined. If the implementation proceeds with processing the schema, it MUST assume the use of the core vocabulary. If the implementation is built for a specific purpose, then it SHOULD assume the use of all of the most relevant vocabularies for that purpose.

For example, an implementation that is a validator SHOULD assume the use of all vocabularies in this specification and the companion Validation specification.

What’s important here is that while there are conditions under which an implementation can (or even SHOULD) assume vocabularies, they are only relevant if:

$vocabulary is absent,
$schema is absent or not recognized (meaning that the implementation does not know what vocabularies the meta-schema was intended to imply), and
the implementation decided to process the schema, which it is not required to do (the SHOULD is about which vocabularies to assume, not about whether to process or not — a purpose-built validator is just as free to decline to process as any other implementation)

There is nothing about assuming vocabularies when $vocabulary is present.
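The conditions above can be sketched as a small decision function. This is an illustration of the reading argued in this comment, not spec text: `select_vocabularies`, `legacy_vocabularies` and `purpose_defaults` are hypothetical names, and the handling of required versus unrecognized vocabularies is left out.

```python
CORE = "https://json-schema.org/draft/2020-12/vocab/core"
APPLICATOR = "https://json-schema.org/draft/2020-12/vocab/applicator"
VALIDATION = "https://json-schema.org/draft/2020-12/vocab/validation"


def select_vocabularies(meta_schema, legacy_vocabularies=None,
                        purpose_defaults=frozenset()):
    """Pick the vocabularies to use for a schema.

    meta_schema:         the resolved meta-schema (dict), or None when it is
                         missing or not recognized
    legacy_vocabularies: what the implementation associates with a meta-schema
                         it recognizes by URI but that has no "$vocabulary"
    purpose_defaults:    what a purpose-built tool (e.g. a validator) assumes
                         when it has nothing else to go on
    """
    if meta_schema is not None and "$vocabulary" in meta_schema:
        # $vocabulary is present: under this reading nothing is assumed.
        return set(meta_schema["$vocabulary"])
    if meta_schema is not None and legacy_vocabularies is not None:
        # $vocabulary absent, meta-schema recognized by its URI (MAY).
        return set(legacy_vocabularies)
    # Implementation-defined territory: if processing proceeds at all,
    # core MUST be assumed, and purpose-specific defaults SHOULD be.
    return {CORE} | set(purpose_defaults)
```

The point of the sketch is that `purpose_defaults` is consulted only in the last branch, so a validator's default set never leaks into a schema whose meta-schema declares $vocabulary.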

Another important set of requirements comes from the beginning of §8:

8. The JSON Schema Core Vocabulary

The Core vocabulary MUST be considered mandatory at all times, in order to bootstrap the processing of further vocabularies. Meta-schemas that use the “$vocabulary” (Section 8.1) keyword to declare the vocabularies in use MUST explicitly list the Core vocabulary, which MUST have a value of true indicating that it is required.

The behavior of a false value for this vocabulary (and only this vocabulary) is undefined, as is the behavior when “$vocabulary” is present but the Core vocabulary is not included. However, it is RECOMMENDED that implementations detect these cases and raise an error when they occur. It is not meaningful to declare that a meta-schema optionally uses Core.

It doesn’t make sense to mandate the presence of the core vocabulary in $vocabulary unless leaving it out means that it would not be available. If it is set to false or left out, the behavior is undefined, which implies that for any other vocabulary, the behavior is defined. That is explicitly true for the false value, and this suggests that it was assumed to be true for omitted vocabularies as well.
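A sketch of the RECOMMENDED check from the quoted text, assuming a hypothetical `check_core` helper that an implementation might run when it loads a meta-schema. Raising `ValueError` is an arbitrary choice for the illustration.

```python
CORE = "https://json-schema.org/draft/2020-12/vocab/core"


def check_core(vocabulary):
    """Reject a "$vocabulary" object that omits core or marks it optional."""
    if vocabulary.get(CORE) is not True:
        # Both cases are undefined behavior in the spec; raising an error
        # is the RECOMMENDED way to handle them.
        raise ValueError("$vocabulary must list the core vocabulary with value true")
    return vocabulary
```

Both `{}` and `{CORE: False}` are rejected here, while `{CORE: True}` passes through unchanged.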

The RECOMMENDED approach of raising an error is because the meta-schema is otherwise saying that you can’t use the core vocabulary, which makes no sense. (Sadly, the individual vocabulary meta-schemas omit the core vocabulary, which… wtf was I thinking? It ends up being OK because they’re not really intended to be used on their own.) So there’s no reason, given this, to assume that an implementation can use the validation vocabulary if it’s omitted.

Looking over the above text, it’s unquestionable that the spec does not offer clear normative text requiring vocabularies omitted from $vocabulary to not be used. However, the phrase “identify the vocabularies available for use” hints at that by implying that other vocabularies are not available (otherwise why would we need to list the set at all?). I accept that it does not say so explicitly, but I argue that between that phrasing and the explicit statements about a few cases where vocabularies can be used without appearing in $vocabulary, there is at least some ambiguity here, so how do we resolve that?

Do old issue and PR comments support my reading of `$vocabulary`?

Test suite issue #439 “Document the test inclusion guidance/criteria” includes the following guidance:

additional tests MUST NOT attempt to clarify the specification itself independently for behavior that was not considered or proscribed by the specification. In the case of ambiguous text in the specification, the specification team SHOULD be consulted to confirm what behavior was intended. If the relevant scenario was clearly and specifically considered but the wording was unclear, tests MAY be added. Otherwise, the test MUST be deferred (i.e. not added with any expected result) until a specification with explicit decision on its behavior is published.

Part of “consulting the specification team” includes what I personally intended for $vocabulary, but what is more important is whether other members of the team involved in adding the feature understood and agreed with that intent.

I think it can be solidly established that the set of people most involved in adding and reviewing this feature (myself, Greg, Ben, and jgonzalesdr, with others chiming in here and there) all understood $vocabulary to be the complete, rather than minimum, set of usable vocabularies in the context of a given meta-schema, with the ability to turn off the standard validation feature seen as a valuable use case:

json-schema-org/json-schema-spec#513 has an extensive (and exhausting) debate regarding using JSON Schema without the standard validation vocabulary. $vocabulary had not been specified yet, but we had the concept of modular vocabularies as a subject of discussion.
- Evgeny’s counter-argument was that the standard validation vocabulary should always be present. Several of us spent 50+ comments arguing against this, which at least establishes the idea that turning the vocabulary off somehow is seen as valuable. However, this issue avoids asserting that all implementations must be able to turn it off.
- Relequestual, dlax, erayd, Anthropic, and I all presented or agreed with use cases for omitting the standard validation vocabulary. Some used alternate assertion vocabularies, and others avoided assertions entirely. One comment mentions substituting a different kind of type check, although it does not say that that would be done by redefining type (the relevance of this will be clear later).
Issue json-schema-org/json-schema-spec#567 is all about whether you can or have to put the core vocabulary in $vocabulary. Most of the discussion is elsewhere (see the $vocabulary PR below), but we decided against letting implementations just use it without it being present. This decision only makes sense if omitting a vocabulary from $vocabulary means it wouldn’t be used.
In PR 671 comment, Greg observes “It sounds like $vocabulary is kind of an equivalent of the JS imports statement for schema keywords. Unless a keyword is defined within one of the “imported” meta-schemas, a validator will ignore it.” There’s also some discussion of special handling if the core vocabulary is omitted.
Here, in another PR 671 discussion with Greg I discuss what happens if the core vocabulary is omitted, saying “You can’t not use the core vocabulary- you need it to bootstrap processing (assuming you don’t just hardcode it, which you can do because it’s mandatory and I expect most people will). Basically, I expect implementations to ignore whether core is present in $vocabulary but some people will like having it there.” which hopefully establishes that I believed that omitting a vocabulary from $vocabulary would mean actually not using it, which is why the core vocab required special discussion of this.
In yet another PR 671 comment Ben states “Assuming that validation is its own vocabulary, this means that any schema document that is for validation, MUST define it is using the validation vocabulary, right?”
- This is in the context of when $vocabulary is absent, but it reflects the assumption that failing to declare a vocabulary makes it unavailable.
- In a follow-up comment, Greg notes “A vocabulary is a just set of keywords. If I want to create a validation vocabulary with a completely disjoint set of keywords, I shouldn’t be required to also include the standard set. Perhaps my vocabulary isn’t disjoint. Maybe it overlaps and redefines a number of the keywords. In that case, it would be explicitly wrong of me to include the standard set due to conflicts.” This statement makes it clear that an implementation that enabled the standard validation even if it is omitted would create a conflict with extension vocabularies. If implementations were allowed to just enable whatever they want, then the standard vocabularies would become a reserved namespace, and the spec makes clear that that is only true for core.
- Two follow-ups below that, I note that “[vocabulary] assumptions are only allowed when $vocabulary is completely absent”, referring to assumptions that a validator can make in the absence of $vocabulary, in what is now §8.1.2.1 “Default vocabularies”. The section as published establishes what assumptions are allowed under what circumstances, while this comment clarifies that other assumptions were intended to be forbidden.
There were various PR 671 discussions, including this one about the wording that, by the end of the PR, became the “identify the vocabularies available for use” language. This particular thread was started by jgonzalesdr several days after one of the above threads, in which he participated, so folks had the context of omitted vocabularies being turned off, and none of us spotted the problem with the wording despite agreement on that aspect.

Non-schema extension keywords replicating the standard vocabularies

I think it should be pretty clear at this point that despite the lack of normative text, the active contributors on this feature all agreed that $vocabulary defines the complete set of available vocabularies, not the minimum.

So the question now is whether an implementation can define minimum (the keyword used in these test cases) as a non-vocabulary extension, with syntax and semantics identical to that of the validation vocabulary, and always have that keyword enabled regardless of what $vocabulary contains.

@Julian asserts that implementations can have always-on non-vocabulary extension keywords. This sort of extension long predates my involvement with the project, and Julian has a much deeper background with it, so I will defer to him on this point.

The remaining question is whether, in 2019-09 and 2020-12, it is sufficiently acceptable to do that with minimum (or any non-core standard vocabulary keyword). Let’s walk through this.

Any implementation running this test suite supports the validation vocabulary, by definition. It doesn’t matter if it supports it by hardwiring it or by treating it as a plug-in just like all other non-core vocabularies (this is §6.5’s distinction between “directly supporting” or not “directly supporting” a vocabulary).
Omitting the validation vocabulary from $vocabulary means that the implementation cannot use it.
It is not possible to detect the difference between using the vocabulary keyword and using an extension keyword that is identical in appearance and behavior to the vocabulary keyword. Since there is no requirement that vocabularies be implemented as plugins, there is no universal distinction between “this is the implementation of a vocabulary keyword” and “this is the implementation of a non-vocabulary extension.” The only difference is that vocabularies have a mechanism for controlling whether they are available to a schema or not.
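The three points above can be illustrated with a toy registry. Everything here is hypothetical (the `Validator` class, its `always_on` parameter, the flat schema handling); it only shows that one and the same `minimum` function can be wired in as a vocabulary keyword or as an always-on extension, and that the two wirings differ only when the validation vocabulary is omitted.

```python
VALIDATION = "https://json-schema.org/draft/2020-12/vocab/validation"


def minimum(value, instance):
    """The validation vocabulary's minimum: non-numbers always pass."""
    if isinstance(instance, bool) or not isinstance(instance, (int, float)):
        return True
    return instance >= value


class Validator:
    def __init__(self, always_on=None):
        # Extension keywords that ignore $vocabulary entirely.
        self.always_on = dict(always_on or {})

    def keywords(self, vocabularies):
        active = dict(self.always_on)
        if VALIDATION in vocabularies:
            active["minimum"] = minimum  # the vocabulary-gated wiring
        return active

    def is_valid(self, schema, instance, vocabularies):
        # Flat schemas only; keywords that are not active are ignored.
        return all(
            check(schema[name], instance)
            for name, check in self.keywords(vocabularies).items()
            if name in schema
        )


gated = Validator()
always = Validator(always_on={"minimum": minimum})
schema = {"minimum": 10}
```

With the validation vocabulary active, `gated` and `always` both reject `1`. With it omitted, `gated` accepts `1` and `always` still rejects it, which is the only observable difference between the two.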

§6.5 says “Additional schema keywords and schema vocabularies MAY be defined by any entity.” The key word here is “additional.” There are two plausible readings of “additional”:

additional beyond what the specification defines
additional beyond the keywords from the vocabularies that $vocabulary lists

While the core specification does not depend on the validation specification, it references its vocabularies when discussing default vocabularies in §8.1.2.1, and we’ve already established that any implementation running the test suite supports the validation vocabulary anyway.

I do not see a plausible argument that an implementation that supports the standard validation vocabulary can claim that a minimum keyword that effectively duplicates the same keyword from that vocabulary is an “additional” keyword in such a way that it would invalidate these test cases.

If it is truly identical, then it is the vocabulary keyword, and not “additional” at all, as the internal code structure is irrelevant. It’s the vocabulary keyword that’s being turned on despite $vocabulary indicating that it is turned off.
If it is not identical, but is always on, then it could be “additional” but it has redefined the keyword in a non-compliant way and won’t pass the test suite anyway.
If it is not always on, but is instead automatically enabled if and only if the validation vocabulary is omitted from $vocabulary, are we really saying that that is a reasonable test configuration that should pass a required test?

There is no plausible reason that someone would do that other than to confound this test case. I don’t think that invalidates the test case. There are other things in the required test suite that a truly dedicated person could find a way to break and still claim compliance. I don’t think those cases should be moved to optional, and I don’t think these should either.

handrews on Aug 14, 2022

Fair enough, we can leave this here. Thanks for the input folks.

Julian on Aug 17, 2022

if $vocabulary behaves in the way Henry says it does it’s weird to me that non-vocabulary keywords are allowed at all

This is because I (and I assume @gregsdennis and @Relequestual) never contemplated the possibility of someone substituting non-vocabulary keywords for standard keywords in a way that can’t be turned off. In my case, I had no idea that always-on extension keywords were considered valid when it came to assessing conformance in the first place (Greg, Ben, were either of you aware of this?).

Don’t get me wrong- it’s excellent QA thinking! From the perspective of my QA career I respect the thoroughness and reasoning. And I would have loved to have heard it 3 years and 8 months ago when we could have easily fixed the wording to make it beyond a doubt that this was not allowed, whether by exempting the standard vocabularies, or by forbidding always-on extension keywords, or whatever else.

But it’s difficult to prove now, because how do you prove that you intended to forbid something that you didn’t think was possible in the first place? (By which I mean “I did not (and do not) think it is possible to have such a minimum or type always on and considered a viable candidate for test suite conformance”, not “it’s not possible to redefine such keywords outside of vocabularies at all”)

handrews on Aug 15, 2022

@Julian what I don’t understand is why you are so dead-set on insisting that this particular configuration: an implementation that deliberately circumvents the intent of $vocabulary as its default setting through which it expects to pass the test suite is something the test suite should accommodate.

Is the test suite designed to test reasonable configurations or pathological ones? I’m defining a “pathological” implementation as one that exploits non-interoperable behavior to circumvent more clearly-defined behavior. The disabling of the validation vocabulary by omitting it from $vocabulary is clearly-defined behavior (the text is not as clear as it should have been, but as demonstrated, everyone who worked on getting that PR in had the same understanding of what that text means, and it is well within the common-language meaning of that text).

That is a choice that an implementation makes to go beyond the spec, into areas that are not interoperable. §6.5 makes it quite clear that this is outside of what is “normal” JSON Schema behavior: “Save for explicit agreement…”, etc.

The test suite is correctly happy to rely on this elsewhere. The (again, correctly) required test that ensures that $id is not recognized in a location that is not known to be a schema relies on an unimplemented unknown keyword. But by the logic you’re using here, the test suite MUST assume that any unknown keyword might be implemented and permanently enabled, and done so in such a way to make involving one in a test impossible. I hesitate to bring this up because I will be just as vehemently against moving tests/draft2020-12/unknownKeyword.json out of the required suite as I am about these $vocabulary tests, but in terms of reading what the spec allows around additional keywords in the most aggressive possible way, it’s the exact same problem.
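A rough sketch of why that test depends on the keyword staying unknown. `collect_ids` is a hypothetical helper that knows only a handful of subschema-bearing keywords, and `unknown-keyword` is a made-up name rather than the one used in the actual test file.

```python
# Deliberately tiny subset of keywords whose values hold subschemas.
MAP_OF_SCHEMAS = {"$defs", "properties"}
SINGLE_SCHEMA = {"not", "items"}


def collect_ids(schema, extra_schema_keywords=frozenset()):
    """Collect "$id" values, looking only where a schema is known to live."""
    found = []
    if not isinstance(schema, dict):
        return found
    if isinstance(schema.get("$id"), str):
        found.append(schema["$id"])
    for keyword, value in schema.items():
        if keyword in MAP_OF_SCHEMAS and isinstance(value, dict):
            for subschema in value.values():
                found.extend(collect_ids(subschema, extra_schema_keywords))
        elif keyword in SINGLE_SCHEMA or keyword in extra_schema_keywords:
            found.extend(collect_ids(value, extra_schema_keywords))
        # Any other keyword is unknown: its value is not treated as a schema.
    return found


schema = {
    "$id": "https://example.com/root",
    "$defs": {"a": {"$id": "https://example.com/a"}},
    "unknown-keyword": {"$id": "https://example.com/b"},
}
```

By default the `$id` under `unknown-keyword` is not collected. An implementation that permanently registered `unknown-keyword` as a schema-bearing extension (modelled here by passing it in `extra_schema_keywords`) would collect it and so change the outcome of any test built on that assumption.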

There are quite a few configurations or implementation choices that are not explicitly forbidden by the spec but that the test suite does not accommodate. The spec’s wording technically allows ignoring certain uses of $ref because the language around noticing an $id and “automatically” resolving a $ref to it is at most a SHOULD. Are you going to exile all of those tests to “optional”? I would certainly hope not.

Because if we go that far, then (as you and I have discussed), the core spec starts falling apart entirely. The normative language just isn’t clear enough. Tons of things in the core spec lack normative language entirely. There is no normative language around what it means to “identify” a schema with a URI.

None of this has to do with whether only vocabularies are allowed to add keywords. It has to do with how far implementations can go beyond what the spec states into murky, explicitly non-interoperable areas, and still expect to pass the test suite.

You keep saying that you don’t want the test suite to be making subjective decisions, but there is unquestionably a subjective decision going on here, where you are willing to allow the test suite to assume a reasonable configuration and set of implementation choices for some tests in the required suite, but seem dead-set on forbidding such an assumption for $vocabulary. And I cannot understand why $vocabulary is being singled out for this treatment.

handrews on Aug 15, 2022

@Julian thanks - it might be a few days before I have time to sort it out and explain (and make sure I’m actually right first).

handrews on Jul 7, 2022

Strong disagree. I will work up a more thorough response.

Ha. Quoting someone named @handrews from here 😄

[When not part of a vocabulary declared], extra keywords are still allowed and ignored, and the part of the spec he quoted indicates that. $vocabulary says what is being used, but not that only those vocabularies are being used. This was intentional to allow casually adding keywords in informal settings, without having to construct a vocabulary, assign a URI, etc.

But will wait for your more thorough response to clarify 😃

Julian on Jul 6, 2022