JSON-Schema-Test-Suite: Draft 2019+ tests incorrectly depend on implementations supporting `$schema`-less schemas but they are not required to process them

For example, the first test in the const file is

{
    "description": "const validation",
    "schema": {"const": 2},
    "tests": [
        {
            "description": "same value is valid",
            "data": 2,
            "valid": true
        },
        {
            "description": "another value is invalid",
            "data": 5,
            "valid": false
        },
        {
            "description": "another type is invalid",
            "data": "a",
            "valid": false
        }
    ]
}

There is no $schema declaration. That means that this is a valid draft-6, -7, and -2019-09 schema.

My implementation assumes latest, unless it can be determined that another one should be used (via keywords used or $schema declaration), but that may not be the case for others.

This means that while we intend for this to be evaluated as draft 2019-09, it may not be, depending on how the implementation reads non-specific schemas.
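The fallback described above can be sketched roughly as follows. This is only an illustration, not any particular implementation: `resolve_dialect`, `KNOWN_DIALECTS`, and `DEFAULT_DIALECT` are hypothetical names, "latest" is assumed to mean 2019-09, and keyword-based detection is left out.

```python
from typing import Any, Optional

# Hypothetical lookup table keyed by the published meta-schema URIs.
KNOWN_DIALECTS = {
    "http://json-schema.org/draft-06/schema#": "draft-06",
    "http://json-schema.org/draft-07/schema#": "draft-07",
    "https://json-schema.org/draft/2019-09/schema": "2019-09",
}

# Assumption: "latest" is 2019-09, as it was when this issue was opened.
DEFAULT_DIALECT = "2019-09"


def resolve_dialect(schema: Any, configured: Optional[str] = None) -> str:
    """Pick processing rules: $schema first, then caller config, then default."""
    if isinstance(schema, dict) and "$schema" in schema:
        uri = schema["$schema"]
        if uri not in KNOWN_DIALECTS:
            raise ValueError(f"unknown $schema: {uri!r}")
        return KNOWN_DIALECTS[uri]
    if configured is not None:
        return configured
    # No declaration and no configuration: this is the implementation-specific part.
    return DEFAULT_DIALECT
```

With this sketch, `{"const": 2}` resolves to whatever the implementation's default happens to be, which is exactly the part another implementation may decide differently.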

About this issue

State: closed
Created 5 years ago
Comments: 50 (50 by maintainers)

Most upvoted comments

The original question for this issue was, “Do we need $schema in the test suite cases?” not “Should $schema be required?”

After (lengthy) discussions in Slack and some time to think, I’ve concluded that no, the test cases don’t need $schema*.

Warning: reasoning ahead

Consider the schema {"type": "string"} included in the “draft 7” folder. My original point was that there is nothing stopping an implementation from running this as a 2019-09 schema.

But that doesn’t matter as long as the result is correct. My question assumes that implementations will have “Draft X Compatibility” modes, but that’s not the case, and it’s not required.

Basically, if my validator is given that schema and ["array"] as an instance, and it returns invalid, then it’s compliant for draft 7 for this scenario. It’s also compliant (for this scenario) with drafts 3 through 2020-12.

If it’s compliant for all test cases within a single draft folder, then it’s said to be compliant with that draft.
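To make the argument concrete, here is a minimal sketch of how a harness decides compliance purely from expected results. The `toy_validate` and `failed_tests` names and the test case itself are hypothetical (written in the suite's format); the toy validator only understands `"type": "string"`.

```python
from typing import Any, Callable, Dict, List


def toy_validate(schema: Dict[str, Any], instance: Any) -> bool:
    """Toy validator (assumption): only understands "type": "string"."""
    if schema.get("type") == "string":
        return isinstance(instance, str)
    return True


def failed_tests(case: Dict[str, Any],
                 validate: Callable[[Dict[str, Any], Any], bool]) -> List[str]:
    """Return descriptions of tests whose result differs from the expected one."""
    return [
        test["description"]
        for test in case["tests"]
        if validate(case["schema"], test["data"]) != test["valid"]
    ]


# Hypothetical case in the suite's format; note there is no $schema.
case = {
    "description": "string type",
    "schema": {"type": "string"},
    "tests": [
        {"description": "a string is valid", "data": "foo", "valid": True},
        {"description": "an array is invalid", "data": ["array"], "valid": False},
    ],
}

failures = failed_tests(case, toy_validate)
```

An empty `failures` list means the implementation is compliant for this scenario, whichever draft folder the case lives in; the harness never looks at `$schema` to reach that verdict.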

None of this is dependent upon $schema being present.

If, in some future version, we change the semantics of type so that the above scenario is valid, then my implementation would not be compliant with that version. But again $schema doesn’t need to be present in order for this to be determined. The fact that the test case expects a “valid” result and doesn’t get one determines that the implementation is non-compliant for this future version.

The test suite isn’t asking, “Can the implementation process a schema with meta-schema X?” (Maybe that’s a question it should ask, but if so it should do it as distinct test cases.) It’s asking, “Can the implementation give me the desired result as indicated by the specification?” That I have to configure my implementation to properly handle some test cases is outside the scope of the test suite. Secondarily, it’s my responsibility to document that such a configuration needs to be performed before processing some cases.

* Caveat

Because the 2019-09 and 2020-12 specs allow implementations to refuse to process schemas without the $schema keyword, it’s conceivable that, in the test suite’s current state, an implementation could refuse to process the entirety of these draft folders and still claim compliance with these drafts.

This needs to be fixed by adding $schema to the schemas in these folders. The other drafts don’t need it for the reasoning above.
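A rough sketch of what that fix amounts to for a single test case is below. The helper name `with_schema_keyword` is hypothetical, and this is not the change that was actually submitted; boolean schemas are left untouched because they have nowhere to carry a keyword.

```python
import copy
from typing import Any, Dict

DRAFT_2019_09 = "https://json-schema.org/draft/2019-09/schema"


def with_schema_keyword(case: Dict[str, Any], uri: str) -> Dict[str, Any]:
    """Return a copy of a test case whose root schema declares $schema."""
    updated = copy.deepcopy(case)
    schema = updated["schema"]
    # Only object schemas can hold keywords; keep an existing declaration as-is.
    if isinstance(schema, dict) and "$schema" not in schema:
        updated["schema"] = {"$schema": uri, **schema}
    return updated


const_case = {
    "description": "const validation",
    "schema": {"const": 2},
    "tests": [{"description": "same value is valid", "data": 2, "valid": True}],
}

fixed_case = with_schema_keyword(const_case, DRAFT_2019_09)
```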

We will continue discussion on “expected behavior when $schema is missing for future versions” elsewhere.

gregsdennis on Jun 23, 2022

Please stop discussing. The change is accepted, it’s the correct one, we can summarize the back and forth above in an ADR if helpful for documentation (I can do that). Henry, send a PR whenever you can.

Julian on Jun 23, 2022

I’m happy to download the change and run it on mine. Adding $schema will override the configuration I currently have to set, so it should give the desired effect.

gregsdennis on Jun 21, 2022

@Julian, the role of $schema has changed over time. Originally, it didn’t exist at all. In drafts 3 and 4, it was presented as an inessential nice-to-have, which made your approach reasonable. Drafts 6 and 7 didn’t really change the level of requirement around $schema, so that continued to be fine.

However, as of 2019-09, we made reliable extensibility a major goal and developed $vocabulary, which works via $schema, to accomplish that goal. As part of this, the absence of $schema in the document root was explicitly stated to result in undefined behavior.

Basic QA principles say that you can’t write test cases that combine implementation-defined behavior with positive functionality tests (for those who don’t know, my career bounced back and forth between systems QA and Development before settling into Technical Director roles spanning both departments). Doing so means that every test outcome is impacted by implementation-specific choices rather than standards conformance.

So on those grounds, if someone came to me with a test plan where nearly every test case involved an implementation-defined condition, I would reject the plan.

The larger topic here is whether the JSON Schema org supports the goal of reliable extensibility. If we do, then we really need to stop treating $schema as an afterthought. Implementation behavior around $schema is extremely variable, including implementations that ignore it entirely.

The reason for this is that the test suite makes it very clear that $schema doesn’t matter. There are no test cases for it at all. Once you get to 2019-09, there’s one $vocabulary test (that doesn’t really do much AFAICT), and in 2020-12 it’s used to enable format assertion behavior, but that’s optional anyway.

So, as currently designed, the test suite is undercutting an (apparent?) major project goal. It’s very frustrating to see how little attention many implementations pay to $schema, and it’s not a mystery as to why that has happened.

If the JSON Schema org does not have a consensus that this is a major project goal, then we should open a discussion on that in the discussions repo, because there’s no point in debating this detail without clear context on why we care about $schema. The fact that the media type registration work hinges on $schema for dialect identification suggests that the keyword is important.

If we do have consensus on this, then the topic of how to best test $schema and $vocabulary is another fairly large discussion. We would need cases where $schema is absent, but testing implementation-defined conditions is a bit of a tricky thing (the spec could be better regarding testable requirements here, I’m a little disappointed in myself TBH; for example, I should have put some testable guardrails around “implementation-defined” like “MUST NOT crash” or “MAY refuse to process the schema” or “MUST try to process the schema” etc).

But the minimum first step would be to get the implementation-defined condition out of all of the other test cases. The test suite should rely on well-specified functionality ($schema+$vocabulary determines the processing rules for 2019-09 and later) rather than on behavior that is outside of the specification entirely (relying on an external selection of processing rules).

I would prefer to extend this to all drafts, as it would significantly help normalize $schema support and encourage implementations to take it seriously. However, I would be reluctantly OK with leaving draft-07 and earlier as-is because the spec was written differently for those drafts. The behavior without $schema is not clearly specified, but it’s strongly implied to be a common occurrence and therefore handled reasonably. However, that’s intentionally no longer the case in 2019-09 and later.

handrews on Jun 18, 2022

@Julian given that we’re strongly encouraging folks to start using $schema and the $vocabulary keyword in the referenced meta-schema to control how to process schemas, we should really include this.

The idea that $schema is a needless formality has significantly degraded the usefulness of the keyword to the point that many implementations outright ignore it and just default to doing whatever they feel like regardless of what the schema declares. That’s a problem.

handrews on Nov 27, 2019

@gregsdennis I was only planning to do 2019-09 and 2020-12 in the PR, so I think we’re good here. I had noted earlier that there is no language in draft-07 or earlier that says anything at all about what happens without $schema, and therefore I did not think it was strictly necessary to add it. While I would do that if it were just me, I’m fine with leaving it as-is for those drafts.

handrews on Jun 23, 2022

I’ll post a PR in the next day or two. Got a bit distracted by some other stuff.

handrews on Jun 21, 2022

@karenetheridge

I don’t see any practical effect from adding $schema everywhere, unless you also intend to collapse all the tests together into a single directory where it is impossible to tell which version of the spec it’s written for without examining the $schema keyword.

I’m not sure what you mean by “practical effect.” The important thing here is testing the specification. How the tests are organized and configured is a secondary concern. It doesn’t matter whether it’s possible to figure the processing semantics out from outside of the specification (e.g. the directory structure) or not. If it is both possible to figure out the right processing semantics from within the spec, and a requirement that implementations do so correctly, then that is what we should test.

Perhaps the most important argument is that it’s possible to write a conforming implementation that only uses $schema to determine the processing rules. I don’t know of any that effectively require it (refusing to process without it) but it would be valid to do that. Regardless, a validator should only need to be explicitly configured with a draft if it can’t figure things out from $schema.

Our test suite should properly test validators that actually rely on $schema without forcing them to pre-configure the draft beforehand. Pre-configuring is a convenience workaround outside of the specification’s requirements, and it’s the specification we should test.

There’s also, in my view, no real reason not to do it. Adding $schema does not interfere. It is at most a mild inconvenience while writing tests, as one usually isn’t writing huge numbers of schema objects at once.

I just added it to all of 2020-12’s test cases, by hand, in vi, no scripting, no IDE, no nothing. It took a few slightly tedious minutes, but I needed to stop staring at another problem so that was convenient. I’m happy to do the others. I understand @Julian’s concerns over mass changes, and am also happy to do whatever verification work is needed to make everyone comfortable with it (I’m writing a little script that goes through and checks for missing or mismatched $schema URIs and will add the output to any PRs).
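For illustration only, a check of that kind might look like the sketch below. It is not the script mentioned above; `schema_problems` is a hypothetical name, it works on already-loaded test cases, and it only inspects the root of each schema.

```python
from typing import Any, Dict, List

DRAFT_2020_12 = "https://json-schema.org/draft/2020-12/schema"


def schema_problems(cases: List[Dict[str, Any]], expected_uri: str) -> List[str]:
    """Report test cases whose root schema lacks $schema or declares another URI."""
    problems = []
    for case in cases:
        schema = case["schema"]
        if not isinstance(schema, dict):
            continue  # boolean schemas cannot declare $schema
        declared = schema.get("$schema")
        if declared is None:
            problems.append(f"{case['description']}: missing $schema")
        elif declared != expected_uri:
            problems.append(
                f"{case['description']}: expected {expected_uri}, found {declared}"
            )
    return problems


# Hypothetical sample data in the suite's format.
cases = [
    {"description": "ok", "schema": {"$schema": DRAFT_2020_12, "const": 2}, "tests": []},
    {"description": "no declaration", "schema": {"const": 2}, "tests": []},
    {
        "description": "wrong draft",
        "schema": {"$schema": "https://json-schema.org/draft/2019-09/schema"},
        "tests": [],
    },
]

report = schema_problems(cases, DRAFT_2020_12)
```

In practice the cases would come from running `json.load` over each file in a draft folder, with the expected URI chosen per folder.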

Finally, there is clearly a larger discussion to be had regarding how we expect implementations to process schemas. In my experience, people use $schema when they are sending schemas out to be consumed by others, because you never want to rely on people reading documentation to get the settings right. Keeping the configuration inside the system is far more reliable.

I believe that we should orient the test suite towards real-world usage rather than convenience for test authors. Particularly when the inconvenience is not that substantial.

handrews on Jun 20, 2022

Even if this was the intention, there’s no such thing as “running against a draft” (any more than I can parse JSON “against RFC 4627”)

I disagree, and it seems there’s context here where some of this discussion has occurred, but let’s focus on the things which are relevant to this ticket, which concerns simply whether we add $schema to all tests, as it seems there’s other discussions that Henry has linked for this part of the discussion.

It does seem that now both you and @karenetheridge are indeed disagreeing with the intended next steps here. Can one or both of you explain how you read the specification then? The section @handrews linked seems clear, so it seems the burden should be on you to explain it some other way.

Specifically, I hypothetically have written an implementation which can accept schemas written either for draft 7 semantics or draft2020 semantics. This hypothetical implementation will return “true”, i.e. valid, for all instances, when presented with a schema which does not have $schema declared in it. Can you point to where the specification says I may not do so?

Henry has pointed to:

If absent from the document root schema, the resulting behavior is implementation-defined.

which to me seems to say very clearly that I may indeed decide to do so. And, if I may do so, I will have difficulty running the test suite under my implementation, because around half of the tests in the draft2020 folder will fail when run against my draft2020 validator!

So in short to me, the spec seems clear, and effectively means we MUST add $schema to all schemas, otherwise an implementation such as the above would not be able to use it. Yes, we then may add optional tests which say one can have an implementation which indeed operates identically without $schema, but the question doesn’t seem to be about clarity, the spec literally seems to allow compliant implementations which cannot today run the suite.
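The hypothetical implementation above can be sketched in a few lines. Everything here is an assumption made for illustration: `permissive_validate` treats every instance as valid when `$schema` is absent and otherwise only understands `const`. The tests are the const tests quoted at the top of this issue.

```python
from typing import Any, Dict, List

DRAFT_2020_12 = "https://json-schema.org/draft/2020-12/schema"


def const_matches(schema: Dict[str, Any], instance: Any) -> bool:
    """Toy `const` check; keeps true/false distinct from 1/0."""
    if "const" not in schema:
        return True
    expected = schema["const"]
    # bool is a subclass of int in Python, so compare "boolean-ness" first.
    if isinstance(expected, bool) != isinstance(instance, bool):
        return False
    return instance == expected


def permissive_validate(schema: Dict[str, Any], instance: Any) -> bool:
    """Implementation-defined choice: without $schema, accept everything."""
    if "$schema" not in schema:
        return True
    return const_matches(schema, instance)


def failed_tests(schema: Dict[str, Any], tests: List[Dict[str, Any]]) -> List[str]:
    """Descriptions of tests where the validator disagrees with the suite."""
    return [
        test["description"]
        for test in tests
        if permissive_validate(schema, test["data"]) != test["valid"]
    ]


tests = [
    {"description": "same value is valid", "data": 2, "valid": True},
    {"description": "another value is invalid", "data": 5, "valid": False},
    {"description": "another type is invalid", "data": "a", "valid": False},
]

without_schema = failed_tests({"const": 2}, tests)
with_schema = failed_tests({"$schema": DRAFT_2020_12, "const": 2}, tests)
```

Here `without_schema` lists both "invalid" tests as failures while `with_schema` is empty: the same validator goes from failing to passing purely because the schema now declares its dialect.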

What have I missed?

(n.b. we definitely should not collapse the suite into one giant file or directory, so that’s not one of the options anyone’s proposing I believe.)

Julian on Jun 20, 2022

Erroring without a specified metaschema/dialect is entirely reasonable. My implementation switched to it; you have to either have $schema in your schema, or separately specify the metaschema to instantiate the schema as. {"type": "string"} without any other information is an error. @jdesrosiers says the same about their implementation in https://github.com/json-schema-org/community/issues/189 - I am not up to date on that issue (it is on my to-catch-up-on list) but I do not think that calling an error result ‘obviously unreasonable’ has any consensus there.

My implementation passes the test suite, falling back to specifying the metaschema according to the directory name (as I suppose most implementations do here). Using the mechanism of $schema would be better, in my opinion.

notEthan on Jun 24, 2022

may perfectly well have decided that their “implementation-defined behavior” is “blow up and error”.

This effect is obviously unreasonable (I don’t think this was intended), and the topic of https://github.com/json-schema-org/community/issues/189

awwright on Jun 24, 2022

For my own complete clarity, and thanks again for bearing with me until now, I want to also cite something from json-schema-org/json-schema-spec#439 (which I intend to get back to and merge in the next few days), which covers why that section means an immediate +1, which is:

additional tests MUST NOT ever fail for an implementation that is correct under the specification

whereas it’s now clear that a correct implementation, under

If absent from the document root schema, the resulting behavior is implementation-defined.

may perfectly well have decided that their “implementation-defined behavior” is “blow up and error” (EDIT: see the hidden comments below, there may be disagreement about this precise “implementation-defined behavior”, but the point stands for some other behavior).

So yes, anyone is welcome to do this (on all drafts, or ones with that language, I have no strong opinion since there’s going to be “boilerplate” anyhow in most places now). If no one gets to it sometime soon, I’ll get to it myself, probably after json-schema-org/json-schema-spec#439 and a few other issues are closed first.

Julian on Jun 24, 2022

My reading skills (of the spec) apparently leave something to be desired, since I figured you were referring to that paragraph, but I have up until this moment probably never internalized the last half of that paragraph, i.e.:

If absent from the document root schema, the resulting behavior is implementation-defined.

and was always focused on “yeah OK SHOULD, but doesn’t need to”.

That’s quite clear, so I’m now very much +1 on the change, and indeed as you say now even when we do now have tests without $schema, they need to go in optional even.

Oh, I did not mean to imply that you did! I apologize, I should have worded that better. I’m trying to work on my tendency to sound accusatory on this sort of thing.

Don’t worry about it 😃 I’ll very much assume good intent (at least until I look silly doing so 😄).

Julian on Jun 18, 2022

I’m responding quickly not to say I’ve already understood what you wrote (which yeah thanks! I’ll read it carefully) but just because:

as of 2019-09 […] the absence of $schema in the document root was explicitly stated to result in undefined behavior.

If this is true, you already convinced me. Can you or someone point me at where this is?

The reason for this is that the test suite makes it very clear that $schema doesn’t matter. There are no test cases for it at all.

I’ll just state on the record that I do not intend the test suite to say this, and if we don’t have it, it’s purely an “accident” of lack of effort until now on my part, something I clearly want to fix.

Julian on Jun 18, 2022

@Julian I apologize for pushing on the bulk change topic. Please feel free to edit out any comments, in whole or in part, that you feel detract from the point of this issue. 🙏

handrews on Jun 24, 2022