problem-specifications: canonical-data.json standardisation discussion (was: Malformed data?)
It appears that `all-your-base.json` is malformed. Where `allergies.json` has the structure of:

```json
{
  "allergic_to": {
    "description": [ ... ],
    "cases": [ { "description": "...", ... }, ... ]
  },
  ...
}
```

`all-your-base.json` has:

```json
{
  "#": [ ... ],
  "cases": [ ... ]
}
```

`cases` should be wrapped in a function name, yes?
It appears that `bin/jsonlint` only checks that the JSON parses, not that it has good structure.
At the very least, I think this should be patched up and the README expanded to actually show the desired structure. Happy to do a PR for that, assuming I understand it already.
I'm sorry you found my comments disheartening. I just think that your notion, "to generate all the tests for all the exercises at once which should be trivially possible", ignores the fact that you're mechanically generating tests for consumption across a bunch of languages with widely different styles and semantics.
That is going to wind up with "least common denominator" tests. All I was suggesting is that mechanically generated tests will be a good rough draft, but that they should be worked on by humans so that they are good pedagogical examples for each language. To skip out on that is to kinda miss the point of Exercism in the first place.
For example, I have found a world of difference in the quality of Rust's tests and in their ability to help teach me the language and assist my understanding. Some of them are night and day in difference, and the worst ones were the ones that took a bare-minimum "least common denominator" approach.
@catb0t:
I don't. I think you can get a good start on it for most languages, but that idea doesn't take into consideration language call-semantic differences (factor/forth vs assembly vs algol-based languages vs keyword arguments (smalltalk, ruby), as an example). Nor is it realistic about the level of finality. I think you can easily generate a rough draft for every exercise for a language, but it still needs to be reviewed, finalized, and styled by a human to be a good example to learn from.
I think combining could make sense, though in that case I'd almost prefer swapping `cases` back to `tests`, since the scope of the file is now larger than "just" tests. I'd obviously defer to @kytrinyx on the combo though, since it will ripple out to other areas of the project.

I personally like the second one better, as there is only ever one test type, right? Then why have any nesting? Secondly, I also like having the description, type, input and expected values on the same level, as I think a case could be made for them to all be top-level properties (they are equally important).
Thanks for the feedback, @stkent.
I agree, but most of the complexity comes from what we already have in `x-common`, and making the specification simpler would remove some features and make some test suites significantly less documented.

I'm glad you said that. Here is a simpler example of a schema-compliant test suite:
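(The snippet originally posted here isn't preserved; the following is only an illustrative sketch of what a simple, schema-compliant suite might look like, assuming the `exercise`/`version`/`cases`/`description`/`property`/`expected` fields discussed later in this thread.)

```json
{
  "exercise": "leap",
  "version": "1.0.0",
  "cases": [
    {
      "description": "year not divisible by 4: common year",
      "property": "leapYear",
      "input": 2015,
      "expected": false
    },
    {
      "description": "year divisible by 400: leap year",
      "property": "leapYear",
      "input": 2000,
      "expected": true
    }
  ]
}
```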
I tried to design the schema to make simple test suites easy to write, while making complex test suites still possible. Of course, there is a significant sacrifice in readability to make the JSON reasonably "parseable" and the schema minimally rational.
I'm afraid this is as simple as it gets without losing the flexibility needed to capture our current tests.
Ok, it seems to me like we've all sort of agreed (in our own ways) that this is a rather difficult problem to solve - so how about we try to make this into a couple of smaller problems and tackle them individually?
From what I see, we have two distinct goals we're trying to achieve here:

1. Consistency in format allows for easier human readability of the files, which means an easier time understanding and maintaining them.
2. It's possible that if things are consistent enough and we come up with a good enough abstraction, we could programmatically generate the beginnings of test files for some types of language tracks.
Both are indeed noble goals with clear value, and I totally think we should strive to achieve them both - just maybe not at the same time?
Since goal number 2 is clearly really hard, how about we try and get something that's at least solving goal number 1, and then once that's done we can try and refine it further to accomplish goal number 2? I think limiting the scope of what we're trying to accomplish (with an eye towards the future, of course) will be really helpful in actually getting something shipped here.
@rbasso I see your points, and I actually think we can get a little more of the benefit that you mention. How about something like this:
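(The concrete example that followed here isn't preserved; below is a hypothetical reconstruction of the idea - context-specific keys living inside the `input` object - with made-up field names and atbash-style values used purely for illustration.)

```json
{
  "description": "encode yes",
  "property": "encode",
  "input": {
    "phrase": "yes"
  },
  "expected": "bvh"
}
```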
For the interest of programmatically generating tests, we know what our inputs are (and we can easily ignore the human-specific context in the keys in that object and just look at the values), but for the purpose of assigning some meaning to this data, we can give some context-specific information by adding those keys to the `input` object. I think with the above structure we still don't need to understand the context to figure out what's going on, but if we want context it's there for us. I actually think this is a much better version than the original one!
I guess if I were to generalize the structure of a `test` object in that JSON, it would be something like the sketch at the end of this comment.

So, I actually kind of like that. What does everyone else think?
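(Illustrative sketch only, with placeholder values; the original generalization isn't preserved.)

```json
{
  "description": "<human-readable description>",
  "property": "<name of the function or property under test>",
  "input": { "<context-specific key>": "<value>" },
  "expected": "<expected result>"
}
```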
@zenspider sounds like you're writing a parser, perhaps you can look through the existing data and tell us what structure we should be using to make parsing convenient. Then we can document it and create issues to update the old data.
I completely agree with what @rbasso is saying. Yes, this is "only" a partial solution to a much harder problem, but I feel the partial solution by itself has more than enough merit to warrant going through with it.
With the current suggested format, test generators are definitely possible, although not without some exercise-specific code. I think this is fine, but as said, we can always look to improve this format later to suit test generation even better.
So all in all, I really like what we have now and I think we should go with that, at least for now.
JSON Schema for `canonical-data.json` files
Changes:

- `cases` is mandatory again.
- `type` renamed to `property`.
- … `metadata.yml` anymore.
- If `expected` is an object and it has an `error` property, it must be the only property and also be of type `string`.

The schema may need some minor adjustments - because I probably made a few mistakes - but I don't see how to improve it more without losing flexibility and/or readability.
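(A minimal sketch, not from the original comment, of the error convention just described, using a hypothetical all-your-base-style case with made-up key names.)

```json
{
  "description": "negative input base is an error",
  "property": "rebase",
  "input": { "inputBase": -2, "digits": [ 1 ], "outputBase": 10 },
  "expected": { "error": "input base must be at least 2" }
}
```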
I think we finally got something good enough here, people!
Edit: I also wrote a test suite as a proof-of-concept in Haskell. It is not beautiful code, but it showcases that we in fact don't need much exercise-specific code to parse and create the tests.
Well, although I do have one that might fit.
Let's consider `change`: Given some `coins` and a `target`, we want to find the way to make that `target` with the fewest coins.

We could certainly have cases where the input is perfectly valid, but there is no way to reach the `target`. So let's say that that's represented as `null` in JSON.

And then we could have cases where the input is obviously invalid, such as a negative target*. So maybe you would say we should call this an error case, with some appropriate representation in JSON (I don't care what, but I used #401 because why not).
Is this an example of what we had in mind? Given just these three cases, is it understandable how to have the tests in a target language:
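(The cases originally shown here aren't preserved; this is an illustrative reconstruction of the three kinds of cases described above, with hypothetical field names.)

```json
[
  {
    "description": "change for 15 cents",
    "property": "findFewestCoins",
    "input": { "coins": [ 1, 5, 10 ], "target": 15 },
    "expected": [ 5, 10 ]
  },
  {
    "description": "no way to reach the target with the given coins",
    "property": "findFewestCoins",
    "input": { "coins": [ 5, 10 ], "target": 3 },
    "expected": null
  },
  {
    "description": "negative target is invalid",
    "property": "findFewestCoins",
    "input": { "coins": [ 1, 5, 10 ], "target": -5 },
    "expected": { "error": "target cannot be negative" }
  }
]
```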
Or, have I missed the mark completely with when to use `null` versus an error? In which case please correct me.

*= Let's leave aside a ternary coin system where you might have "negative coins", representing the party for whom change is being made giving the coins to the party making the change, rather than the other way around… because in this situation (as well as any others with negative coins), you can reach negative targets.

@rbasso It is! I think we have something like three options for the expected result of a test:

1. an ordinary expected value;
2. a valid input for which no result exists;
3. an invalid input that should produce an error.
Obviously, item 1 is trivial: just put the expected value in the JSON data. For 2, we could agree upon a standard value. I think `null` would be most suitable. As for three, we could do what #551 suggests and return a special error object.

For your `type` replacement suggestions, I really like the suggested `property` name, as I think it is the most clear.

@rbasso The updated schema looks great! I was wondering if now is the time to also specify how to handle errors in the schema.
I do appreciate the examples when those are provided alongside the specifications though - that makes understanding the spec a lot easier!
Whatever the outcome, I'd like to note that the current proposal appears at first glance a lot more complex than any individual canonical data set I've used to build an exercise. It's pretty intimidating as I think about tackling some of those new "add canonical data for this exercise" issues.
Finally, after fighting the JSON Schema language for a while, I think I got a proposal that can serve as a starting schema for discussion. I expect it to be:
Here is a sample test file:
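(The attached sample file isn't preserved; the following is only an illustrative sketch, assuming the grouped `cases` layout and the `description`/`type`/`expected` fields discussed elsewhere in this thread.)

```json
{
  "exercise": "bob",
  "version": "1.0.0",
  "comments": [ "Bob only ever gives one of a few fixed responses." ],
  "cases": [
    {
      "description": "questions",
      "cases": [
        {
          "description": "asking a question",
          "type": "response",
          "input": "How are you?",
          "expected": "Sure."
        }
      ]
    },
    {
      "description": "shouting",
      "cases": [
        {
          "description": "shouting at Bob",
          "type": "response",
          "input": "WATCH OUT!",
          "expected": "Whoa, chill out!"
        }
      ]
    }
  ]
}
```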
And here is the JSON Schema, formatted in a very unusual way for easier understanding (at least for me):
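(The schema file itself isn't preserved here; the fragment below is a rough, hypothetical sketch of the kind of recursive group/case definition being discussed, not the schema that was actually proposed.)

```json
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "title": "canonical-data.json (sketch)",
  "type": "object",
  "required": [ "exercise", "version", "cases" ],
  "properties": {
    "exercise": { "type": "string" },
    "version": { "type": "string" },
    "comments": { "type": "array", "items": { "type": "string" } },
    "cases": { "$ref": "#/definitions/caseList" }
  },
  "definitions": {
    "caseList": {
      "type": "array",
      "items": {
        "oneOf": [
          { "$ref": "#/definitions/testCase" },
          { "$ref": "#/definitions/testGroup" }
        ]
      }
    },
    "testGroup": {
      "type": "object",
      "required": [ "description", "cases" ],
      "properties": {
        "description": { "type": "string" },
        "cases": { "$ref": "#/definitions/caseList" }
      }
    },
    "testCase": {
      "type": "object",
      "required": [ "description", "property", "expected" ],
      "properties": {
        "description": { "type": "string" },
        "property": { "type": "string" },
        "input": {},
        "expected": {}
      }
    }
  }
}
```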
I know this is far from perfect, and some people were expecting a more rigid test schema to allow fully automated test suite generation. But I believe this is better than nothing.
Also, it is ready to use and seems to work as expected in my preliminary tests:

`foobar` test run

Does anyone have anything to say about it?

Edit: There is also a ported `bowling/canonical-data.json` here as an example.

And here is my first JSON Schema. If anyone has any experience with it, I would love suggestions on how to improve it.
I hope it is better now!
It seems that this issue has been dead for a while…
Let's try to push the idea proposed by @devonestes a little further!
I have been playing with the JSON files this week and I have some ideas on how we can extract most of the current test structure without sacrificing readability or enforcing too much.
Intro
This will be a really long post, so grab your coffee mug and try not to fall asleep, because I need some feedback here!
Test suite structure
Grouping
Some test suites have tests grouped with labels:
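(The original excerpt isn't preserved; this sketch approximates the kind of labeled grouping meant, loosely based on the acronym exercise's old format.)

```json
{
  "abbreviate": {
    "description": [ "Abbreviate a phrase" ],
    "cases": [
      {
        "description": "basic",
        "input": "Portable Network Graphics",
        "expected": "PNG"
      },
      {
        "description": "lowercase words",
        "input": "Ruby on Rails",
        "expected": "ROR"
      }
    ]
  }
}
```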
from `acronym`

Grouping tests adds readability to both the JSON file and the generated tests, so I believe that we should keep this feature somehow.
Heterogeneous groups
In the example above, the custom name `abbreviate` was used to group and also identify the type of the tests to be performed. This is an easy solution, but it is also a little too restrictive. It would be useful to group distinct types of tests:

Heterogeneous group example
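(Illustrative sketch only; the collapsed example isn't preserved. It assumes a per-case `type` field so that different kinds of tests can live in the same group; the `expand` property is made up for illustration.)

```json
{
  "description": "mixed cases",
  "cases": [
    {
      "description": "abbreviate a simple phrase",
      "type": "abbreviate",
      "input": "Portable Network Graphics",
      "expected": "PNG"
    },
    {
      "description": "expand a known acronym",
      "type": "expand",
      "input": "PNG",
      "expected": "Portable Network Graphics"
    }
  ]
}
```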
We could also have encoded the test types in other ways, but what matters here is that, by moving the test-type specification near the test data, we gained the ability to create heterogeneous test groups!
Nested grouping
Decoupling the grouping logic from the test types, we could even nest test groups with varying depths:
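(Again a hypothetical sketch rather than the original snippet: groups contain either further groups or test cases, at arbitrary depth.)

```json
{
  "description": "abbreviation",
  "cases": [
    {
      "description": "simple phrases",
      "cases": [
        {
          "description": "basic",
          "type": "abbreviate",
          "input": "Portable Network Graphics",
          "expected": "PNG"
        }
      ]
    },
    {
      "description": "punctuated phrases",
      "cases": [
        {
          "description": "hyphenated",
          "type": "abbreviate",
          "input": "Complementary metal-oxide semiconductor",
          "expected": "CMOS"
        }
      ]
    }
  ]
}
```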
That may seem unneeded and a little too complex, but it comes almost for free! Also, it is good to have some flexibility for the more complex test suites we may want to create.
A generator could simply ignore all the test grouping and just recursively scan for the tests - flattening the structure - or it could use the grouping information to construct a completely labeled test tree, if the test framework allows it.
Test case specification
The challenge here is to enforce a minimal structure on all tests, without losing any readability or flexibility.
Previous discussions indicate that there is no consensus about encoding input and output, so we should avoid discussing that now and focus on things that will not start a language war.
To allow easy, semi-automatic generation of tests, I think it would be convenient to have at least the following information about a test:
- `description` - With it, the test generators have a textual description to display in case of success/failure. Also, it allows users and maintainers to refer to a specific test case in a language-independent way. Tests without descriptions would leave users in a situation where they cannot easily identify where they failed, so it makes sense to enforce their presence.
- `type` - At least implicitly, any test case has a type that identifies a property being tested, most of the time the name of a test function. What matters here is that we need a unique identifier for each kind of test in a test suite, so that we don't end up in a situation where it is impossible to automatically identify the type of each test case.

Ambiguous test type example
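(A hypothetical sketch of the ambiguity warned about above: without an explicit type, a generator cannot tell which function each of these cases is meant to exercise.)

```json
[
  {
    "description": "works for a simple phrase",
    "input": "Portable Network Graphics",
    "expected": "PNG"
  },
  {
    "description": "round trip",
    "input": "PNG",
    "expected": "Portable Network Graphics"
  }
]
```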
I see ~~two~~ three options to signal the test type:

Using a unique key for each test type
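(Hypothetical sketch of this option: the unique key itself names the test type.)

```json
{
  "abbreviate": {
    "description": "basic",
    "input": "Portable Network Graphics",
    "expected": "PNG"
  }
}
```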
This is readable and easy enough to parse, but it doesn't expose the fact that all the test cases have a description.
Using a unique key inside a `test` key
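(Hypothetical sketch of this option: the case data sits under a generic `test` key, whose single child key names the test type.)

```json
{
  "description": "basic",
  "test": {
    "abbreviate": {
      "input": "Portable Network Graphics",
      "expected": "PNG"
    }
  }
}
```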
This captures more structure but is not so nice to the eyes.
Edit: Key-value pair option
Adding a key-value pair to identify the test
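(Hypothetical sketch of this option: the type is given as an ordinary key-value pair alongside the other case data.)

```json
{
  "description": "basic",
  "type": "abbreviate",
  "input": "Portable Network Graphics",
  "expected": "PNG"
}
```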
This is a little less readable than the first option, but may be interesting for parsing.
The first option is more pleasant to the eyes and is similar to what we already use, so it makes sense to stick with it unless we find a reason to avoid it.

It would be nice to have some arguments for or against each of these three alternatives.
JSON Schema
I'm still trying to write a schema to allow automatic validation of the `canonical-data.json` files, but I decided that it was already time to discuss the idea publicly, so that we could improve it together.

Edit: Remember about `exercise`, `version` and `comments`.

Proof of concept
Following these ideas, I rewrote `exercises/bob/canonical-data.json` to test the concept in a simple case:

To check how hard it could be to parse the file, I rewrote the test suite to run the tests directly from the JSON file.
This is still experimental code, so don't take it seriously, but note that only 12 lines of code are exercise-specific. All the other lines are exercise-independent!
I avoided any tricks to make this easier in Haskell, so the parsing is verbose and feels a little clumsy. Changing the JSON file would make parsing way easier, but that would favor the Haskell track to the detriment of other languages and human readability.
Final comments
Well, this is all I got for now…
I think that, if we decide to follow this path, in the short term we can expect to:

- Validate the `canonical-data.json` files in Travis-CI.

I deliberately avoided specifying inputs and outputs from the tests for a few reasons:
Does anyone think it is a useful endeavor to standardize just that for now?
The reason I stopped commenting, despite the fact that I'm the one who re-kindled this thread, is that these replies really disheartened me:

Then what is the goal of this discussion about JSON format at all, if you're not interested in programmatically processing the JSON data to generate the unit tests?

Moreover, I don't see why language-specific differences matter here - my point was that, totally disregarding ALGOL syntax and Ruby keyword arguments and Haskell data types, if everything is just a string you can write a generator to write out test files (and example files too), and since there are already exercise-specific test generators, why not save yourselves the work and write a generic one with better-designed data? (Yes, you should still read and comment on the output of the generator for good measure.)
I've been thinking about this a bit recently, and I think the most generalized version of this we can get might be the best for as many different needs as possible. What we're really doing in most of these exercises is basically testing functions. There's input, and there's output. By trying to use keys in our JSON objects that are things like "plaintext" and "key", that's creating a need for knowledge about the exercise to accurately understand how those parts interact.

I think if we can generalize on that concept of a function that we're testing, that might be helpful both for human readability, and also for machine readability so we can possibly use this data for automatic tests.
So, here's my example:
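(The snippet originally posted here isn't preserved; below is a hypothetical sketch of the generalized input/output shape being described, using atbash-cipher-style values purely for illustration.)

```json
{
  "encode": {
    "description": "Encode a phrase",
    "cases": [
      {
        "description": "encode yes",
        "input": "yes",
        "output": "bvh"
      },
      {
        "description": "encode no",
        "input": "no",
        "output": "ml"
      }
    ]
  }
}
```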
I don't think there are any exercises that require anything other than input and output, but I haven't done too deep of an analysis on that. I'd love any feedback if there are edge cases that would need to be taken care of here. I know that based on the structure above I can think of reasonable ways to parse that and automatically create some skeletons for tests in Ruby, Elixir, Go, JavaScript and Python, but that's really all I can reasonably speak to since those are the only languages I have a decent amount of experience with.
Also, I sort of like the stripped-down way of looking at this - when I look at that data I don't need to know the context of the exercise to know what's going on. I just know there's a thing called `encode`, and that takes some input and returns some output, and there's a text description of what's going on.

I'm not really 100% sure that this would give us everything we want, but I wanted to at least throw this idea out there to get feedback and see if it might be a starting point for an actually good idea!
I do not see any sense in specifying the order of arguments in the canonical test data. There are different idioms and necessities in the various tracks.
Let's assume we have some data type and we write functions around it. Let's call it list. In object-oriented languages it will be the object we call a method on, so it will be completely out of the order of arguments. In Elixir we like to have this object-like argument in the first position to be able to pipe it around, while in Haskell it is preferred to have it last, to allow point-free style and partial application.
So, as you can see, the order of arguments has to be specified by the track maintainers anyway.
Ryan Davis notifications@github.com wrote on Wed., 21 Sep 2016, 23:47:
I'd fully support a more generic structure which would make it unnecessary to have a generator for each exercise.
But I have to admit, I have no idea what it could look like. Since you already said you would change them, do you have an idea about the structure already, @catb0t?
Also since it seems to be the right time, I want to request a feature for this generic format:
I had a sleepless night over how I should handle changes in the canonical data, as I wanted to have some versioning test. First I thought I could just use the date of the last change, but this would mean that, because of whitespace changes, all earlier submissions would get "invalidated". Therefore I think it would be a good idea to version the canonical data as well.
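(Not from the original comment; just a minimal sketch of what the requested version field could look like at the top level of a canonical-data.json file.)

```json
{
  "exercise": "bob",
  "version": "1.1.0",
  "comments": [ "Bump the version whenever a case is added, removed, or changed." ],
  "cases": []
}
```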