problem-specifications: canonical-data.json standardisation discussion (was: Malformed data?)

It appears that all-your-base.json is malformed. Where allergies.json has the structure of:

{
    "allergic_to": {
        "description": [ ... ],
        "cases": [ { "description": "...", ... }, ... ]
    },
    ...
}

all-your-base.json has:

{
  "#": [ ... ],
  "cases": [ ... ]
}

cases should be wrapped in a function name, yes?

It appears that bin/jsonlint only checks that the JSON parses, not that it has good structure.
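Purely as an illustration, the kind of structural check jsonlint currently skips could look something like this. This is a hypothetical sketch in Haskell using aeson (assuming a version where Object is a HashMap with Text keys); wellFormed is a made-up helper, not part of the real bin/jsonlint:

{-# LANGUAGE OverloadedStrings #-}

-- Hypothetical structural check, not the real bin/jsonlint: every
-- top-level entry (ignoring "#" comment keys) must be an object
-- wrapping a "cases" array, as in allergies.json.
import           Data.Aeson          (Value (..), eitherDecodeStrict')
import qualified Data.ByteString     as BS
import qualified Data.HashMap.Strict as HM

wellFormed :: Value -> Bool
wellFormed (Object o) =
    and [ hasCases v | (k, v) <- HM.toList o, k /= "#" ]
  where
    hasCases (Object g) = HM.member "cases" g
    hasCases _          = False
wellFormed _ = False

main :: IO ()
main = do
    raw <- BS.readFile "all-your-base.json"
    case eitherDecodeStrict' raw of
        Left err -> putStrLn ("invalid JSON: " ++ err)
        Right v  -> putStrLn (if wellFormed v then "ok" else "bad structure")

Run against all-your-base.json, this would report "bad structure", since its top-level cases key maps to a bare array rather than a wrapped group.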

At the very least, I think this should be patched up and the README expanded to actually show the desired structure. Happy to do a PR for that, assuming I understand it already. šŸ˜€

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 70 (64 by maintainers)

Most upvoted comments

I’m sorry you found my comments disheartening. I just think that your notion: ā€œto generate all the tests for all the exercises at once which should be trivially possibleā€ ignores the fact that you’re mechanically generating tests for consumption across a bunch of languages with widely different styles and semantics.

That is going to wind up with ā€œleast common denominatorā€ tests. All I was suggesting is that mechanically generated tests will be a good rough draft, but that they should be worked on by humans so that they are good pedagogical examples for each language. To skip out on that is to kinda miss the point of exercism in the first place.

For example, in Rust’s tests I have found a world of difference in quality and in their ability to help teach me the language and aid my understanding. Some of them are night-and-day different, and the worst ones were the ones that took a bare-minimum ā€œleast common denominatorā€ approach.

@catb0t:

My goal with exercism.autogen-exercises is to generate all the tests for all the exercises at once which should be trivially possible [emphasis mine]. I don’t want a different ${exercisename}-testgen.factor for each different JSON structure.

I don’t. I think you can get a good start on it for most languages, but that idea doesn’t take into consideration differences in language calling semantics (Factor/Forth vs. assembly vs. ALGOL-style languages vs. keyword arguments in Smalltalk or Ruby, for example). Nor is it realistic about the level of finality. I think you can easily generate a rough draft for every exercise for a language, but it still needs to be reviewed, finalized, and styled by a human to be a good example to learn from.

I think combining could make sense, though in that case I’d almost prefer swapping cases back to tests since the scope of the file is now larger than ā€œjustā€ tests. I’d obviously defer to @kytrinyx on the combo though, since it will ripple out to other areas of the project.

I personally like the second one better, as there is only ever one test type, right? Then why have any nesting? Secondly, I also like having the description, type, input and expected values on the same level, as I think a case could be made for them to all be top-level properties (they are equally important).

Thanks for the feedback, @stkent.

… I’d like to note that the current proposal appears at first glance a lot more complex than any individual canonical data set I’ve used to build an exercise.

I agree, but most of the complexity comes from what we already have in x-common, and making the specification simpler would remove some features and make some test suites significantly less documented.

I do appreciate the examples when those are provided alongside the specifications though - that makes understanding the spec a lot easier!

I’m glad you said that. Here is a simpler example of a schema-compliant test suite:

{
  "exercise":"foobar",
  "version":"0.0.0",
  "tests":[
    {
      "description":"How is the codebase?",
      "bar":{
        "input"   : "fu",
        "expected": "fubar"
      }
    },
    {
      "description": "A martial art",
      "foo":{
        "input"   : "Kung-",
        "expected": "Kung-foo"
      }
    },
    {
      "description": "Where do you live?",
      "bar":{
        "input"   : "",
        "expected": "bar"
      }
    },
    {
      "description": "Undescriptive variable name",
      "foo":{
        "input"   : "",
        "expected": "foo"
      }
    }
  ]
}

I tried to design the schema to make simple test suites easy to write, while making complex test suites still possible. Of course, there is a significant sacrifice in readability to make the JSON reasonably ā€œparseableā€ and the schema minimally rational.

I’m afraid this is as simple as it gets without losing the flexibility needed to capture our current tests. šŸ˜”

Ok, it seems to me like we’ve all sort of agreed (in our own ways) that this is a rather difficult problem to solve - so how about we try to make this into a couple smaller problems and tackle them individually? šŸ˜‰

From what I see, we have two distinct goals we’re trying to achieve here:

  1. Consistency in format allows for easier human readability of the files, which means an easier time understanding and maintaining them.

  2. It’s possible that if things are consistent enough and we come up with a good enough abstraction, we could programmatically generate the beginnings of test files for some types of language tracks.

Both are indeed noble goals with clear value, and I totally think we should strive to achieve them both - just maybe not at the same time?

Since goal number 2 is clearly really hard, how about we try and get something that’s at least solving goal number 1, and then once that’s done we can try and refine it further to accomplish goal number 2? I think limiting the scope of what we’re trying to accomplish (with an eye towards the future, of course) will be really helpful in actually getting something shipped here.

Excelsior

@rbasso I see your points, and I actually think we can get a little more of the benefit that you mention. How about something like this:

{
  "exercise": "cipher",
  "version": "0.1.0 or an object with more detailed information",
  "comments": [
    "Anything you can think of",
    "as a list of strings"
  ],
  "tests": [
    {
      "description": "encodes simple text",
      "function": "encode",
      "input": {
        "plaintext": "Secret message",
        "key": "asdf1234"
      },
      "output": "qwertygh"
    }
  ]
}

In the interest of programmatically generating tests, we know what our inputs are (and we can easily ignore the human-specific context in the keys of that object and just look at the values), but for the purpose of assigning some meaning to this data, we can give some context-specific information by adding those keys to the input object.

I think with the above structure we still don’t need to understand the context to figure out what’s going on, but if we want context it’s there for us. I actually think this is a much better version than the original one!

I guess if I were to generalize the structure of a test object in that JSON, it would be this:

{
  "description": "description of what is being tested in this test",
  "function": "name of function (or method) being tested",
  "input": {
    "description of input": "actual input (can be string, int, bool, hash/map, array/list, whatevs)"
  },
  "output": "output of function being tested with above inputs"
}

So, I actually kind of like that. What does everyone else think?
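As a sketch of why that generalization helps machine readability (the TestCase record and field handling here are my own assumptions, not an agreed spec), the shape parses generically with aeson; keeping input as an opaque object lets exercise-agnostic code ignore the human-oriented key names and use only the values:

{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson       ((.:), withObject)
import Data.Aeson.Types (Object, Parser, Value)

-- One test case in the generalized shape above. The record is
-- hypothetical; "input" stays an unparsed Object so generic code can
-- treat the named inputs as opaque values.
data TestCase = TestCase
    { description :: String
    , function    :: String
    , input       :: Object
    , output      :: Value
    }

parseTestCase :: Value -> Parser TestCase
parseTestCase = withObject "test" $ \o ->
    TestCase <$> o .: "description"
             <*> o .: "function"
             <*> o .: "input"
             <*> o .: "output"

A generator for any particular track would then only need to decide how to realize function, input, and output in its own syntax.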

@zenspider sounds like you’re writing a parser, perhaps you can look through the existing data and tell us what structure we should be using to make parsing convenient. Then we can document it and create issues to update the old data.

I completely agree with what @rbasso is saying. Yes, this is ā€œonlyā€ a partial solution to a much harder problem, but I feel the partial solution by itself has more than enough merit to warrant going through with it.

With the currently suggested format, test generators are definitely possible, although not without some exercise-specific code. I think this is fine, but as said, we can always look to improve this format later to suit test generation even better.

So all in all, I really like what we have now and I think we should go with that, at least for now.

JSON Schema for ā€˜canonical-data.json’ files

Changes:

  • cases is mandatory again.
  • type renamed to property.
  • No properties taken from metadata.yml anymore.
  • Add restriction: If there is an expected object and it has an error property, it must be the only property and also be of type string

The schema may need some minor adjustments - because I probably made a few mistakes - but I don’t see how to improve it more without losing flexibility and/or readability.

I think we finally got something good enough here, people! šŸ‘

Edit: I also wrote a test suite as a proof-of-concept in Haskell. It is not beautiful code, but it showcases that we in fact don’t need much exercise-specific code to parse and create the tests.

If the answer is ā€œyesā€, you don’t necessarily have to give an example of a property,

Well, although I do have one that might fit.

Let’s consider change: Given some coins and a target, we want to find the way to make that target with the fewest coins.

We could certainly have cases where the input is perfectly valid, but there is no way to reach the target. So let’s say that that’s represented as null in JSON.

And then we could have cases where the input is obviously invalid, such as a negative target*. So maybe you would say we should call this an error case, with some appropriate representation in JSON (I don’t care what, but I used #401 because why not).

Is this an example of what we had in mind? Given just these three cases, is it clear how to write the tests in a target language:

{ "exercise" : "change"
, "version"  : "0.0.0"
, "comments":
    [ "showing all three possible types of values for `expected`"
    ]
, "cases":
    [ { "description": "Make change"
      , "comments":
          [ "All in one group in this example, but can be split apart later"
          ]
      , "cases":
          [ { "description": "Can make change"
            , "type"       : "change"
            , "coins"      : [1]
            , "target"     : 3
            , "expected"   : [1, 1, 1]
            }
          , { "description": "Can't make change"
            , "type"       : "change"
            , "coins"      : [2]
            , "target"     : 3
            , "expected"   : null
            }
          , { "description": "Negative targets are invalid"
            , "type"       : "change"
            , "coins"      : [1]
            , "target"     : -1
            , "expected"   : {"error": "negative target is invalid"}
            }
          ]
      }
    ]
}

Or, have I missed the mark completely with when to use null versus an error? In which case please correct me.

* = Let’s leave aside a ternary coin system where you might have ā€œnegative coinsā€, representing the party for whom change is being made giving the coins to the party making the change, rather than the other way around… because in this situation (as well as any others with negative coins), you can reach negative targets.

@rbasso It is! I think we have something like three options for the expected result of a test:

  1. A concrete value.
  2. A missing value (null/optional).
  3. An error.

Obviously, item 1 is trivial: just put the expected value in the JSON data. For 2, we could agree upon a standard value; I think null would be most suitable. As for 3, we could do what #551 suggests and return a special error object.
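To make those three cases concrete, here is a hypothetical Haskell sketch (the Expectation type and classify are made up for illustration, not part of the schema) of how a generator could branch on the expected value:

{-# LANGUAGE OverloadedStrings #-}

import           Data.Aeson          (Value (..))
import qualified Data.HashMap.Strict as HM
import qualified Data.Text           as T

-- Hypothetical classification of the three kinds of "expected" values.
data Expectation
    = Plain Value     -- 1. a concrete expected value
    | NoResult        -- 2. null: valid input, but no result
    | Failure T.Text  -- 3. {"error": "message"}, per the #551 suggestion

classify :: Value -> Expectation
classify Null = NoResult
classify (Object o)
    | HM.size o == 1
    , Just (String msg) <- HM.lookup "error" o
    = Failure msg
classify v = Plain v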

For your type replacement suggestions, I really like the suggested property name, as I think it is most clear. šŸ‘

@rbasso The updated schema looks great! I was wondering if now is the time to also specify how to handle errors in the schema.

I do appreciate the examples when those are provided alongside the specifications though - that makes understanding the spec a lot easier!

Whatever the outcome, I’d like to note that the current proposal appears at first glance a lot more complex than any individual canonical data set I’ve used to build an exercise. It’s pretty intimidating as I think about tackling some of those new ā€œadd canonical data for this exerciseā€ issues.

Finally, after fighting the JSON Schema language for a while, I think I got a proposal that can serve as a starting schema for discussion. I expect it to be:

  • Very human-readable.
  • Easy enough to parse.
  • Flexible enough to capture any reasonable test structure.
  • Similar to what we already have, so migration should be easy.

Here is a sample test file:

{
  "exercise":"foobar",
  "version":"0.1.0",
  "comments":[
    "We are",
    "comments!"
  ],
  "group":[
    {
      "foo":{
        "description":"foo the void",
        "input":"",
        "expected":"foo"
      }
    },
    {
      "bar":{
        "description":"bar the void",
        "input":"",
        "expected":"bar"
      }
    },
    {
      "description":"snafu",
      "group":[
        {
          "foobar":{
            "description":"foo and bar",
            "input":"...wait for it...",
            "expected":"foo...wait for it...bar"
          }
        }
      ]
    }
  ]
}

And here is the JSON Schema, formatted in a very unusual way for easier understanding (at least for me):

{
   "$schema": "http://json-schema.org/draft-04/schema#",
   "$ref"   : "#/definitions/canonicalData",

   "definitions":{

      "canonicalData":
          { "type"      : "object"
          , "required"  : ["exercise" , "version" , "group"]
          , "properties":
                { "exercise": { "$ref": "#/definitions/exercise" }
                , "version" : { "$ref": "#/definitions/version"  }
                , "comments": { "$ref": "#/definitions/comments" }
                , "group"   : { "$ref": "#/definitions/group"    }
                }
          , "additionalProperties": false
          },

      "exercise": { "type": "string" },

      "version" : { "type": "string" },

      "comments":
          { "type"    : "array"
          , "items"   : { "type": "string" }
          , "minItems": 1
          },

      "group":
          { "type"    : "array"
          , "items"   : { "$ref": "#/definitions/testItem" }
          , "minItems": 1
          },

      "testItem":
          { "oneOf":
                [ { "$ref": "#/definitions/singleTest"   }
                , { "$ref": "#/definitions/labeledGroup" }
                ]
          },

      "singleTest":
          { "type"                 : "object"
          , "minProperties"        : 1
          , "maxProperties"        : 1
          , "additionalProperties" : { "$ref": "#/definitions/testData" }
          },

      "testData":
          { "type"      : "object"
          , "required"  : ["description"]
          , "properties":
                { "description": { "$ref": "#/definitions/description" }
                }
          },

      "description": { "type":"string" },

      "labeledGroup":
          { "type"      : "object"
          , "required"  : ["description", "group"]
          , "properties":
                { "description": { "$ref": "#/definitions/description" }
                , "group"      : { "$ref": "#/definitions/group"       }
                }
          , "additionalProperties": false
          }
   }
}

I know this is far from perfect, and some people were expecting a more rigid test schema to allow fully automated test-suite generation. But I believe this is better than nothing.

Also, it is ready to use and seems to work as expected in my preliminary tests:

foobar test run
foobar-0.1.0
  foo the void
  bar the void
  snafu
    foo and bar

Finished in 0.0001 seconds
3 examples, 0 failures

Does anyone have anything to say about it?

Edit: There is also a ported bowling/canonical-data.json here as an example.

And here is my first JSON Schema. If anyone has any experience with it, I would love suggestions on how to improve it.

{
   "$schema":"http://json-schema.org/draft-04/schema#",
   "$ref":"#/definitions/top",
   "definitions":{
      "comments":{
         "type":"array",
         "items":{
            "type":"string"
         },
         "minItems":1
      },
      "description":{
         "type":"string"
      },
      "exercise":{
         "type":"string"
      },
      "group":{
         "type":"array",
         "items":{
            "$ref":"#/definitions/testOrLabeledGroup"
         },
         "minItems":1
      },
      "labeledGroup":{
         "type":"object",
         "required":[
            "description",
            "group"
         ],
         "properties":{
            "description":{
               "$ref":"#/definitions/description"
            },
            "group":{
               "$ref":"#/definitions/group"
            }
         },
         "additionalProperties":false
      },
      "test":{
         "type":"object",
         "required":[
            "test",
            "description"
         ],
         "properties":{
            "test":{
               "$ref":"#/definitions/testType"
            },
            "description":{
               "$ref":"#/definitions/description"
            }
         }
      },
      "testOrLabeledGroup":{
         "oneOf":[
            {
               "$ref":"#/definitions/test"
            },
            {
               "$ref":"#/definitions/labeledGroup"
            }
         ]
      },
      "testType":{
         "type":"string"
      },
      "top":{
         "type":"object",
         "required":[
            "exercise",
            "version",
            "group"
         ],
         "additionalProperties":false,
         "properties":{
            "exercise":{
               "$ref":"#/definitions/exercise"
            },
            "version":{
               "$ref":"#/definitions/version"
            },
            "comments":{
               "$ref":"#/definitions/comments"
            },
            "group":{
               "$ref":"#/definitions/group"
            }
         }
      },
      "version":{
         "type":"string"
      }
   }
}

I hope it is better now!

{
   "exercise":"bob",
   "version":"1.0.0",
   "comments":[
      "I am a comment"
   ],
   "group":[
      {
         "description":"foo",
         "group":[
            {
               "test":"response",
               "description":"stating something",
               "input":"Tom-ay-to, tom-aaaah-to.",
               "expected":"Whatever."
            },
            {
               "test":"response",
               "description":"stating the same thing again",
               "input":"Tom-ay-to, tom-aaaah-to.",
               "expected":"Whatever."
            }
         ]
      },
      {
         "description":"bar",
         "group":[
            {
               "test":"response",
               "description":"shouting",
               "input":"WATCH OUT!",
               "expected":"Whoa, chill out!"
            }
         ]
      }
   ]
}

It seems that this issue has been dead for a while…

Let’s try to push the idea proposed by @devonestes a little further!

Since goal number 2 is clearly really hard, how about we try and get something that’s at least solving goal number 1, and then once that’s done we can try and refine it further to accomplish goal number 2? I think limiting the scope of what we’re trying to accomplish (with an eye towards the future, of course) will be really helpful in actually getting something shipped here.

I have been playing with the JSON files this week and I have some ideas on how we can extract most of the current test structure without sacrificing readability or enforcing too much.

Intro

This will be a really long post, so grab your coffee mug and try not to fall asleep, because I need some feedback here! šŸ˜„

Test suite structure

Grouping

Some test suites have tests grouped with labels:

from acronym
{
   "abbreviate":{
      "description":"Abbreviate a phrase",
      "cases":[
         {
            "description":"basic",
            "phrase":"Portable Network Graphics",
            "expected":"PNG"
         }
      ]
   }
}

Grouping tests adds readability to both the JSON file and the generated tests, so I believe that we should keep this feature somehow.

Heterogeneous groups

In the example above, the custom name abbreviate was used to group and also identify the type of the tests to be performed. This is an easy solution but is also a little too restrictive. It would be useful to group distinct types of tests:

Heterogeneous group example
{
   "group":{
      "description":"Qwerty",
      "cases":[
         {
            "encode":{
               "description":"Qwerty encoding",
               "plaintext":"Sample plaintext",
               "ciphertext":"adsdfsjqwreiugi"
            }
         },
         {
            "decode":{
               "description":"Qwerty decoding",
               "ciphertext":"adsdfsjqwreiugi",
               "plaintext":"sampleplaintext"
            }
         }
      ]
   }
}

We could also have encoded the test types in other ways, but what matters here is that by moving the test-type specification near the test data, we gain the ability to create heterogeneous test groups!

Nested grouping

By decoupling the grouping logic from the test types, we could even nest test groups to varying depths:

{
   "group":{
      "description":"mathematics",
      "tests":[
         {
            "group":{
               "description":"basic math",
               "tests":[
                  {
                     "addition":{
                        "description":"simple addition",
                        "left":1,
                        "right":2,
                        "expected":3
                     }
                  },
                  {
                     "subtraction":{
                        "description":"simple subtraction",
                        "left":3,
                        "right":2,
                        "expected":1
                     }
                  }
               ]
            }
         },
         {
            "division":{
               "description":"awesome division by zero",
               "left":1,
               "right":0,
               "expected":"Only Chuck Norris can divide by zero!"
            }
         }
      ]
   }
}

That may seem unneeded and a little too complex, but it comes almost for free! Also, it is good to have some flexibility for the more complex test suites we may want to create.

A generator could simply ignore all the test grouping and just recursively scan for the tests - flattening the structure - or it could use the grouping information to construct a completely labeled test tree, if the test framework allows it.
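As a rough sketch of the first strategy (assuming the "group"/"tests" shape above; collectTests is a made-up name), flattening is just a short recursive walk:

{-# LANGUAGE OverloadedStrings #-}

import           Data.Aeson          (Value (..))
import qualified Data.HashMap.Strict as HM
import qualified Data.Vector         as V

-- Hypothetical flattener: recursively descend into nested groups,
-- discarding the labels and collecting only the leaf test objects.
collectTests :: Value -> [Value]
collectTests (Object o)
    | Just (Object g) <- HM.lookup "group" o
    , Just (Array ts) <- HM.lookup "tests" g
    = concatMap collectTests (V.toList ts)
collectTests v = [v]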

Test case specification

The challenge here is to enforce a minimal structure on all tests, without losing any readability or flexibility.

Previous discussions indicate that there is no consensus about encoding input and output, so we should avoid discussing that now and focus on things that will not start a language war.

To allow easy, semi-automatic generation of tests, I think it would be convenient to have at least the following information about a test:

  • description - With it the test generators have a textual description to display in case of success/failure. Also, it allows users and maintainers to refer to a specific test case in a language-independent way. Tests without descriptions would leave the users in a situation where they cannot easily identify where they failed, so it makes sense to enforce their presence.
  • type - At least implicitly, any test case has a type that identifies a property being tested, most of the time the name of a test function. What matters here is that we need a unique identifier for each kind of test in a test suite, so that we don’t end up in a situation where it is impossible to automatically identify the type of each test case.
Ambiguous test type example
{
   "group":{
      "description":"Qwerty",
      "cases":[
         {
            "test":{
               "description":"Qwerty encoding",
               "plaintext":"Sample plaintext",
               "ciphertext":"adsdfsjqwreiugi"
            }
         },
         {
            "test":{
               "description":"Qwerty decoding",
               "ciphertext":"adsdfsjqwreiugi",
               "plaintext":"sampleplaintext"
            }
         }
      ]
   }
}

I see three options to signal the test type:

Using a unique key for each test type

This is readable and easy enough to parse, but it doesn’t expose the fact that all the test cases have a description.

{
   "decode":{
      "description":"Qwerty decoding",
      "ciphertext":"adsdfsjqwreiugi",
      "plaintext":"sampleplaintext"
   }
}

Using a unique key inside a test key

This captures more structure but is not so nice to the eyes.

{
   "test":{
      "description":"Qwerty decoding",
      "decode": {
         "ciphertext":"adsdfsjqwreiugi",
         "plaintext":"sampleplaintext"
      }
   }
}

Edit: Key-value pair option

Adding a key-value pair to identify the test

This is a little less readable than the first option, but may be interesting for parsing.

{
   "test":{
      "type":"decode",
      "description":"Qwerty decoding",
      "ciphertext":"adsdfsjqwreiugi",
      "plaintext":"sampleplaintext"
   }
}

The first option is more pleasant to the eyes and is similar to what we already use, so it makes sense to stick with it unless we find a reason to avoid it.

It would be nice to have some arguments in favor of or against each of these three alternatives.

JSON Schema

I’m still trying to write a schema to allow automatic validation of the canonical-data.json files, but I decided that it was already time to discuss the idea publicly, so that we could improve it together.

Edit: Remember about exercise, version and comments.

Proof of concept

Following these ideas, I rewrote exercises/bob/canonical-data.json to test the concept in a simple case:

{
   "group":{
      "description":"bob",
      "tests":[
         {
            "response":{
               "description":"stating something",
               "input":"Tom-ay-to, tom-aaaah-to.",
               "expected":"Whatever."
            }
         },
         {
            "response":{
               "description":"shouting",
               "input":"WATCH OUT!",
               "expected":"Whoa, chill out!"
            }
         },
         {
            "response":{
               "description":"shouting gibberish",
               "input":"FCECDFCAAB",
               "expected":"Whoa, chill out!"
            }
         },
         {
            "response":{
               "description":"asking a question",
               "input":"Does this cryogenic chamber make me look fat?",
               "expected":"Sure."
            }
         },
         {
            "response":{
               "description":"asking a numeric question",
               "input":"You are, what, like 15?",
               "expected":"Sure."
            }
         },
         {
            "response":{
               "description":"asking gibberish",
               "input":"fffbbcbeab?",
               "expected":"Sure."
            }
         },
         {
            "response":{
               "description":"talking forcefully",
               "input":"Let's go make out behind the gym!",
               "expected":"Whatever."
            }
         },
         {
            "response":{
               "description":"using acronyms in regular speech",
               "input":"It's OK if you don't want to go to the DMV.",
               "expected":"Whatever."
            }
         },
         {
            "response":{
               "description":"forceful question",
               "input":"WHAT THE HELL WERE YOU THINKING?",
               "expected":"Whoa, chill out!"
            }
         },
         {
            "response":{
               "description":"shouting numbers",
               "input":"1, 2, 3 GO!",
               "expected":"Whoa, chill out!"
            }
         },
         {
            "response":{
               "description":"only numbers",
               "input":"1, 2, 3",
               "expected":"Whatever."
            }
         },
         {
            "response":{
               "description":"question with only numbers",
               "input":"4?",
               "expected":"Sure."
            }
         },
         {
            "response":{
               "description":"shouting with special characters",
               "input":"ZOMG THE %^*@#$(*^ ZOMBIES ARE COMING!!11!!1!",
               "expected":"Whoa, chill out!"
            }
         },
         {
            "response":{
               "description":"shouting with no exclamation mark",
               "input":"I HATE YOU",
               "expected":"Whoa, chill out!"
            }
         },
         {
            "response":{
               "description":"statement containing question mark",
               "input":"Ending with ? means a question.",
               "expected":"Whatever."
            }
         },
         {
            "response":{
               "description":"non-letters with question",
               "input":":) ?",
               "expected":"Sure."
            }
         },
         {
            "response":{
               "description":"prattling on",
               "input":"Wait! Hang on. Are you going to be OK?",
               "expected":"Sure."
            }
         },
         {
            "response":{
               "description":"silence",
               "input":"",
               "expected":"Fine. Be that way!"
            }
         },
         {
            "response":{
               "description":"prolonged silence",
               "input":"          ",
               "expected":"Fine. Be that way!"
            }
         },
         {
            "response":{
               "description":"alternate silence",
               "input":"\t\t\t\t\t\t\t\t\t\t",
               "expected":"Fine. Be that way!"
            }
         },
         {
            "response":{
               "description":"multiple line question",
               "input":"\nDoes this cryogenic chamber make me look fat?\nno",
               "expected":"Whatever."
            }
         },
         {
            "response":{
               "description":"starting with whitespace",
               "input":"         hmmmmmmm...",
               "expected":"Whatever."
            }
         },
         {
            "response":{
               "description":"ending with whitespace",
               "input":"Okay if like my  spacebar  quite a bit?   ",
               "expected":"Sure."
            }
         },
         {
            "response":{
               "description":"other whitespace",
               "input":"\n\r \t",
               "expected":"Fine. Be that way!"
            }
         },
         {
            "response":{
               "description":"non-question ending with whitespace",
               "input":"This is a statement ending with whitespace      ",
               "expected":"Whatever."
            }
         }
      ]
   }
}

To check how hard it could be to parse the file, I rewrote the test suite to run the tests directly from the JSON file.

{-# LANGUAGE OverloadedStrings #-}

-- Basic imports
import Control.Applicative ((<|>), liftA2)
import Control.Monad       ((>=>))

-- To construct the tests.
import Test.Hspec          (Spec, describe, it)
import Test.Hspec.Runner   (configFastFail, defaultConfig, hspecWith)
import Test.HUnit          (assertEqual)

-- To parse the JSON file.
import Data.Aeson          ((.:), eitherDecodeStrict', withArray, withObject)
import Data.Aeson.Types    (Parser, Value, parseEither)
import GHC.Exts            (toList)

-- To read the JSON file.
import Data.ByteString     (readFile)
import Prelude     hiding  (readFile)

-- The module to be tested.
import Bob (responseFor)

-- Read, decode and run the tests.
main :: IO ()
main  = readJSON >>= parseOrError parseJSON >>= runTests
  where
    readJSON       = readFile "test/canonical-data.json"
    parseOrError p = either error pure . p
    parseJSON      = eitherDecodeStrict' >=> parseEither (parseTests parsers)
    runTests       = hspecWith defaultConfig {configFastFail = True}

    -- List of exercise-specific parsers
    parsers = [ parseResponse ]

-- | Exercise-specific parser for "response" tests.
parseResponse :: Value -> Parser Spec
parseResponse = withObject "response" $ \o -> do
    test        <- o    .: "response"
    description <- test .: "description"
    input       <- test .: "input"
    expected    <- test .: "expected"
    return $ it description $
                  assertEqual ("responseFor " ++ show input)
                    expected
                    (responseFor input)

-- | Exercise-independent JSON parser.
parseTests :: [Value -> Parser Spec] -> Value -> Parser Spec
parseTests ps = foldr (liftA2 (<|>)) mempty (parseGroup : ps)
  where
    parseGroup = withObject "group" $ \o -> do
        group       <- o     .: "group"
        description <- group .: "description"
        tests       <- group .: "tests"
        specs       <- withArray "tests" (traverse (parseTests ps) . toList) tests
        return . describe description . sequence_ $ specs

This is still experimental code, so don’t take it too seriously, but note that only 12 lines of code are exercise-specific. All the other lines are exercise-independent!

I avoided any trick to make this easier in Haskell, so the parsing is verbose and feels a little clumsy. Changing the JSON file would make parsing way easier, but that would favor the Haskell track to the detriment of other languages and human readability.

Final comments

Well, this is all I got for now…

I think that, if we decide to follow this path, in the short term we can expect to:

  • Automatically validate canonical-data.json files in Travis-CI.
  • Simplify test generators by sharing more code among exercises.

I deliberately avoided specifying the tests’ inputs and outputs for a few reasons:

  • Nobody agrees about what they should be.
  • Data encoding is not as language-neutral as people might think.
  • I believe we need more experience with test-data standardization before jumping into that discussion again.

Does anyone think it is a useful endeavor to standardize just that for now?

The reason I stopped commenting despite the fact that I’m the one who re-kindled this thread is that these replies really disheartened me:

I think you can get a good start on it for most languages, but that idea doesn’t take into consideration language call semantic differences (factor/forth vs assembly vs algol-based languages vs keyword arguments (smalltalk, ruby) as an example). … I think you can easily generate a rough draft for every exercise for a language, but it still needs to be reviewed, finalized, and styled by a human to be a good example to learn from.

…I know that it makes sense in some languages to think about automatically generating tests, but I believe that this is not a goal shared between all tracks. I think it is impossible, in the general case, to auto-magically generate the test suite…

Then what is the goal of this discussion about JSON format at all, if you’re not interested in programmatically processing the JSON data to generate the unit tests?

Moreover, I don’t see why language-specific differences matter here – my point was that, totally disregarding ALGOL syntax and Ruby keyword arguments and Haskell data types, if everything is just a string you can write a generator to write out test files (and example files too), and since there are already exercise-specific test generators, why not save yourselves the work and write a generic one with better-designed data? (Yes, you should still read and comment on the output of the generator for good measure.)

I’ve been thinking about this a bit recently, and I think the most generalized version of this we can get might be the best for as many different needs as possible. What we’re really doing in most of these exercises is basically testing functions. There’s input, and there’s output. Using keys in our JSON objects like ā€œplaintextā€ and ā€œkeyā€ creates a need for knowledge about the exercise to accurately understand how those parts interact.

I think if we can generalize on that concept of a function that we’re testing, that might be helpful both for human readability, and also for machine readability so we can possibly use this data for automatic tests.

So, here’s my example:

{
  "exercise": "cipher",
  "version": "0.1.0 or an object with more detailed information",
  "comments": [
    "Anything you can think of",
    "as a list of strings"
  ],
  "tests": [
    {
      "description": "encodes simple text",
      "function": "encode",
      "input": ["Secret message", "asdf1234"],
      "output": "qwertygh"
    },
    {
      "description": "encodes empty string",
      "function": "encode",
      "input": ["", "test1234"],
      "output": ""
    },
    {
      "description": "decodes simple string",
      "function": "decode",
      "input": ["qwertygh", "asdf1234"],
      "output": "Secret message"
    }
  ]
}

I don’t think there are any exercises that require anything other than input and output, but I haven’t done too deep of an analysis on that. I’d love any feedback if there are edge cases that would need to be taken care of here. I know that based on the structure above I can think of reasonable ways to parse that and automatically create some skeletons for tests in Ruby, Elixir, Go, JavaScript and Python, but that’s really all I can reasonably speak to since those are the only languages I have a decent amount of experience with.

Also, I sort of like the stripped down way of looking at this - when I look at that data I don’t need to know the context of the exercise to know what’s going on. I just know there’s a thing called encode, and that takes some input and returns some output, and there’s a text description of what’s going on.

I’m not really 100% sure that this would give us everything we want, but I wanted to at least throw this idea out there to get feedback and see if it might be a starting point for an actually good idea!
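For what it’s worth, here is a tiny hypothetical sketch of the kind of skeleton such a generator could emit from the positional shape above, treating every value as an opaque string (renderTest and the Hspec-style output are illustrative assumptions, not a real generator):

-- Hypothetical skeleton emitter: turns one positional-input test case
-- into an Hspec-style stub, treating every value as an opaque string.
renderTest :: String -> String -> [String] -> String -> String
renderTest desc fn inputs expected = unlines
    [ "it " ++ show desc ++ " $"
    , "    " ++ fn ++ " " ++ unwords (map show inputs)
    , "        `shouldBe` " ++ show expected
    ]

-- e.g. renderTest "decodes simple string" "decode"
--                 ["qwertygh", "asdf1234"] "Secret message"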

I do not see any sense in specifying the order of arguments in the canonical test data. There are different idioms and necessities in the various tracks.

Let’s assume we have some data type and we write functions around it. Let’s call it list. In object-oriented languages it will be the object we call a method on, so it will sit completely outside the argument order. In Elixir we like to have this object-like argument in the first position so we can pipe it around, while in Haskell it is preferred last, to enable point-free style and partial application.

So, as you can see, the order of arguments has to be specified by the track’s maintainers anyway.

Ryan Davis notifications@github.com wrote on Wed., 21 Sep. 2016, 23:47:

{
  "function": "repeat",
  "description": "tests failure",
  "input_count": -5,
  "input_string": "foo",
  "expected": { "error": "no negatives allowed" }
}

and what ensures the order of the args? There’s no metadata in place to declare argument names.


I’d fully support a more generic structure which would make it unnecessary to have a generator for each exercise.

But I have to admit, I have no idea what it could look like. Since you already said you would change them, do you have an idea about the structure already, @catb0t?

Also since it seems to be the right time, I want to request a feature for this generic format:

I had a sleepless night over how I should handle changes in the canonical data, as I wanted to have some versioning in the tests. First I thought I could just use the date of the last change, but this would mean that, because of whitespace changes, all earlier submissions would get ā€œinvalidatedā€. Therefore I think it would be a good idea to version the canonical data as well.