roc: Soundness bug in tag union pattern matching inference

Here’s something that type-checks today:

exclaimFoo : [ Foo Str, Bar ] -> [ Foo Str ]*
exclaimFoo = \tag ->
    when tag is
        Foo str -> Foo (Str.concat str "!")
        Bar -> Foo "hi"

If I remove the type annotation, the inferred type of the argument will be [ Foo Str, Bar ]* - which is incorrect (it should be a closed union, like in the above annotation) and which seems like a soundness bug. It would mean I could pass Blah to this function and it would type-check even though there’s no Blah pattern to match.

About this issue

Original URL
State: closed
Created 3 years ago
Comments: 36 (36 by maintainers)

Commits related to this issue

(WIP) First pass at tag presence constraints (#1758) This work is related to restricting tag union sizes in input positions. As an example, for something like ``` \x -> when x is A M -> X A ... — committed to roc-lang/roc by ayazhafiz 3 years ago
(WIP) First pass at tag presence constraints (#1758) This work is related to restricting tag union sizes in input positions. As an example, for something like ``` \x -> when x is A M -> X A ... — committed to roc-lang/roc by ayazhafiz 3 years ago
Presence constraints for tag union types This work is related to restricting tag union sizes in input positions. As an example, for something like ``` \x -> when x is A M -> X A N -> X A... — committed to roc-lang/roc by ayazhafiz 3 years ago
Presence constraints for tag union types This work is related to restricting tag union sizes in input positions. As an example, for something like ``` \x -> when x is A M -> X A N -> X A... — committed to roc-lang/roc by ayazhafiz 3 years ago
Presence constraints for tag union types This work is related to restricting tag union sizes in input positions. As an example, for something like ``` \x -> when x is A M -> X A N -> X A... — committed to roc-lang/roc by ayazhafiz 3 years ago
Presence constraints for tag union types This work is related to restricting tag union sizes in input positions. As an example, for something like ``` \x -> when x is A M -> X A N -> X A... — committed to roc-lang/roc by ayazhafiz 3 years ago

Most upvoted comments

Assigning to myself! Fwiw, one interesting perspective is to see the current behavior as the dual of exhaustiveness checking - rather than making sure “all the cases are covered”, the current behavior says that “at least the cases you defined are covered”. Definitely a bug though

ayazhafiz on Nov 10, 2021

I haven’t forgotten about this, but the last week has been unusually busy with work and the Thanksgiving holiday in the US. I hope to have something up soon.

ayazhafiz on Nov 28, 2021

Gotcha - yeah, I think let’s default to proceeding without the “new kind of type that is the dual to row/extension types” and just know that it’s an option in the future if a sufficiently compelling use case comes up!

rtfeldman on Nov 24, 2021

That said, I’m curious to learn more about what you discovered! What’s the paper called?

Extensible Programming with First-Class Cases and Programming with Polymorphic Variants

But it will require introducing a new kind of type that is the dual to row/extension types.

Does that mean a user-facing type? Or just something behind the scenes?

It could be behind the scenes, though it’s one of those things that would be leaky - if someone does something tricky with the type system and doesn’t understand why it doesn’t work (or does work), we may have to say “oh yeah, this is how the type system is implemented”

ayazhafiz on Nov 24, 2021

I’ll give a prototype a shot, I’ll send you a PR when something is ready! May take a few days to think through it though, and I think it’s a good idea to just start with the simple (non-nested) case first.

ayazhafiz on Nov 21, 2021

I do think it’s at least worth prototyping this idea in the compiler, even if we are missing something more 😃

Sounds like a plan! 🙌

Let me know if you want to pair on it, or if you want to take a shot at a prototype, I’d also be happy to review some code!

rtfeldman on Nov 21, 2021

I’m not sure 😃 but its relative simplicity suggests to me that it’s probably the right direction, at least moreso than some of the other options we have.

Another thing that gives me some confidence is that a set of “might-have”/“must-have” constraints is a model of a constraint optimization problem on a lattice, so like for example if the set of all possible types for some variable we’re trying to solve lives in 2D space, then “must-have” constraints limit it “from the top” and “might-have” limit it “from the bottom”.

[]*    (top)


must-have: A, B, C, *
--

<all possible types of x>

--
might-have: D, E, ...


[]     (bottom)

The solver finds their join (the upper limit of the constraints), subject to ensuring that they are compatible with each other. In this sense, I think? we are guaranteed to find a principal type, when it exists. For individual variables I actually think it’s even better - since roc’s variables are immutable (on the surface) and bindings under a let are generalized, a single variable can only ever have either all “must-have” or all “might-have” constraints - not a mix both (I might be missing something though - let me know if I am). This means that we only have to check for “must-have”/“might-have” discrepancies when we’re comparing two different types we’ve already solved.

Anyway, I do think it’s at least worth prototyping this idea in the compiler, even if we are missing something more 😃

ayazhafiz on Nov 21, 2021

This is a tough problem haha 😃

ayazhafiz on Nov 20, 2021