proposal-pattern-matching: Iteration result caching seems problematic

“consumes the entire iterator” behavior is incompatible with plucking the head from an infinite or a finite but very large iterator, and “cached for the lifetime of the overarching match construct” is going to have confusing interaction with reference reuse and/or dynamic mutation:

const pathological = {
  // `pathological.a` is an iterator whose result items have a method
  // that replaces `pathological.a`.
  a: (function* g(label) {
    const v = {
      allow(context) {
        console.log(`${label} ${context} method`)
        pathological.a = g("replacement");
        return false;
      }
    };
    console.log(`${label} index 0`); yield v;
    console.log(`${label} index 1`); yield v;
    console.log(`${label} done`);
  })("init"),

  // `pathological.b` is a getter that returns `pathological.a`.
  get b() { console.log("get b"); return this.a; }
};
// `pathological.c` is the initial value of `pathological.a`.
pathological.c = pathological.a;

match (pathological) {
  // `next()` is invoked at least once.
  when ({ a: [solo] }) if (false) { … }

  // Is `b` recognized as the same iterator as `a`?
  when ({ b: [solo] }) if (solo.allow("solo")) { … }
  when ({ b: [head, ...tail] }) if (head.allow("head")) { … }

  // Are properties re-read?
  when ({ a: [solo] }) { … }
  when ({ b: [solo] }) if (false) { … }

  // Is `c` recognized as the same as the *old* value of `a`?
  when ({ c: [head, ...tail] }) { … }
  else { console.log("no match"); }
}

// What got logged?

About this issue

Original URL
State: closed
Created 3 years ago
Comments: 77 (54 by maintainers)

Most upvoted comments

I disagree; as it currently stands it is impossible to use infinite iterators with array patterns, not just awkward or inconvenient. They are guaranteed to either deadlock or not match. Adding ... as a possible rest pattern does not inconvience any other use-case, but makes infinite iterators possible.

It also helps with non-infinite cases. For finite iterators, you’re trapped in the same dilemma: either the iterator has to be an exact length match (against [a, b, c]), or you’re required to exhaust the entire iterator (against [a, b, c, ...rest]). There’s no way to pull off N values and allow additional values without consuming the entire iterator. [a, b, c, ...] would allow this.

This is a general problem caused by the mismatch in behavior of [a, b, c] between destructuring and pattern matching, and affects several cases.

tabatkins on Dec 8, 2021

Cool. Caching semantics are now described for array patterns and object patterns

tabatkins on Dec 10, 2021

No, the length-checking itself is a very important functionality that most (all?) pattern-matching constructs use, and absolutely should not be removed. The length of an array is a significant thing people intuitively match against; without it, you have to either add guards doing the length check manually (ugh) or be careful about ordering your clauses so the longer ones come first and the end patterns won’t match against undefined (huge footgun).

This is just one of the places where the mental model and common usage of destructuring and of pattern matching differ.

tabatkins on Dec 8, 2021

i strongly disagree that each arm should separately invoke @@iterator. that operation is not, even in idiomatic usage, idempotent.

devsnek on Aug 3, 2021

Phew, y’all’s weird metaprogramming is fun to debug. But sure, I get what’s going on here.

Yes, in the current proposal’s caching semantics, the first match clause will exhaust the iterator, caching {matchable: [matchable, 0, 1]} in the array pattern cache, and the second match clause will cache {(matchable, “value”): 2} in the object pattern cache, performing a Get against “value” because that Get hadn’t been done in a cache-observable way before.

(This isn’t particularly intentional in its specifics, but I think it’s reasonable.)

tabatkins on Dec 14, 2021

In both cases the maximum size is the number of distinct possible Gets described by all the patterns in total - every key in each object pattern, and every array pattern itself (the iteration cache is identical both ways, but how we cache the iterator itself that we obtained from the matchable varies). Whether you’re constructing the keys identifying those Gets from object identity or path identity doesn’t change the worst-case count.

tabatkins on Dec 13, 2021

@Jack-Works if I understand right (and me trying to articulate this is also for the sake of getting confirmation about whether I’ve interpreted it correctly myself), that comment is saying that the @@iterator method of an iterable-that-is-not-itself-an-iterator is expected to be “reusable” in the sense that it would be unusual (even though not impossible) for it to yield different values if iterated twice consecutively. All built-in iterables-that-are-not-themselves-iterators* adhere to that behavior in a clean env, and it seems reasonable to say that’s how any such objects “should” behave, so if iterable matching behavior only accounts for this specific pattern, it can still be useful.

Obv @ljharb please correct me if I’m still wrong there.

* I think this is what folks may be using “built-in iterable” as a kind of shorthand for, which is part of what threw me off, though admittedly iterable-that-is-not-itself-also-an-iterator is a mouthful! It doesn’t actually have to be built-in, it’s just that the built-ins provide exemplars of the pattern that would be supported regardless of what ends up happening here.

bathos on Dec 9, 2021

Yeah, you’re overthinking this. ^_^ There is no brand-checking, branching on anything, or anything else of the sort, we’re just talking about the consequences of how iterables and iterators work.

If we fail to get the caching semantics to work, then whenever you try to match something against an array matcher, it’ll ask the matchable for an iterator. Arrays/etc return a fresh iterator each time you do so, so the matchers work as expected if you have to test against multiple arms. Generator objects just return themselves, so the matchers will not work as expected, because the second time you try to match against an array pattern, the first few items will have already been consumed by the previous one.

Here’s a specific example:

function* range(start, end) {
 for(var i = start; i < end; i++) yield i;
}
class Sequence {
  constructor(start, end) { this.start = start; this.end = end; }
  [Symbol.iterator]() { return range(this.start, this.end); }
}
const iteratorNums = range(1, 6);
const iterableNums = new Sequence(1, 6);
const arrayNums = [1,2,3,4,5];

match(arrayNums) {
  when ([a, b]) 
	console.log(a, b); 
	// doesn't match, because the array is length 5
  when ([c, d, ...rest]) 
	console.log(c, d, rest);
	// matches, logging (1, 2, [3, 4, 5]);
}

match(iteratorNums) {
  when ([a, b]) 
	console.log(a, b); 
	// doesn't match, because the iterator still has items
    // but did consume three items off of the iterator
  when ([c, d, ...rest]) 
	console.log(c, d, rest);
	// matches, logging (4, 5, [])
    // because the previous clause consumed the first three items
}

match(iterableNums) {
  when ([a, b]) 
	console.log(a, b); 
	// doesn't match, because the iterator still has items
  when ([c, d, ...rest]) 
	console.log(c, d, rest);
	// matches, logging (1, 2, [3, 4, 5]);
    // gets the same result as Array because, like Array,
    // each call to Symbol.iterator returns a fresh iterator
}

Ideally, of course, we get caching semantics, and all three of them have the same result, and the same observable behavior overall, with each element of the iterator requested exactly once.

tabatkins on Dec 9, 2021

Yes, an iterator has to attempt to consume five items to correctly match an [a,b,c,d] pattern, to ensure that the fifth attempt fails and the iterator represents exactly four elements. The current spec (well, the README) asserts that you must capture the leftover elements with a spread if you want to do “four or more”; we haven’t specified exactly what will happen if you try to match [a,b,c,d, ...rest] against an infinite iterator.

The “helpful” behavior would be to bind rest to the iterator itself (having been progressed by four items), but that gets… complex, with our intended iterator caching semantics. Instead, I suspect the answer is that we have to consume the entire iterator, hanging on infinite iterators.

I suspect we want to add a bare ... pattern to indicate pure optionality. This is potentially useful for matching against plain arrays too (you won’t create a superfluous binding for the rest of the array if you don’t want it), but it also means that [a,b,c,d, ...] only has to consume four items from an iterator to confirm a match, and works well for infinite iterators.

tabatkins on Dec 8, 2021

I also don’t see how structural vs identity keying leads to particularly important differences here. I agree that structural more closely matches how one would “cache” (by using temp vars) a hand-rolled version of the match construct, but the difference just doesn’t seem significant. You’ve got to have somewhat weird code for the difference to show up in the first place (in your example, the bar and baz properties of the matchable have to point to the same object to see an observable difference), and exceptionally weird code for the difference to matter.

I went with identity keying because it’s the easiest to describe. If impls give feedback that structural keying would would be simpler for them internally, I’ve got no problem switching to that; it just takes a little bit more work to describe. It could allow engines to do less caching, at the cost of some up-front analysis.

tabatkins on Dec 11, 2021

The fourth clause, when ({ a: [solo] }) { … }, Gets pathological.a (now different from the result in the first clause)

I believe our intention is that this is not the case - the first clause will have caused us to cache the result of getting pathological.a, and we’ll reuse that here. Getters shouldn’t be invoked twice by pattern matching in normal circumstances, so long as the path to them looks the same across clauses.

In other words, the specific caching we’re thinking of is:

a {matchable => (iterator, [items so far]) cache, for matchables matched against array patterns
a {(matchable, property name) => item} cache, for matchables matched against object patterns

Thus, retrieving the same key from an object multiple times should result in only a single Get, and also presenting the same object to an array pattern multiple times, even if obtained via different paths, should only fetch the iterator once, and only result in one iteration thru that iterator. You can observe exactly when the iterator is pulled from, and fiddle with side effects to make it weird, but absent those we believe these semantics should Just Work.

(Unfortunately, your example is also too meta-programmy for me to easily grasp what’s happening in it. It seems to be trying to exercise several different corners of the behavior at once.)

Is this an accurate summary of your thoughts, @ljharb?

tabatkins on Dec 10, 2021

@gibson042 … i think so? it’s really hard to wrap my head around such a complex scenario without writing a test and having an implementation to test it against, but that sounds right to me.

It’s important we lock down the semantics so that even this kind of pathological code has deterministic behavior, but I don’t think it’s important what that behavior actually is, as long as it falls out of the semantics one would expect from actual/common code.

ljharb on Dec 10, 2021

Would not x be the actual iterator? I’m not matching it against a [] pattern, so it shouldn’t try to peek into that iterator at all. I probably could have written that second clause like this instead, and it would do the same:

when (x) if (x.next() === 'Not this') console.log('This will not match')

theScottyJam on Dec 10, 2021

The only possible behavior there is the same thing as destructuring - it deadlocks on an infinite iterator. If we’d want to handle infinite iterators better, we’d need to do so in destructuring and maybe for-of, and then it would work in pattern matching automatically. We should not try to solve this problem as part of pattern matching.

ljharb on Dec 8, 2021

I think the best that can be done, as the proposal currently stands, is to use the take() method from the iterator-helpers proposal. e.g.

match (Number.range(0, Infinity).take(4)) {
    when ([a, b, c, d]) ...
    else ...
}

This can’t cover all use cases, and it’s not pretty, but it should help with a good number of them.

theScottyJam on Dec 8, 2021

A question. If the current match semantics requires the number of items also matches, does it mean it’s impossible to match 4 items from an infinite iterator?

match (Number.range(0, Infinity)) {
    when ([a, b, c, d]) -> can I match here?
    else -> ...
}

Jack-Works on Dec 8, 2021

Actually, side effects on iteration are just a special kind of side effects. You can also have other side effects by matching on a Proxy (it will invoke a proxy trap). So I guess the side effects here are acceptable.

Jack-Works on Aug 4, 2021

Object destructuring will likewise consume the whole iterable

It’s actually more complicated… array destructuring “IteratorCloses” the iterable without consuming it (which is actually necessary to destructure infinite iterators), by invoking return (which defaults to terminating iteration for built-in-constructed iterators including iterators returned from generators, but can be arbitrarily overridden). However, that is irrelevant for arrays, which return a fresh iterator every time one is requested—and therefore your Object.keys example works as expected if the when clauses are independent of each other.

I believe the caching is not only avoidable, but should be avoided. Regardless, if it is a part of this proposal, then there’s a whole lot to address, including what exactly is cached, by what key, and how the cache responds to reference reuse and/or dynamic mutation… and the result is definitely not going to be broadly intuitive. Supporting speculative destructuring to accommodate non-array-like iterators is weird, and will add lots of unnecessary complexity.

gibson042 on Aug 2, 2021

This behavior is patterned after object destructuring. Object destructuring will likewise consume the whole iterable, whether or not you use the whole thing.

> function* f() { yield 1; yield 2; yield 3 }
> g = f()
> const [a] = g
> a
1
> g.next()
{ value: undefined, done: true }

If we want to stay consistent and provide the least surprising behavior possible, we need to also consume the whole iterable during pattern matching, which means we need to cache the result, so it can be used in each match arm.

Otherwise, you can’t do this:

const data = ({ a: 1 })
match (Object.keys(data)) {
  when ([key1, key2]) { ... }
  when ([key1]) { ... } // We wanted this to match, but it won't, because the iterator got consumed on the first match arm.
  when ([]) { ... } // Whoops! This will be the one that matches
}

theScottyJam on Aug 2, 2021

I was quoting the current README: Array/iterable destructuring (emphasis mine)

These contain a comma-separated list of zero or more patterns, possibly ending in rest syntax (like ...rest).

This pattern first verifies that the matched value is iterable, then obtains and consumes the entire iterator.

gibson042 on Aug 2, 2021