runtime: System.Linq.Except violates Linq lazy iteration paradigm

The second Enumerable is always completely iterated and not lazy iterated!

See the current implementation of ExceptIterator:

Set<TSource> set = new Set<TSource>(comparer);
set.UnionWith(second);

foreach (TSource element in first)
{
    if (set.Add(element))
    {
        yield return element;
    }
}

to fix the implementation it should look something like this (of my mind & untested):

if(comparer == null) comparer = EqualityComparer<TSource>.Default;
Set<TSource> set = new Set<TSource(comparer);

using(IEnumerator<TSource> e2 = second.GetEnumerator())
{
    foreach(TSource element = first)
    {
        if(set.Contains(element)) continue;

        while(e2.MoveNext() && !comparer.Equals(element, e2.Current))
        {
            set.Add(e2.Current);
        }

        if(set.Add(element)) yield return element;
    }
}

As an offtopic humble side note request: Please add Memoize()-functionality to .Net because Linq-starters don’t know the importance of this concept, that it even exists and that they actually need it.

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

I don’t think users expect any enumeration on the second-Enumerable if first-Enumerable is empty.

I buy that: I’m surprised the current implementation doesn’t do this already.

We generally try not to optimize for corner cases too eagerly, since they always incur some form of cost in the more common code paths.

I don’t think Linq has ever promised to lazy-iterate all inputs in all cases? OrderBy, GroupBy, etc, all iterate an input in one go (although they defer that until the first element is requested). I can’t see any advantage in Except incrementally iterating one of its inputs, especially as doing so is more expensive.