runtime: [New rule] Catch that foreach will throw InvalidCastException

I’m creating an analyzer (https://github.com/dotnet/roslyn-analyzers/issues/4479) that reports if foreach will perform explicit cast in runtime (which is likely result in runtime exception):

image

This issue is to ask approval to set RuleLevel for that analyzer as BuildWarning.

  • If explicit cast is not intended, user should definitely be warned about potential runtime exception
  • If explicit cast is intended, user should do it explicitly, so that intent is clear.

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 27 (23 by maintainers)

Most upvoted comments

If there is case where we can know that something is always a runtime error then I agree that it is an excellent case for a potential analyzer. Unfortunately, I don’t see how this analyzer can ever be definitive in its recommendations.

As @stephentoub says the compiler already warns for cases that we know will cause runtime failures:

using System;
using System.Collections.Generic;

public class C {
    public void M() {
        var list = new List<string>() { "Hello World" };
        foreach (int i in list) // error CS0030: Cannot convert type 'string' to 'int'
        {
            Console.WriteLine(i);
        }
    }
}

My understanding of what is being proposed is that we now warn for this case as well:

using System;
using System.Collections.Generic;

public class C {
    public void M() {
        var list = new List<object>() { "Hello World" };
        foreach (int i in list) // New warning about casting object to int
        {
            Console.WriteLine(i);
        }
    }
}

I can understand the implicit cast in foreach can be surprising but that’s how the language works. Whenever you see foreach (int value in list) you should think for (int i = 0, value = 0; i < list.Count; value=(int)list[i], i++). If you do not want that to happen you can use var which will type the value as whatever it actually is instead of attempting to cast it.

You could require this pattern in your codebase so all casts are explicit:

using System;
using System.Collections.Generic;

public class C {
    public void M() {
        var list = new List<object>() { "Hello World" };
        foreach (var value in list) 
        {
            int i = (int)value;
        }
    }
}

Or this:

using System;
using System.Collections.Generic;
using System.Linq;

public class C {
    public void M() {
        var list = new List<object>() { "Hello World" };
        foreach (var i in list.Cast<int>()) 
        {
            
        }
    }
}

But this really becomes a style argument about what the best way to express to readers that you intend the cast that is happening.

safeguards users from potential runtime issues

This same argument could be made for warning about all explicit casts. The developer may be new to C# and be unaware that the cast will cause runtime errors. My understanding of the problem as it happened is:

start with code like this

List<Student> People {get;}

// some levels of object-oriented layering later...

foreach(Student student in People)
{
	...
}

We then add a base type like say Person that Student inherits from and update our collection to contain the base type

List<Person> People {get;}

// some levels of object-oriented layering later...

foreach(Student student in People)
{
	...
}

the compiler doesn’t warn because this cast could succeed. There may even be paths were nothing happens and ones where there is a runtime exception. While I am sympathetic, I do not see a universal way to address this problem today. If I am an analyzer, the developer needs to communicate to me what their intent is. There are millions of lines of code that do this today and I shouldn’t warn them just because there exists some situations where this could be non-intentional.

If, in my codebase, this sort of refactoring is happening all the time I would personally use var in a lot of places. I might even recommend var be used everywhere. That was one of the reasons var became so common in the IDE codebase of roslyn, because we kept refactoring different layers and we didn’t want to deal with the fallout of updating thousands of variable declaration locations.

However, these sorts of refactors never happen at the compiler layer. For that reason the roslyn compiler layer does not use var at all.

This is the reason the style rules for https://github.com/dotnet/roslyn are what they are and why I think this analysis is best expressed in something like stylecop.

Is there a guide somewhere to the philosophy of analyzers? I can see correctness-style analyzers falling under three general categories.

  1. The code you’ve written is going to fail at runtime. Don’t even bother hitting F5. (See existing error code CS0030.)
  2. The code you’ve written may succeed at runtime or it mail fail at runtime. Practically speaking, it will always* go down the “success” path or it will always go down the “failure” path, but we can’t really tell from static analysis which world we’re in. If you hit F5 you’ll find out quickly which side you’re on.
  3. The code you’ve written may succeed or fail at runtime, but it will do so non-deterministically, or the variables which affect success vs. failure may be environment-dependent and very difficult to reason about. Your production server might observe different behaviors than your test server.

My opinion (and just my opinion) is that these are also in increasing order of importance for an analyzer to flag. [3] is the most important because it’s alerting you to a problem that you might not even know about until your prod server falls over unexpectedly. [1] and [2] are less important. We should flag them if we can, but if we miss something it’s not the end of the world, as the developer will immediately see any runtime failure as soon as they exercise this code path.

For [2], I’d rather err on the side of false negatives instead of false positives. Other posters in this thread described their concerns behind a false positive rate. On the other hand, the worst consequence of a false negative is that you see the failure the first time you run your app, and you spend a minute or two debugging it. That doesn’t seem all that bad, to be honest.

* I’m using “always” a bit liberally. That is, if your test cases all populate the List<object> with strings, and you assume this is representative of a typical scenario, then you “always” go down the success path for typical inputs. And if you happen to receive an atypical input, your implicit cast acts as a quasi Debug.Assert("I really expected this to be a string and haven't tested this code for other data types."). Score one for invariant checking! 😃