runtime: ImageSharp broken on .NET 5

https://github.com/SixLabors/ImageSharp/issues/1356 was raised as ImageSharp being broken on .NET 5 RC 1.

Although it is still not clear whether this is an issue with dotnet/runtime or something wrong with the project (such as incorrect build targets or dependencies), this qualifies as something we would classify as blocking-release if it does turn out to come from the runtime or libraries themselves.

This is being opened as a tracking issue until we finish root-causing the problem and determine what is broken.

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 3
  • Comments: 65 (57 by maintainers)

Most upvoted comments

I managed to get the repro on Linux. In order to do that I created a Hyper-V VM with one CPU and loaded an Ubuntu Live image.

I can confirm that the issue there was fixed with RC2. Attached are the result of Manish’s repro with .NET 5.0 RC1 and RC2:

  • 5.0.100-rc.1.20454.5: garden-rc1
  • 5.0.100-rc.2.20480.7: garden-rc2

git bisect reports the same. b1f107e6f13252644bbff443670eb4ec540aad86 is good and f6b31c21e1369a387cd49e82d4d6e8998b5172ed is bad.

The assert goes away as does the image corruption.

From the PR it isn’t very clear why it would fail consistently only on a single processor. Is there an explanation?

The AVX feature flag is meant to be read from ECX, bit 28. One of the queries was checking EDX, bit 28 by accident.

EDX, bit 28 is the HTT bit, which indicates Multi-Threading support:

A value of 0 for HTT indicates there is only a single logical processor in the package and software should assume only a single APIC ID is reserved. A value of 1 for HTT indicates the value in CPUID.1.EBX[23:16] (the Maximum number of addressable IDs for logical processors in this package) is valid for the package.
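The mix-up described above can be illustrated with a minimal sketch. It assumes .NET 5 or later, where `X86Base.CpuId` is available; the class name `CpuidBits` and the helper `IsBitSet` are hypothetical names chosen for illustration and are not taken from the runtime's own detection code. It only reads the raw CPUID bits and does not perform the additional OS-support checks a real AVX detection would need.

```csharp
using System;
using System.Runtime.Intrinsics.X86;

public static class CpuidBits
{
    // CPUID leaf 1 feature bits discussed in this thread.
    public const int AvxBit = 28; // ECX bit 28: AVX
    public const int HttBit = 28; // EDX bit 28: HTT (Multi-Threading)

    // Returns true when the given bit is set in a 32-bit CPUID register value.
    public static bool IsBitSet(int register, int bit) => ((register >> bit) & 1) != 0;

    public static void Main()
    {
        if (!X86Base.IsSupported)
        {
            Console.WriteLine("CPUID is not available on this platform.");
            return;
        }

        // Leaf 1, sub-leaf 0 returns the basic feature flags.
        var (_, _, ecx, edx) = X86Base.CpuId(1, 0);

        Console.WriteLine($"AVX (ECX bit 28): {IsBitSet(ecx, AvxBit)}");
        Console.WriteLine($"HTT (EDX bit 28): {IsBitSet(edx, HttBit)}");
    }
}
```

On a single-processor VM the HTT bit is expected to be 0 while the AVX bit can still be 1, so a query that reads EDX instead of ECX would disagree with the other AVX checks only on such machines.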

Thanks @echesakovMSFT and @tannergooding for investigating! @JimBobSquarePants @tocsoft @brianpopow, if you could also please validate, we can close this out. .NET 5 RC2 should be releasing in two weeks. Thx!

To double check this I will run ImageSharp repro with 5.0.100 RC2

5.0.0-rc.2.20475.5 works as expected - no image corruption

The issues do look to be unrelated; I can’t repro #41108 on the 1-core VM.

However, I got an even simpler repro for this issue, and it does look like the upper 16 bytes are getting corrupted somewhere:

using System;
using System.Numerics;
var f = new Vector<float>(1.0f);
Console.WriteLine(f);

will print something like:

<1, 1, 1, 1, 0, 0, 1.7832655E+23, 4.5908E-41>

when it should print:

<1, 1, 1, 1, 1, 1, 1, 1>
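For anyone who wants to check this programmatically rather than by reading the printed output, here is a minimal sketch. The class name `VectorLaneCheck` and the helper `AllLanesEqual` are hypothetical and not part of ImageSharp or the runtime. It compares every lane of the broadcast vector against the expected value.

```csharp
using System;
using System.Numerics;

public static class VectorLaneCheck
{
    // Returns true when every lane of the vector holds the expected value.
    public static bool AllLanesEqual(Vector<float> vector, float expected)
    {
        for (int i = 0; i < Vector<float>.Count; i++)
        {
            if (vector[i] != expected)
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        // Same broadcast as in the repro above.
        var f = new Vector<float>(1.0f);
        Console.WriteLine($"Lanes: {Vector<float>.Count}, value: {f}");
        Console.WriteLine(AllLanesEqual(f, 1.0f) ? "OK" : "Lane mismatch detected");
    }
}
```

The lane count depends on the hardware (8 floats when 256-bit vectors are in use), which is why the loop uses `Vector<float>.Count` instead of a fixed number.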

@mangod9: @antonfirsov and I have tested with the current dotnet master branch and can confirm that the issue is fixed.

Thanks to all for investigating this issue!

The issue seems to be related to ExtendedIntrinsics.ByteToNormalizedFloatReduce.

With the following change to SimdUtils.cs the issue does not reproduce:

index 7f917648d..8b6938e0b 100644
--- a/src/ImageSharp/Common/Helpers/SimdUtils.cs
+++ b/src/ImageSharp/Common/Helpers/SimdUtils.cs
@@ -81,7 +81,7 @@ internal static void ByteToNormalizedFloat(ReadOnlySpan<byte> source, Span<float
             DebugGuard.IsTrue(source.Length == dest.Length, nameof(source), "Input spans must be of same length!");

 #if SUPPORTS_EXTENDED_INTRINSICS
-            ExtendedIntrinsics.ByteToNormalizedFloatReduce(ref source, ref dest);
+            //ExtendedIntrinsics.ByteToNormalizedFloatReduce(ref source, ref dest);
 #else
             BasicIntrinsics256.ByteToNormalizedFloatReduce(ref source, ref dest);
 #endif

@tannergooding I would be surprised if this was a System.Drawing issue as ImageSharp doesn’t use it, right?

CC. @danmosemsft, @jeffschwMSFT, @richlander

Going to assign this to myself and will try to work with James to ensure we can resolve the issue.

I think it could be b1f107e6f13252644bbff443670eb4ec540aad86 that made the difference. I reverted this commit on top of release/5.0 (3c6e6cc) and am rebuilding and trying to repro the issue.

@tocsoft I thought @brianpopow said it was all decoders?

SixLabors/ImageSharp#1356 (comment)

That’s great news if you have narrowed it down to there though!

I think there are two different, possibly related issues:

  1. Just decoding a JPG image and saving it results in the green stripes pattern. Decoding and saving the image works for PNG, BMP, TGA and GIF. Decoding and saving GIF seems to have a separate issue, which appears to be in the encoder rather than the decoder: the image comes out yellowish.

  2. Decoding an image (no matter which one) and mutating it results in half the image missing, as reported by the original poster of issue #1356. So far I can verify that resizing and EdgeDetection have this effect.

There is a relatively simple repro now: repro.zip

You can extract the above zip and run it on a machine with a single core (such as can be configured in Hyper-V) and it will result in the outputs being corrupted.

CC. @danmosemsft, @jeffschwMSFT, @jkotas

@jeffhandley since this is still in flight I am not sure which label makes the most sense. I added meta just as a placeholder.

Yeah, I don’t think this is area-System.Drawing (CC. @jeffhandley).

I don’t think there is a good area we can categorize this as other than just “tracking-external-issue”, at least until we can determine if it is actually a runtime issue or not.

The latest comments indicate (https://github.com/SixLabors/ImageSharp/issues/1356#issuecomment-701605488):

Just to re-iterate, I only see the issue when deploying a release configuration to an Azure App Service from Visual Studio 2019 with a self-contained .NET 5 RC1 runtime being installed. And only when the App Service is rather limited in memory. Is there any way we can use that exact setup to recreate the issue?

great, thanks for confirming. Will close this now 😃

@mangod9 is there a Docker image of .NET 5 RC2 available? That would be the easiest way for me to reproduce this.

Now that the GC may sometimes use AVX registers due to https://github.com/dotnet/runtime/tree/master/src/coreclr/src/gc/vxsort, this latent bug may be causing actual corruption.

We were able to narrow it further by running the ImageSharp unit tests. I can confirm Vector<T> is the root cause, somewhere in the ImageSharp uint32 -> float conversion pipeline. Can provide more info on Monday.

Was tiering also ruled out?

It repro’s under Debug for me. The only requirement I found was that you have a 1 core machine for it to consistently repro.

@tannergooding Please reach out to me if you need any help investigating this or there is even a small chance it is interop related.