runtime: Bug in System.IO.Compression.Inflater or the native zlib dependency - invalid distance too far back.

This is related to the work I am doing to enable compression in WebSockets. I have written several tests that make sure the WebSocket works with all supported window sizes. To test edge cases, I have also created data that results in a bigger payload when compressed.
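For readers who want to produce a similar edge case, here is a minimal sketch of one way to get input that grows when deflated. It is an assumption that pseudo-random bytes are a suitable stand-in; the data attached to this issue was not necessarily generated this way. It also uses only the public DeflateStream constructor, which does not expose windowBits, so it runs with the default window rather than the small windows tested here. The names IncompressibleDemo, Deflate and RandomBytes are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class IncompressibleDemo
{
    // Compresses the input as raw deflate through the public DeflateStream API.
    // Note: the public constructor always uses the default window size.
    public static byte[] Deflate(byte[] input)
    {
        using var output = new MemoryStream();
        using (var deflate = new DeflateStream(output, CompressionLevel.Optimal, leaveOpen: true))
        {
            deflate.Write(input, 0, input.Length);
        } // Disposing the DeflateStream flushes the final deflate block.
        return output.ToArray();
    }

    // Fills a buffer with seeded pseudo-random bytes (assumed to be incompressible).
    public static byte[] RandomBytes(int length, int seed)
    {
        var data = new byte[length];
        new Random(seed).NextBytes(data);
        return data;
    }

    static void Main()
    {
        var input = RandomBytes(ushort.MaxValue, seed: 42);
        var compressed = Deflate(input);

        // For incompressible input the compressed length is expected to exceed the input length.
        Console.WriteLine($"{input.Length} -> {compressed.Length}");
    }
}
```

Random bytes leave the compressor nothing to match, so the output should carry the block overhead on top of the original payload.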

All tests pass except one, and I have isolated the issue to the following repro:

using System;
using System.Buffers;
using System.IO;
using System.IO.Compression;
using System.Reflection;

class Program
{
    static void Main()
    {
        const BindingFlags flags = BindingFlags.Instance | BindingFlags.NonPublic;

        // The data file is 65535 bytes, compressed (raw deflate) to 67022 (bigger).
        var dataStream = typeof(Program).Assembly.GetManifestResourceStream("Data.deflate");
        var constructor = typeof(DeflateStream).GetConstructor(flags, binder: null,
            types: new Type[] { typeof(Stream), typeof(CompressionMode), typeof(bool), typeof(int), typeof(long) }, modifiers: null);
        using var deflate = (DeflateStream)constructor.Invoke(new object[] { 
            dataStream, CompressionMode.Decompress, /*leaveOpen*/false, /*windowBits*/-10, /*uncompressedSize*/-1L });

        typeof(DeflateStream).GetField("_buffer", flags).SetValue(deflate, ArrayPool<byte>.Shared.Rent(512));

        var buffer = new byte[512];
        var count = 0;

        while (true)
        {
            var byteCount = deflate.Read(buffer);
            count += byteCount;

            if (count == ushort.MaxValue)
            {
                break;
            }
        }
    }
}

Ignore the reflection. In System.Net.WebSockets we have access to Interop.zlib.cs, but without reflection it would take a lot of boilerplate to write standalone code that shows the problem.

The code in the example throws an exception, and the native zlib error is “invalid distance too far back”. If we change the size of _buffer to 2048, the error disappears and the data is inflated without a problem.

The default buffer size for DeflateStream is 8KB, and so far I have been unable to create data that causes the same error at that size. Nevertheless, I think there is a bug somewhere.

In the WebSocket I currently use a dynamic buffer size that depends on how big the user buffer is. I could easily use 8KB as a minimum, but I think this would only hide the problem or make it rarer.
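The 8KB floor mentioned above could look roughly like the sketch below. This is only an illustration of the workaround, not the actual WebSocket code; InflateBufferSizing, MinimumBufferSize and Choose are hypothetical names, and clamping the size would not address the underlying inflate error.

```csharp
using System;

static class InflateBufferSizing
{
    // Hypothetical floor, chosen to match the 8KB DeflateStream default mentioned above.
    public const int MinimumBufferSize = 8 * 1024;

    // Returns the size to use for the inflate input buffer:
    // the user buffer length, but never less than the floor.
    public static int Choose(int userBufferLength)
    {
        if (userBufferLength < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(userBufferLength));
        }

        return Math.Max(userBufferLength, MinimumBufferSize);
    }

    static void Main()
    {
        Console.WriteLine(Choose(512));    // 8192
        Console.WriteLine(Choose(16384));  // 16384
    }
}
```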

//cc @CarnaViire, @carlossanlop, @stephentoub

ConsoleApp.zip

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 20 (18 by maintainers)

Most upvoted comments

Here is a minimal repro with data captured from the test suite for test case 428.

using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Reflection;

class Program
{
    static readonly ConstructorInfo Constructor = typeof(DeflateStream).GetConstructor(BindingFlags.Instance | BindingFlags.NonPublic, binder: null,
        types: new Type[] { typeof(Stream), typeof(CompressionMode), typeof(bool), typeof(int), typeof(long) }, modifiers: null);

    static void Main()
    {
        Working();
        Failing();
    }

    static void Working()
    {
        var memoryStream = new MemoryStream(Compress().SelectMany(x => x).ToArray());
        var deflate = (DeflateStream)Constructor.Invoke(new object[] {
                memoryStream, CompressionMode.Decompress, /*leaveOpen*/false, /*windowBits*/-9, /*uncompressedSize*/-1L });

        var buffer = new byte[1024];

        while (true)
        {
            var bytesRead = deflate.Read(buffer);
            if (bytesRead == 0)
            {
                break;
            }
        }

        Console.WriteLine("Works");
    }

    static void Failing()
    {
        var memoryStream = new MemoryStream();
        var deflate = (DeflateStream)Constructor.Invoke(new object[] {
                memoryStream, CompressionMode.Decompress, /*leaveOpen*/false, /*windowBits*/-9, /*uncompressedSize*/-1L });

        var buffer = new byte[1024];

        foreach (var segment in Compress())
        {
            memoryStream.Write(segment);
            memoryStream.Position -= segment.Length;

            deflate.Read(buffer);
        }

        Console.WriteLine("Should not see this...");
    }

    static List<byte[]> Compress()
    {
        var stream = new MemoryStream();
        var result = new List<byte[]>();

        // This stream runs in Compress mode, so it deflates the data written to it.
        using var deflater = (DeflateStream)Constructor.Invoke(new object[] {
                stream, CompressionMode.Compress, /*leaveOpen*/true, /*windowBits*/-9, /*uncompressedSize*/-1L });

        var reader = new StringReader(Data);
        var line = reader.ReadLine();

        while (line is not null)
        {
            // Compress one line of hex data and flush so it becomes its own segment.
            deflater.Write(Convert.FromHexString(line));
            deflater.Flush();

            result.Add(stream.ToArray());
            stream.SetLength(0);

            line = reader.ReadLine();
        }

        return result;
    }

    const string Data =
@"7B0A202020224175746F6261686E5079
74686F6E2F302E362E30223A207B0A20
202020202022312E312E31223A207B0A
20202020202020202022626568617669
6F72223A20224F4B222C0A2020202020
20202020226265686176696F72436C6F
7365223A20224F4B222C0A2020202020
20202020226475726174696F6E223A20
322C0A2020202020202020202272656D
6F7465436C6F7365436F6465223A2031
3030302C0A2020202020202020202272
65706F727466696C65223A2022617574
6F6261686E707974686F6E5F305F365F
305F636173655F315F315F312E6A736F
6E220A2020202020207D2C0A20202020
202022312E312E32223A207B0A202020
202020202020226265686176696F7222
3A20224F4B222C0A2020202020202020
20226265686176696F72436C6F736522
3A20224F4B222C0A2020202020202020
20226475726174696F6E223A20322C0A
2020202020202020202272656D6F7465
436C6F7365436F6465223A2031303030
2C0A202020202020202020227265706F
727466696C65223A20226175746F6261
686E707974686F6E5F305F365F305F63
6173655F315F315F322E6A736F6E220A
2020202020207D2C0A20202020202022
312E312E33223A207B0A202020202020
202020226265686176696F72223A2022
4F4B222C0A2020202020202020202262
65686176696F72436C6F7365223A2022
4F4B222C0A2020202020202020202264
75726174696F6E223A20322C0A202020
2020202020202272656D6F7465436C6F
7365436F6465223A20313030302C0A20
2020202020202020227265706F727466
696C65223A20226175746F6261686E70
7974686F6E5F305F365F305F63617365
5F315F315F332E6A736F6E220A202020
2020207D2C0A20202020202022312E31
2E34223A207B0A202020202020202020
226265686176696F72223A20224F4B22
2C0A2020202020202020202262656861
76696F72436C6F7365223A20224F4B22
2C0A2020202020202020202264757261
74696F6E223A20322C0A202020202020";
}

The difference between the working and the failing method is that the first reads all the compressed data at once, while the second feeds the stream one compressed segment at a time (one per line of the hex data) and calls Read after each segment.