runtime: "A local file header is corrupt" error after upgrading to 3.0/3.1

We have an application that reads and processes information from zip files. After we upgraded the target framework from 2.1 to 3.0 (or 3.1) without changing any code, processing of larger zip files started to fail with: System.IO.InvalidDataException: A local file header is corrupt.

This issue blocks us from upgrading to .NET Core 3.1, and we need to upgrade ASAP to resolve the out-of-memory issues 2.1 has while running in containers. I spent some time narrowing down the issue, and here is what I found.

Sample code to reproduce the issue:

using System;
using System.IO;
using System.IO.Compression;

namespace ZipProblemRepro
{
    class Program
    {
        private const string ZipPath = "F:/Test/bad_header/java_1_file.zip";
        
        public static void Main(string[] args)
        {
            Console.WriteLine($"Starting to read '{ZipPath}'...");

            using (var zip = ZipFile.Open(ZipPath, ZipArchiveMode.Read))
            {
                var linesProcessed = 0L;
                
                foreach (var fileEntry in zip.Entries)
                {
                    using (var stream = fileEntry.Open())
                    {
                        linesProcessed += ReadLines(stream);
                    }
                }

                Console.WriteLine($"Processed {linesProcessed} lines");
            }
            
            Console.WriteLine("Completed successfully!");
        }
        
        private static int ReadLines(Stream stream)
        {
            var linesRead = 0;
            using (var reader = new StreamReader(stream))
            {
                string line;
                while (!reader.EndOfStream)
                {
                    try
                    {
                        line = reader.ReadLine();
                    }
                    catch (OutOfMemoryException)
                    {
                        line = null;
                    }
                    
                    ++linesRead;
                }
            }

            return linesRead;
        }
    }
}

Running this code on the same zip file succeeds when the target framework is netcoreapp2.1, but fails when targeting netcoreapp3.0 or netcoreapp3.1. All tests were performed on the same machine with .NET Core SDK 3.1.100.

The hardest part was producing a brand new zip to reproduce the problem. It turned out that two things are required to get a repro zip:

  1. The zip should contain a large file
  2. The zip should be created with a Java zip library. This sounds weird, but it is the only way I was able to repro this. I tried different settings for 7z, Windows Explorer compression, and command-line zip, but had no luck with any of those.

I am including the code I used to generate the repro zip below, but I can also supply the zip itself if you tell me where to upload it (315 MB).

C# code I used to generate repro file:

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

namespace ZipGenerator
{
    class Program
    {
        private const string TempDirPath = "Temp";
        
        private const int GuildsPerLine = 10;
        private const int LinesPerFile = 12000000;
        private const int FilesPerDir = 1;
        private const int DirsTotal = 1;
        
        public static void Main(string[] args)
        {
            Console.WriteLine("Generating...");

            Directory.CreateDirectory(TempDirPath);
            for (var dirCount = 0; dirCount < DirsTotal; dirCount++)
            {
                if (dirCount % 1 == 0)
                {
                    Console.WriteLine($"Generated {dirCount} out of {DirsTotal} directories so far...");
                }
                
                var dirName = Guid.NewGuid().ToString("N");
                var dirPath = Path.Combine(TempDirPath, dirName);
                Directory.CreateDirectory(dirPath);

                for (var fileCount = 0; fileCount < FilesPerDir; fileCount++)
                {
                    var fileName = Guid.NewGuid().ToString("N");
                    var filePath = Path.Combine(dirPath, fileName);
                    GenerateEasierToCompressFileWithGuids(filePath);
                }
            }
            
            Console.WriteLine("Files created!");
        }
        
        private static void GenerateEasierToCompressFileWithGuids(string filePath)
        {
            using (var file = new StreamWriter(filePath))
            {
                for (var lineCount = 0; lineCount < LinesPerFile; lineCount++)
                {
                    var sb = new StringBuilder();
                    var fixedGuidPerLine = Guid.NewGuid().ToString("N");

                    for (var guidInLineCount = 0; guidInLineCount < GuildsPerLine; guidInLineCount++)
                    {
                        sb.Append(fixedGuidPerLine);
                    }

                    file.WriteLine(sb.ToString());
                }
            }
        }
    }
}

Java code to zip the file:

package com.test.zipper;

import java.io.*;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class Main {
    private static final String dirToZip = "D:\\VS\\Sandbox\\ZipProblemRepro\\Temp";

    public static void main(String[] args) throws IOException {
        String outputFile = "F:\\Test\\bad_header\\javaOutput.zip";

        System.out.println("Zipping everything in '" + dirToZip + "'");

        FileOutputStream fos = new FileOutputStream(outputFile);
        BufferedOutputStream bos =  new BufferedOutputStream(fos);
        ZipOutputStream zipOutputStream = new ZipOutputStream(bos);
        zipOutputStream.setLevel(Deflater.BEST_SPEED);

        Files.walk(Paths.get(dirToZip))
                .filter(Files::isRegularFile)
                .forEach((path) -> ZipFile(zipOutputStream, path));

        System.out.println("Done zipping. Closing everything...");

        zipOutputStream.close();
        bos.close();
        fos.close();

        System.out.println("Done!");
    }

    private static void ZipFile(ZipOutputStream zipOutputStream, Path path) {
        try {
            String fileToZip = path.toAbsolutePath().toString();
            String pathInZip = fileToZip.substring(dirToZip.length() + 1);
            // try-with-resources closes the input file even on failure
            try (FileInputStream fis = new FileInputStream(fileToZip)) {
                ZipEntry zipEntry = new ZipEntry(pathInZip);
                zipOutputStream.putNextEntry(zipEntry);
                byte[] bytes = new byte[1024];
                int length;
                while ((length = fis.read(bytes)) >= 0) {
                    zipOutputStream.write(bytes, 0, length);
                }
                zipOutputStream.closeEntry();
            }
        } catch (Exception ex) {
            throw new RuntimeException("Failed while zipping. Exception was: " + ex.getMessage());
        }
    }
}

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 31 (17 by maintainers)

Most upvoted comments

Hello,

I just wanted to share an ugly workaround for those who need to stay on 3.1 and can’t wait for the release of this fix. Instead of calling:

 firstEntry.Open();

call the private OpenInReadMode method via reflection, disabling header validation:

 var stream = firstEntry.GetType()
   .GetMethod("OpenInReadMode", BindingFlags.NonPublic | BindingFlags.Instance)
   .Invoke(firstEntry, new object[] { false }) as Stream;

I was able to repro the original issue posted and debugged the problem.

From what I can tell, the error is here:

https://github.com/dotnet/runtime/blob/6a0dafb7f8fa5ac3531be56dc803c3ae92e49201/src/libraries/System.IO.Compression/src/System/IO/Compression/ZipBlocks.cs#L519-L520

In the originally posted issue, the uncompressed file is larger than int.MaxValue but smaller than uint.MaxValue. Java therefore writes the size as a plain 32-bit field in the .zip, and we read it as a signed 32-bit integer. Because the value is larger than int.MaxValue, it becomes negative when read and converted to a long. Later in the method, we compare uncompressedSize to entry.Length, and that comparison fails.
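To illustrate the sign-extension problem described above in isolation, here is a minimal Java sketch (Java chosen because the repro zip writer is Java; the variable names are mine, not the runtime's). The same wrap-around happens when a 32-bit field is read with a signed read such as ReadInt32 instead of ReadUInt32:

```java
public class SignExtensionDemo {
    public static void main(String[] args) {
        // A file size larger than Integer.MAX_VALUE but smaller than
        // 0xFFFFFFFF, as stored in a raw 32-bit zip header field.
        int raw = (int) 3_000_000_000L; // the bit pattern wraps to a negative int

        long signed = raw;                 // sign-extended (signed read): negative
        long unsigned = raw & 0xFFFFFFFFL; // zero-extended (unsigned read): correct

        System.out.println(signed);   // -1294967296
        System.out.println(unsigned); // 3000000000
    }
}
```

Masking with 0xFFFFFFFFL (the moral equivalent of switching to ReadUInt32) recovers the real size.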

I’ve pushed a branch with the fix applied: https://github.com/eerhardt/runtime/tree/Fix1094. This allows the original file to be read successfully. I’ll add a test to that branch and then make a PR for the fix. We also should service 3.1 with the fix, as other people can hit this problem as well.

/cc @buyaa-n

@MikeCodesDotNET - I was also able to reproduce the same exception message with the file you sent me Clay Paky@Golden Scan HPE@justatest (1).gdtf. However, even with the above fix, the exception still occurs. This is because the values read from the stream don’t match:

[screenshot of the mismatched header values]

I’m unsure of what your error is - it may actually be a corrupt .zip file. Can you open a new issue for your scenario?

For the user.zip file you sent me, I was unable to repro an exception.

The current plan is to introduce an AppContext switch into both 3.1 and 5.0 that will turn the new validation off.

We have removed the local header validation against the central directory, so no AppContext switch is needed. The decompressed stream will still be restricted by the uncompressed size.
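The remaining safeguard can be sketched as a wrapper stream that caps reads at the declared uncompressed size. This is a hypothetical illustration in Java, not the actual runtime code; the class name and structure are assumptions:

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: instead of validating the local header up front,
// cap the decompressed stream at the size declared in the central directory.
class LengthLimitedStream extends InputStream {
    private final InputStream inner;
    private long remaining;

    LengthLimitedStream(InputStream inner, long declaredLength) {
        this.inner = inner;
        this.remaining = declaredLength;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) return -1; // stop at the declared uncompressed size
        int b = inner.read();
        if (b >= 0) remaining--;
        return b;
    }
}
```

A tampered entry that claims a huge size can no longer make the consumer read past the declared length.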

Fix merged to 5.0, reopening for porting to 3.1.

Looking at the implementation that used to be in the dotnet/wpf repo, they used a more forgiving algorithm for reading the data descriptor. Maybe using the same algorithm would work for us. I believe it would fix @d0tn3tc0d3r’s scenario where some data descriptors have 4 byte lengths and some have 8 byte lengths.

Looking at the spec https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT, the rule seems ambiguous to me:

 4.3.9.2 When compressing files, compressed and uncompressed sizes 
 SHOULD be stored in ZIP64 format (as 8 byte values) when a 
 file's size exceeds 0xFFFFFFFF.   However ZIP64 format MAY be 
 used regardless of the size of a file.  When extracting, if 
 the zip64 extended information extra field is present for 
 the file the compressed and uncompressed sizes will be 8
 byte values.  

The sentence “When extracting, if the zip64 extended information extra field is present for the file the compressed and uncompressed sizes will be 8 byte values.” is the key here. I could interpret it in at least two ways:

  1. If there are any Zip64 extended information extra fields, then the sizes will be 8 byte values
  2. (the way @d0tn3tc0d3r’s zip appears to be created) – If there are Zip64 extended information extra fields for either compressed size or uncompressed size, then the size will be 8 byte values.

An example of how these are different is that the relative offset of local header field may be present in the ZIP64 extended information extra fields, but not the compressed or uncompressed sizes. In @d0tn3tc0d3r’s zip, when this occurs the data descriptor lengths are still 4 bytes.

I assume that is why the dotnet/wpf implementation tries reading it each way and comparing the values with the central directory entry to figure out which way it was written.
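The "try both ways" approach could be sketched like this. This is a hypothetical illustration in Java; the class and method names are assumptions, and real code would read from a stream and also handle the optional data descriptor signature:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class DataDescriptorProbe {
    // Parse a data descriptor (without signature) as either 4-byte or 8-byte
    // size fields: crc-32, compressed size, uncompressed size.
    static long[] parse(byte[] descriptor, boolean zip64) {
        ByteBuffer buf = ByteBuffer.wrap(descriptor).order(ByteOrder.LITTLE_ENDIAN);
        long crc = buf.getInt() & 0xFFFFFFFFL;
        long compressed = zip64 ? buf.getLong() : buf.getInt() & 0xFFFFFFFFL;
        long uncompressed = zip64 ? buf.getLong() : buf.getInt() & 0xFFFFFFFFL;
        return new long[] { crc, compressed, uncompressed };
    }

    // Try the narrow (4-byte) layout first; if its sizes don't match the
    // central directory record, re-read the descriptor as the ZIP64 layout.
    static long[] parseForgiving(byte[] descriptor, long cdCompressed, long cdUncompressed) {
        if (descriptor.length >= 12) {
            long[] narrow = parse(descriptor, false);
            if (narrow[1] == cdCompressed && narrow[2] == cdUncompressed) {
                return narrow;
            }
        }
        return parse(descriptor, true);
    }
}
```

Comparing each candidate parse against the central directory sizes is what disambiguates the two layouts, since the descriptor itself carries no width marker.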

@eerhardt Do you think we’ll get an API / the ability to gracefully handle invalid headers or should I focus my efforts on finding a workaround?

The current plan is to introduce an AppContext switch into both 3.1 and 5.0 that will turn the new validation off. So in your application, if you enable this switch, ZipArchive will load even if the headers are invalid, basically falling back to .NET Core 2.1 behavior.

Along with that switch, we are also fixing the bug mentioned above - changing ReadInt32 to ReadUInt32.

@d0tn3tc0d3r

The issue in the zip files that I have is that some files are flagged as Zip64 (version needed to extract = 45 in the local file header), but the data descriptor contains 32-bit sizes (and the files are less than 4 GiB in size). So the check that the size in the data descriptor matches the one in the central directory record fails.

I think that issue should be fixed too. @eerhardt, could you add a fix for that along with your UInt32 fix? Or I can work on the fix. CC @ericstj

so I'm not sure why the extra check was implemented in the first place.

We had an issue where ZipArchive extracted a tampered zip file whose uncompressed size in the central directory was much bigger than the real uncompressed size: https://github.com/dotnet/runtime/issues/27741

Though it'd be nice to have an option to extract without validation.

I was thinking something similar actually. Your zip file does load successfully in .NET Core 2.1, but with the extra validation we added in 3.1, it now fails.

@buyaa-n @ericstj @carlossanlop @ahsonkhan @ViktorHofer - thoughts? It would probably require adding a new public API to disable the validation - unless we wanted to “quirks”-mode it, but that wouldn’t be my vote. I can propose something in a new issue, if people think it would be valuable.

Yep. I’ll take a look tomorrow.