runtime: System.IO.Packaging part stream has a memory leak when writing a large stream

This was found through the DocumentFormat.OpenXML library, which uses System.IO.Packaging extensively (original issue: https://github.com/OfficeDev/Open-XML-SDK/issues/244). The issue logged there tries to generate a large Excel document: on .NET 4.7 the working set stays around 10 MB, while on .NET Core it grows quickly until it hits an OutOfMemoryException. I’ve simplified the repro to remove the dependency on DocumentFormat.OpenXML, and the problem appears to be isolated to writing to a Part within a Package.

Source

using System;
using System.IO;
using System.IO.Packaging;

namespace MemoryRepro
{
    class Program
    {
        static void Main(string[] args)
        {
            using (var fs = new FileStream(Path.GetTempFileName(), FileMode.Create, FileAccess.ReadWrite))
            using (var package = Package.Open(fs, FileMode.Create))
            {
                var part = package.CreatePart(new Uri("/part", UriKind.Relative), "something/sometype");

                using (var stream = part.GetStream())
                using (var writer = new StreamWriter(stream))
                {
                    for (var i = 0; i < int.MaxValue; i++)
                    {
                        writer.Write("hello");
                    }
                }
            }
        }
    }
}

Project

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net47</TargetFramework>
    <!--<TargetFramework>netcoreapp2.0</TargetFramework>-->
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="System.IO.Packaging" Version="4.4.0" />
  </ItemGroup>

</Project>

This repro code appears to have a working set of around 60 MB when running on .NET 4.7, while the working set grows very quickly on .NET Core 2.0.

The error on .NET Core 2.0 is:

Unhandled Exception: System.IO.IOException: Stream was too long.
   at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count)
   at System.IO.Compression.WrappedStream.Write(Byte[] buffer, Int32 offset, Int32 count)
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.IO.StreamWriter.Write(String value)
   at MemoryRepro.Program.Main(String[] args) in c:\users\tasou\source\repos\MemoryRepro\MemoryRepro\Program.cs:line 21

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 33
  • Comments: 25 (17 by maintainers)

Most upvoted comments

Hello. Almost a year has passed, but it seems the bug is not going to be fixed. This error blocks us from using .NET Core in our reporting solution.

@ianhays Is this (and the zip streaming feature) being tracked for .NET Core 2.2?

I took a closer look at this. It’s very much related to the ZipArchiveEntry behavior mentioned here: https://github.com/dotnet/corefx/issues/11669#issuecomment-468016815. It was also mentioned by @twsouthwick above.

When opening a Package with ReadWrite access, the underlying archive is opened in Update mode, which causes every entry to be buffered completely in a MemoryStream. A MemoryStream cannot hold more than int.MaxValue bytes, so in addition to using far more memory than necessary, this also caps the size of any single entry at int.MaxValue bytes.
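The difference between the two archive modes can be observed with ZipArchive directly. The following is a minimal sketch, not the System.IO.Packaging code path itself: the class name ZipModeSketch and the helper EntryStreamCanSeek are made up for illustration. It relies on the assumption that a seekable entry stream is a sign of the in-memory buffering described above, whereas in Create mode the entry stream writes straight through to the archive and is expected to report that it cannot seek.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper class for illustration only.
static class ZipModeSketch
{
    // Creates a single entry in a new archive opened in the given mode and
    // reports whether the stream returned by entry.Open() supports seeking.
    public static bool EntryStreamCanSeek(ZipArchiveMode mode)
    {
        using (var backing = new MemoryStream())
        using (var archive = new ZipArchive(backing, mode, leaveOpen: true))
        {
            var entry = archive.CreateEntry("part");
            using (var entryStream = entry.Open())
            {
                // Write one byte so the entry has some content.
                entryStream.Write(new byte[] { 42 }, 0, 1);
                return entryStream.CanSeek;
            }
        }
    }

    static void Main()
    {
        Console.WriteLine($"Update: CanSeek = {EntryStreamCanSeek(ZipArchiveMode.Update)}");
        Console.WriteLine($"Create: CanSeek = {EntryStreamCanSeek(ZipArchiveMode.Create)}");
    }
}
```

Only ZipArchiveMode.Update and ZipArchiveMode.Create are meaningful inputs here, since ZipArchiveMode.Read does not allow creating entries.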

When opening a Package with FileAccess.Read or FileAccess.Write you won’t hit the case where ZipArchiveEntry stores uncompressed data in memory. This can permit you to work with Packages with large files: only open them for FileAccess.Read or FileAccess.Write.

Today there is a bug in the .NET Core implementation of Package which blocks the use of FileAccess.Write. I have a fix for that, which I’ll submit shortly; it should unblock the scenario in the original post.
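As a rough sketch of the write-only approach, the original repro could be adapted as follows. This assumes a System.IO.Packaging version in which the FileAccess.Write bug mentioned above is already fixed; the class name WriteOnlyPackageSketch, the method WritePart, and the repetitions parameter are hypothetical names introduced for illustration.

```csharp
using System;
using System.IO;
using System.IO.Packaging;

// Hypothetical class for illustration only.
static class WriteOnlyPackageSketch
{
    // Writes "hello" `repetitions` times into a single part of a new package.
    // The file and the package are both opened write-only, so the package is
    // not opened with ReadWrite access.
    public static void WritePart(string path, int repetitions)
    {
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        using (var package = Package.Open(fs, FileMode.Create, FileAccess.Write))
        {
            var part = package.CreatePart(new Uri("/part", UriKind.Relative), "something/sometype");

            using (var stream = part.GetStream(FileMode.Create, FileAccess.Write))
            using (var writer = new StreamWriter(stream))
            {
                for (var i = 0; i < repetitions; i++)
                {
                    writer.Write("hello");
                }
            }
        }
    }

    static void Main()
    {
        // A small count keeps the sketch quick; the original repro loops to int.MaxValue.
        WritePart(Path.GetTempFileName(), 1000);
    }
}
```

The trade-off is that a package opened this way cannot be read back or updated in the same session, so this only helps when the package is being produced from scratch.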

Although S.IO.Packaging on desktop does define “streaming” support, that’s not actually at play here as far as I can tell. One thing that is at play is that on desktop the System.IO.Packaging implementation of zip had a fancier stream for the update case. It behaved in a similar way to ZipArchiveEntry where updates would decompress the entire entry for access, but it would back that decompressed stream in a mix of memory+file. https://referencesource.microsoft.com/#WindowsBase/Base/MS/Internal/IO/Packaging/CompressStream.cs,716

There’s still the scenario of a package opened for Update that needs to Read / Write large files. For that, let’s let issue dotnet/runtime#1544 track the improvement to ZipArchiveEntry to permit it and dotnet/corefx#31362 track using that in System.IO.Packaging.

I hit this bug too! On .NET Core, the DocX library for working with .doc files does not work correctly. The issue was opened half a year ago and there have been no changes =(

Sorry about that.

  • ZipArchive is behaving the same between .NET Framework and .NET Core. The ZipArchive snippet is used in the .NET Core implementation of System.IO.Packaging; I don’t know what is used in the .NET Framework implementation since it’s in WindowsBase.
  • System.IO.Packaging is behaving differently between .NET Framework and .NET Core. On .NET Framework, the original code snippet works; I can create a package and write an arbitrarily large number of items to it. On .NET Core, the original snippet will crash due to memory usage.