Open-XML-SDK: WordprocessingDocument.Open is very slow

Description

WordprocessingDocument.Open is very slow when reading a large .docx document. I’m trying to read a 10 MB .docx document, and it takes about 1 minute just to open it.

Information

  • .NET Target: .NET Core 2.2
  • DocumentFormat.OpenXml Version: 2.9.0

Repro

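Console.WriteLine("Creating filter");
using (var doc = WordprocessingDocument.Open(path, false))
{
    // This line is reached only after Open returns, so the gap between the two messages is the open time.
    Console.WriteLine("Creating BodyReader");
    _bodyReader = OpenXmlReader.Create(doc.MainDocumentPart.Document);
}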

Link to the file: https://drive.google.com/file/d/1_InQLbZ19KCUgkuePAiLXvUuLcZl6Qu7/view?usp=sharing

Uploaded to GitHub: 10mb_file.docx

Observed

I added two Console.WriteLine calls, and the time between “Creating filter” and “Creating BodyReader” is about 1 minute. It doesn’t matter whether I open the file from a memory stream or pass a real path to the file.
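
For a more precise number than eyeballing the gap between two console messages, the open call can be wrapped in a Stopwatch. This is only a minimal sketch: the class name OpenTiming and taking the file path from the first command-line argument are assumptions made for illustration, not part of the original repro.

```csharp
using System;
using System.Diagnostics;
using DocumentFormat.OpenXml.Packaging;

class OpenTiming
{
    static void Main(string[] args)
    {
        // Assumption: the path to the .docx file is passed as the first argument.
        var path = args[0];

        var stopwatch = Stopwatch.StartNew();
        using (var doc = WordprocessingDocument.Open(path, false))
        {
            // Stop as soon as Open returns so only the open call is measured.
            stopwatch.Stop();
            Console.WriteLine($"Open took {stopwatch.ElapsedMilliseconds} ms");
        }
    }
}
```

A single run like this includes JIT and file-cache effects, so it is a rough figure rather than a benchmark.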

Expected

Instant open expected 😃

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

I’m going to reopen this, as I recently made a change to System.IO.Packaging that may help here and I want to verify it. See https://github.com/dotnet/runtime/pull/35978

FYI the fix that I got into System.IO.Packaging fixes this!

Before:

| Method |    Mean |    Error |   StdDev |
|------- |--------:|---------:|---------:|
|   Open | 9.167 s | 0.1791 s | 0.2265 s |

After:

| Method |     Mean |   Error |  StdDev |
|------- |---------:|--------:|--------:|
|   Open | 184.5 ms | 3.64 ms | 4.85 ms |

Since the package is still in preview, we won’t be upgrading it for the project at this time. Note that because its major version is changing, we won’t be able to bring it into the repo until v3.0 (due to semantic versioning). That will probably happen fairly soon, but not right away. However, you can manually reference version 5.0.0-preview.6.20305.6 or later of the package to get the benefit (although it will only help if you are on .NET Core).
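
For anyone who wants to try that, a direct package reference in the project file is one way to pull in the newer version. This is a sketch assuming an SDK-style .csproj; the version shown is the minimum mentioned above, and a later preview or release should work the same way.

```xml
<ItemGroup>
  <!-- A direct reference takes precedence over the version that DocumentFormat.OpenXml pulls in transitively -->
  <PackageReference Include="System.IO.Packaging" Version="5.0.0-preview.6.20305.6" />
</ItemGroup>
```

No code changes should be needed beyond this; the Open call stays the same.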

Thank you. I’m new to C# but I will try my best.

Ah, that is a good observation. I’ll have to see if there’s a way to lazily load those (I’ve never actually explored that scenario).

Here’s a benchmark version of it:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using DocumentFormat.OpenXml.Packaging;

namespace OpenSample
{
    [CoreJob]
    public class OpenBenchmark
    {
        [Benchmark]
        public WordprocessingDocument Open()
        {
            var path = ...;
            return WordprocessingDocument.Open(path, false);
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var summary = BenchmarkRunner.Run<OpenBenchmark>();
        }
    }
}

And the results:

| Method |    Mean |    Error |   StdDev |
|------- |--------:|---------:|---------:|
|   Open | 21.56 s | 0.3080 s | 0.2730 s |

I agree that seems like a lot of time. I’ll look into it.