Open-XML-SDK: WordprocessingDocument.Open is very slow
Description
WordprocessingDocument.Open
is very very slow when reading big .docx document.
i’m trying to read 10 mb sized .docx document and it takes about 1 minute just to open it.
Information
- .NET Target: .NET Core 2.2
- DocumentFormat.OpenXml Version: 2.9.0
Repro
Console.WriteLine("Creating filter")
using (var doc = WordprocessingDocument.Open(path, false))
{
Console.WriteLine("Creating BodyReader");
_bodyReader = OpenXmlReader.Create(doc.MainDocumentPart.Document);
}
Link to the file: https://drive.google.com/file/d/1_InQLbZ19KCUgkuePAiLXvUuLcZl6Qu7/view?usp=sharing
Uploaded to GitHub: 10mb_file.docx
Observed
I put to lines of Console.WriteLine
so the time between “Creating filter” and “Creating BodyReader” is about 1 min. It doesn’t matter if i opening file from memory stream or just giving it a real path to the file.
Expected
Instant open expected 😃
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 15 (8 by maintainers)
I’m going to open as I recently made a change to System.IO.Packaging that may help this and I want to verify. See https://github.com/dotnet/runtime/pull/35978
FYI the fix that I got into System.IO.Packaging fixes this!
Before:
After:
Since the package is still in preview, we won’t be upgrading it for the project at this time. Note, since the major version of this is changing, we won’t actually be able to bring it into the repo until v3.0 (due to semantic versioning). That’ll probably happen sometime soon, but not for a bit. However, you may manually bring in 5.0.0-preview.6.20305.6+ of the package to get the benefit (although it will only help you if you are on .NET Core)
Thank you. I’m new to C# but I will try my best.
ah that is a good observation. I’ll have to see if there’s a way to lazily load those (I’ve never actually explored that scenario)
Here’s a benchmark version of it:
And the results:
I agree that seems like a lot of time. I’ll look into it.