runtime: XPS documents in .NET Core can't be opened

Description

As advised in /runtime/wpf, I am opening a ticket here for the dotnet/runtime team…
Please see https://github.com/dotnet/wpf/issues/3546 for full details, test documents and more.

.NET Framework 4.8 is able to open XPS files from a printer driver that contain many .piece files.
.NET 5 (/ .NET Core) is unable to open XPS files from a printer driver that contain many .piece files.

About this issue

  • State: open
  • Created 3 years ago
  • Reactions: 2
  • Comments: 24 (10 by maintainers)

Most upvoted comments

A mostly untested (I’ve got it to compile, but that is about it) prototype port of the interleaving feature can be found at: https://github.com/dotnet/runtime/compare/main...KevinCathcart:interleave

The adjustments to use ZipArchiveEntry were mostly straightforward. The biggest differences are that .Name on ZipFileInfo corresponds to .FullName on ZipArchiveEntry, that deletion is done via the entry rather than the zip archive, and lastly the use of the ZipStreamManager. There were also a few small changes to account for nullability annotations, and some stylistic things that this repo has flagged as errors.

I’ve not tried to do any big changes like avoiding seeking for reads (I suspect the .NET Framework version may simply have failed to handle zip files that big). In any case, multi-gigabyte XPS files are probably a low priority to support, and XPS files are the only use of OPC that I am aware of that uses interleaving. I’m pretty sure the Office Open XML spec forbids interleaving, and obviously NuGet’s spec also forbids the use of interleaving (since the current NuGet client cannot handle it).

Hope this helps.

When porting System.IO.Packaging over to .NET Core, two interrelated features were dropped: streaming mode (which allows creation of interleaved parts), and support for reading/updating existing interleaved parts.

They were dropped because the porter was under instruction to replatform the code to use System.IO.Compression instead of MS.Internal.IO.Zip. Since a major goal at the time was still to keep the runtime size small by avoiding duplicated code, this made sense.

However, part of the support for interleaved files (which is specific to Open Packaging Conventions) was implemented in the MS.Internal.IO.Zip namespace, and therefore was not ported. It could have been moved into the Packaging namespace, where I would argue it should have been in the first place, and updated to be based on System.IO.Compression, just like the rest of the OPC-specific code.

This issue is basically requesting to restore the missing support for reading and updating interleaved files. The read and update support actually has no public API surface (unlike streaming), and its absence is a correctness issue, since support for reading interleaved OPC files is not optional in the specification. On the other hand, the ability to produce such files is optional, since interleaving is only meant as a possible performance enhancement, so restoring streaming mode is less important.

It appears that all the needed code has been released under MIT. Most of it came with the initial commit of the code to this repository (where, unfortunately, it had already been heavily modified from the original; hopefully no needed code was removed, and it looks like most was just ifdefed out). The MS.Internal.IO.Zip code that is needed was released as part of open sourcing WPF, although it was deleted in later commits because nothing over there used it.

It looks to me like “Piece” support is just a wrapper over top of the directory structure of the ZIP to create an aggregate stream that hides the underlying file parts from the zip: ZipPackagePart.cs (microsoft.com) (location in NETFx that decides to use the abstraction) ZipPackagePart.cs (location in .NETCore that would need to handle this).

Here ZipFileInfo in netfx is analogous to ZipArchiveEntry in netcore. A null entry indicates that it might be a piece-file. Here’s the read path: ZipPackage.cs (microsoft.com) (netfx) ZipPackage.cs (netcore)
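To make the read path described above concrete, here is a minimal Python sketch (not the .NET code) of the fallback: look the part up directly, and only if there is no such entry, look for its pieces and concatenate them. The piece naming pattern (`<part>/[N].piece`, with the final piece named `<part>/[N].last.piece`, matched case-insensitively) is my reading of the OPC interleaving convention, and `read_part` is a hypothetical helper name, so treat both as assumptions.

```python
import io
import re
import zipfile

# Assumed piece naming: "<part>/[N].piece", final piece "<part>/[N].last.piece".
PIECE_RE = re.compile(
    r"^(?P<part>.+)/\[(?P<index>\d+)\](?P<last>\.last)?\.piece$", re.IGNORECASE
)


def read_part(archive: zipfile.ZipFile, part_name: str) -> bytes:
    """Return the bytes of a logical part, stored whole or as pieces."""
    names = archive.namelist()
    # Fast path: the part is stored as a single ordinary zip entry.
    if part_name in names:
        return archive.read(part_name)

    # Fallback: no direct entry, so the part may be stored as pieces.
    pieces = {}
    last_index = None
    for name in names:
        match = PIECE_RE.match(name)
        if match and match.group("part").lower() == part_name.lower():
            index = int(match.group("index"))
            pieces[index] = name
            if match.group("last"):
                last_index = index

    if not pieces:
        raise KeyError(f"part not found: {part_name}")
    if last_index is None:
        raise ValueError(f"no last piece for {part_name}")
    # Every piece from 0 up to the last one must be present.
    missing = [i for i in range(last_index + 1) if i not in pieces]
    if missing:
        raise ValueError(f"missing pieces {missing} for {part_name}")
    # Concatenate in numeric order; zip directory order is not relied on.
    return b"".join(archive.read(pieces[i]) for i in range(last_index + 1))
```

This reads every piece fully into memory, which is fine for illustration only; an aggregate stream like the one in .NET Framework would expose the pieces lazily instead.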

I can’t see anything in the implementation of piece support that looks particularly hard, but I haven’t dug too deep. I see it uses seekable zip entry streams, which means it won’t work with our read-only archives (necessary for large zips), but that’s a limitation, not a blocker. I might be wrong, but I bet we could fix that limitation, as it looks like the seeking could even be redundant. I tend to agree that it looks like it was accidentally excluded because of the namespace. The devil may be in the details, but I don’t expect it would be too hard to get to a prototype by adding in the missing code and mapping old concepts to new. I think we’d be willing to accept a contribution here that takes the InterleavedZipPartStream.cs and related classes from .NETFramework and ports them to System.IO.Packaging.

The first step here would be to move that code over, porting from the WPF-Zip types to the IO.Compression types and see if there are any difficult dependencies. If not then I think we could accept a PR that added that along with tests.

I believe the status is that this isn’t currently scheduled, but we would take a contribution as you described if offered?

Yes, at the moment this isn’t something that the @dotnet/area-system-io-compression team is working on but a contribution would be considered. We are approaching a place in .NET 7.0 where we won’t be able to take risky changes and are already past the place for new API. It is still possible to get no-API changes in net7.0 but the @dotnet/area-system-io-compression team would need to make the call once seeing how risky it is.

@danmoseley I hope I understand and answer your questions correctly. (I’m not a native English speaker)

Reproduction code can be found here: https://github.com/dotnet/wpf/issues/3546#issuecomment-695942720

Sample documents can be found here: https://github.com/dotnet/wpf/issues/3546#issuecomment-824713800

The reason why it can’t be opened can be found here: https://github.com/dotnet/wpf/issues/3546#issuecomment-825061189

One of our products is based on/working with the Microsoft V4 XPS print driver. This Microsoft print driver (Microsoft-owned code) generates XPS files (XPS documents, the print jobs) using .piece files.

.NET Framework is able to open these XPS documents (print jobs). .NET (Core) is unable to open these XPS documents.

Why? Credit to @ThomasGoulet73 for figuring out that a part of the .NET Framework code that supports .piece files in System.IO.Packaging.* was not ported.

So we have a product that worked fine in .NET Framework but no longer works in .NET (Core). We got it working with a big workaround: first extracting the XPS files, combining all the part files into a single big XPS file, and then recreating an XPS file so that it can be opened in .NET (Core). As you can understand, this is a terrible, slow and expensive method in a printing solution.
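For readers wondering what such a workaround amounts to, here is a rough Python sketch of one possible interpretation: copy the package into a new zip, writing each pieced part as a single ordinary entry. It is not our production code. The piece naming pattern (`<part>/[N].piece` and `<part>/[N].last.piece`) is my understanding of the OPC convention, and `merge_pieces` is a hypothetical helper name.

```python
import io
import re
import zipfile

# Assumed piece naming: "<part>/[N].piece" or "<part>/[N].last.piece".
PIECE_RE = re.compile(r"^(.+)/\[(\d+)\](?:\.last)?\.piece$", re.IGNORECASE)


def merge_pieces(src, dst) -> None:
    """Copy the package at `src` to `dst`, storing each pieced part as one entry."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        pieced = {}  # logical part name -> list of (index, entry name)
        for name in zin.namelist():
            match = PIECE_RE.match(name)
            if match:
                pieced.setdefault(match.group(1), []).append(
                    (int(match.group(2)), name)
                )
            else:
                # Ordinary entries are copied unchanged.
                zout.writestr(name, zin.read(name))
        for part, pieces in pieced.items():
            # Concatenate pieces in numeric order, not zip directory order.
            data = b"".join(zin.read(entry) for _, entry in sorted(pieces))
            zout.writestr(part, data)
```

Even this simplified version decompresses and recompresses every entry and holds each part in memory, which hints at why the approach is slow for large print jobs. It also skips validation (missing pieces, part names differing only by case).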

I hope this answered the question “Could this be achieved by using the code in a non Microsoft owned project?”.

“Is there a way to get a sense of the level of interest in this feature, such that it would justify the cost of maintenance etc?”

To begin with, everybody prints, but not everybody writes printing code or is in the printing industry. It is a niche, as you probably know. Also, I don’t think that many companies have already ported all their ‘printing solutions’ from .NET Framework to .NET Core (or the solutions they provide are rather services, at a later stage in the document handling process after printing). However, each time this and other XPS issues are fixed, a few thousand people among our customers worldwide will benefit from it. (We work together with the biggest print multinational…) So fixing this issue might be interesting for only a few developers, but in the end thousands of customers will be happy. Developers working on the core of printing or on print solutions are in a minority, but that doesn’t mean issues should be left open. We have customers that print 150-200k files on a biweekly/monthly basis. For them it would be super beneficial, but also for the other thousands of companies/users.

Personally: if customers buy e.g. production machines, you know they will be working with them in reality for the next 10 to 15 years. That means we can’t keep supporting them with ‘old’ .NET Framework products. When we provide a .NET (Core) alternative product, it should be the same as the .NET Framework one or better. Now, with hacky workarounds, it is worse. In the end we or the customer give Microsoft a bad name: “sorry, Microsoft problem, they won’t help and don’t want to fix it”.

P.S.: As you have noticed, we tried to wake up the WPF team at Microsoft, and I believe the community has spoken, as you have seen in the latest discussions ( https://github.com/dotnet/wpf/discussions/6542 ). One of my personal reasons is that XPS is a good, clean format, but as with WPF/XAML/XPS there is a lack of updates/fixes/improvements. I really hope that when somebody looks at this XPS issue, additional performance improvements will be made as well, for example: https://github.com/dotnet/runtime/issues/51930 Many people print or work with XPS documents on a daily basis, so this will also come in handy for a lot of users/customers. (+ The biggest culprit with XPS is the requirement for the STA thread. I still have some hope in a dark corner, but yeah… probably false hope.)

https://github.com/dotnet/runtime/issues?q=XPS
https://github.com/dotnet/wpf/issues?q=XPS
https://github.com/dotnet/winforms/issues?q=XPS
https://github.com/mono/SkiaSharp/issues?q=XPS

I really hope Microsoft will spend some time on porting this .NET Framework code to .NET Core (and maybe some additional XPS fixes/improvements). Our customers and our team would be so thankful.