runtime: System.IO.Packaging.Package should have a way to handle common errors with malformed packages
System.IO.Packaging.Package provides a simple abstraction over a container to hold multiple objects in a single entity. This is commonly used in nupkg packages, vsix extensions, appx, and also Office documents. The package format makes heavy use of URIs to maintain relationships between various components. However, Office will allow a user to generate an invalid URI for one of these relationships, and then opening via System.IO.Packaging fails with no easy way of recovery besides manually updating the file, which necessitates a deeper understanding of the file format that should be abstracted by the library.
This is to work around a breaking change that occurred between 4.0->4.5. This proposal is to provide a way to work around that change.
Example
There are multiple issues filed on the OpenXml SDK project (which abstracts the Office format to strongly typed classes) and related projects where users have been hit by this. For example:
- https://github.com/OfficeDev/Open-XML-SDK/issues/38
- https://github.com/ClosedXML/ClosedXML/issues/249
A simple Office document that will fail can be created by the following:
- Create a new Word document
- Add a hyperlink to an invalid URI (ie
mailto:one@) - Try to load with
Package:
using (var package = Package.Open(pathToDoc))
{
foreach (var part in package.GetParts())
{
part.GetRelationships();
}
}
This will fail with the following exception:
System.UriFormatException
HResult=0x80131537
Message=Invalid URI: The hostname could not be parsed.
Source=System.Private.Uri
StackTrace:
at System.Uri.CreateThis(String uri, Boolean dontEscape, UriKind uriKind)
at System.Uri..ctor(String uriString, UriKind uriKind)
at System.IO.Packaging.InternalRelationshipCollection.ProcessRelationshipAttributes(XmlCompatibilityReader reader)
at System.IO.Packaging.InternalRelationshipCollection.ParseRelationshipPart(PackagePart part)
at System.IO.Packaging.InternalRelationshipCollection..ctor(Package package, PackagePart part)
at System.IO.Packaging.PackagePart.EnsureRelationships()
at System.IO.Packaging.PackagePart.GetRelationshipsHelper(String filterString)
at PackagingExample.Program.Main(String[] args) in c:\users\tasou\source\repos\PackagingExample\PackagingExample\Program.cs:line 15
Proposed API
namespace System.IO.Packaging
{
public abstract class Package // Existing class
{
public static Package Open(string path, FileShare packageShare, PackageSettings settings);
public static Package Open(string path, PackageSettings settings);
public static Package Open(Stream stream, PackageSettings settings);
}
public class PackageSettings
{
public FileMode PackageMode { get; set; }
public FileAccess PackageAccess { get; set; }
public IUriProvider UriProvider { get; set; }
}
public interface IUriProvider
{
Uri CreateUri(string uriString, UriKind uriKind);
}
}
Details
- The
PackageSettingsclass would allow for easily adding additional settings at a later point without adding more factory methods (i.e.Package.Open(...)). Since theFileModeandFileAccessproperties are used on all factory methods, these are added to the settings object. - The
Package.Open(...)methods listed extend the current factory methods that take aStreamor a path with an optionalFileShare. - The default implementations of
IUriProviderwould essentially be this line: https://source.dot.net/#System.IO.Packaging/System/IO/Packaging/InternalRelationshipCollection.cs,363
About this issue
- Original URL
- State: open
- Created 6 years ago
- Reactions: 9
- Comments: 17 (15 by maintainers)
Commits related to this issue
- Add interface that we can custom the exception https://github.com/OfficeDev/Open-XML-SDK/issues/715 #983 #24002 #26084 — committed to dotnet-campus/runtime by lindexi 4 years ago
@karelz How can I get this proposal approved? I’m willing to implement it - many users of the Open XML SDK would appreciate this
Regarding
Uriclass:Uriclass was born. As a result, our goal is to drive it towards maintainability (using more of safe code and simpler code).Uriparsing, compliant with RFC instead of lax parsing (as suggested here).In summary, I would strongly urge against making
Urimore lax in parsing (pre-.NET 4.5 times), or complicate the code significantly. On the other hand, there is slowly growing demand for non-compliantUrirepresentation and I wonder if we could achieve that via other ways – either the suggestedIUriFactory, or support custom parsing, and/or custom error handling / fallback parsing in theUriclass itself as an advanced extension mechanism. Higher-level libraries (e.g. System.IO.Packaging) would have to then pass through the extension mechanism to allow such customization by the callers (OpenXML in this case). @MihaZupan do you have any thoughts on this?Thanks @ericstj. I’ve added a tool into the OpenXml library to handle sanitizing it (https://github.com/OfficeDev/Open-XML-SDK/pull/793). Downside is that this requires rewriting the document, which may have only been read only before. However, it is interesting that the format itself appears to handle malformed Uris just fine and don’t have the restrictions that System.Uri has. ie Word/Excel/etc happily generate files with these values.
The roundtripping here is so that the values can be rewritten so the document can be read without modifying anything. OriginalString can’t be overwritten to output the original, malformed Uri, so a derived type could be used to ensure the roundtripping. This would allow the OpenXml library to transparently handle the change so people don’t have to modify (potentially old) LOB apps to handle this case.
Having a way to parse a Uri with a relaxed mode similar to pre-.NET 4.5 would be ideal, so hopefully @dotnet/ncl can give some feedback for that.
@JeremyKuhne I’ve taken another look at this since OpenXml users have been hitting it. After playing around with usage, I’ve updated my proposal to the following new APIs for System.IO.Packaging. I’m happy to take this throw the API proposal process, but haven’t done that before, so any guidance would be helpful.
The
PackageSettingsclass is to minimize the number of overloads ofOpen(...)that need to be added and allow for simpler additions in the future.This would allow someone to intercept the creation and writing of the Uri instances so that they can do whatever they want with it. For example, here are some test cases using it:
@carlossanlop Any thoughts on this API? I haven’t done an API review, but happy to help move something like this along. As more users are moving to .NET Core/.NET 5, this will become a problem for many LOB apps that rely on DocumentFormat.OpenXml.