Open-XML-SDK: One illegal uri in Relationship will destroy the document parsing
Description
The PPTX document that include an illegal uri in Relationship will make the System.IO.Packaging.InternalRelationshipCollection.ProcessRelationshipAttributes
throw an exception to OpenXmlPart.Load
.
And the OpenXmlPart.Load
can not catch the exception and it will break the PresentationDocument.Open
.
Information
- .NET Target: All
- DocumentFormat.OpenXml Version: 2.10.1
Repro
var document = PresentationDocument.Open("hyperlink.pptx", isEditable: false, openSettings)
Here is the hyperlink.pptx file : https://1drv.ms/p/s!AiKjiQqRWKThlv5zkY4HoRvvJ3Ppdg?e=3kfdNU
Observed
The PresentationDocument.Open throw the UriFormatException exception
System.UriFormatException: 'Invalid URI: The hostname could not be parsed.'
Because the ppt\slides_rels\slide1.xml.rels contain this string
<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink" Target="mailto:!@#$%^&*()_+}{:”?><,./;’[]=-098766554321" TargetMode="External"/>
As you can see, the Target
is not an uri.
Expected
We can design an exception handle API, and we can handle some illegal document.
And the #298 only add more information but can not tolerate errors.
And just as @twsouthwick says, we can not fix this in the OpenXML SDK project https://github.com/OfficeDev/Open-XML-SDK/issues/297#issuecomment-345560540 , but I think we can tolerate some errors
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 2
- Comments: 29 (21 by maintainers)
Commits related to this issue
- Add interface that we can custom the exception https://github.com/OfficeDev/Open-XML-SDK/issues/715 #983 #24002 #26084 — committed to dotnet-campus/runtime by lindexi 4 years ago
See #1295 for the package abstraction being proposed
For anyone interested, I’ve opened a PR to explore an API that performs functionality similar to what @abelykh0 discusses and linked to in stackoverflow
I believe we can solve this in a better way similar to https://github.com/OfficeDev/Open-XML-SDK/issues/807#issuecomment-1377733580:
(1) Override IPackageFeature to identify malformed relationships (2) replace them with an identifier that we can track (3) Override IPackage/IPackagePart APIs to return the original bad URI (4) Override Save to rewrite the replaced identifier with the original (if nothing changed)
** If the package is read only, we can copy it behind the scenes and operate on a copy of it so that we can manage these illegal uris
** This will be easy for File/Stream overloads… not so easy for Package
@twsouthwick Thank you and I write the CompatiblePackage.cs which can ignore some error. And I publish our application with the CompatiblePackage.cs three months ago, and I find that more than half of the problems were fixed.
See https://github.com/dotnet-campus/DocumentFormat.OpenXml.Extensions/blob/96313c16095be677ea36650a9dd0bcbcd57cf168/src/DocumentFormat.OpenXml.Flatten/DocumentFormat.OpenXml.Flatten/Compatibilities/Packaging/CompatiblePackage.cs
I’m going to reopen this to see if we can fix it better for a v3.0.0. I’m thinking we will go with @lindexi suggestion of creating an abstraction of package we can control more. I believe we can then handle things a bit better.
Sorry, please delete this file, it cannot be posted on a public place.
Considering Excel opens such files with no warnings and even “Open XML SDK 2.5 Productivity Tool” reports no warnings, I would consider this issue critical.