Open-XML-SDK: One illegal uri in Relationship will destroy the document parsing

Description

The PPTX document that include an illegal uri in Relationship will make the System.IO.Packaging.InternalRelationshipCollection.ProcessRelationshipAttributes throw an exception to OpenXmlPart.Load.

And the OpenXmlPart.Load can not catch the exception and it will break the PresentationDocument.Open.

Information

  • .NET Target: All
  • DocumentFormat.OpenXml Version: 2.10.1

Repro

var document = PresentationDocument.Open("hyperlink.pptx", isEditable: false, openSettings)

Here is the hyperlink.pptx file : https://1drv.ms/p/s!AiKjiQqRWKThlv5zkY4HoRvvJ3Ppdg?e=3kfdNU

Observed

The PresentationDocument.Open throw the UriFormatException exception

System.UriFormatException: 'Invalid URI: The hostname could not be parsed.'

Because the ppt\slides_rels\slide1.xml.rels contain this string

<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink" Target="mailto:!@#$%^&amp;*()_+}{:”?&gt;&lt;,./;’[]=-098766554321" TargetMode="External"/>

As you can see, the Target is not an uri.

Expected

We can design an exception handle API, and we can handle some illegal document.

See #38 #274 #297 #298

And the #298 only add more information but can not tolerate errors.

And just as @twsouthwick says, we can not fix this in the OpenXML SDK project https://github.com/OfficeDev/Open-XML-SDK/issues/297#issuecomment-345560540 , but I think we can tolerate some errors

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 29 (21 by maintainers)

Commits related to this issue

Most upvoted comments

private void OpenDocument()
{
    // TODO remove this workaround once the issue is fixed
    // https://github.com/OfficeDev/Open-XML-SDK/issues/715

    SpreadsheetDocument result = null;
    try
    {
        result = SpreadsheetDocument.Open(this._fileStream, false);
    }
    catch (OpenXmlPackageException e)
    {
        if (e.ToString().Contains("Invalid Hyperlink", StringComparison.Ordinal))
        {
            var stream = new MemoryStream();
            try
            {
                this._fileStream.Position = 0;
                this._fileStream.CopyTo(stream);
                this._fileStream.Dispose();
                this._fileStream = stream;
                this.FixInvalidUri(this._fileStream, brokenUri =>
                { return new Uri("http://broken-link/"); });
            }
            catch
            {
                stream.Dispose();
                throw;
            }

            result = SpreadsheetDocument.Open(this._fileStream, false);
        }
    }

    this._spreadsheetDocument = result;
}

private void FixInvalidUri(Stream fs, Func<string, Uri> invalidUriHandler)
{
    XNamespace relNs = "http://schemas.openxmlformats.org/package/2006/relationships";
    using (ZipArchive za = new ZipArchive(fs, ZipArchiveMode.Update, true))
    {
        foreach (var entry in za.Entries.ToList())
        {
            if (!entry.Name.EndsWith(".rels", StringComparison.Ordinal))
            {
                continue;
            }

            bool replaceEntry = false;
            XDocument entryXDoc = null;
            using (var entryStream = entry.Open())
            {
                try
                {
                    entryXDoc = XDocument.Load(entryStream);
                    if (entryXDoc.Root != null && entryXDoc.Root.Name.Namespace == relNs)
                    {
                        var urisToCheck = entryXDoc
                            .Descendants(relNs + "Relationship")
                            .Where(r => r.Attribute("TargetMode") != null && (string)r.Attribute("TargetMode") == "External");
                        foreach (var rel in urisToCheck)
                        {
                            var target = (string)rel.Attribute("Target");
                            if (target != null)
                            {
                                try
                                {
                                    Uri uri = new Uri(target);
                                }
                                catch (UriFormatException)
                                {
                                    Uri newUri = invalidUriHandler(target);
                                    rel.Attribute("Target").Value = newUri.ToString();
                                    replaceEntry = true;
                                }
                            }
                        }
                    }
                }
                catch (XmlException)
                {
                    continue;
                }
            }

            if (replaceEntry)
            {
                var fullName = entry.FullName;
                entry.Delete();
                var newEntry = za.CreateEntry(fullName);
                using (StreamWriter writer = new StreamWriter(newEntry.Open()))
                using (XmlWriter xmlWriter = XmlWriter.Create(writer))
                {
                    entryXDoc.WriteTo(xmlWriter);
                }
            }
        }
    }
}

See #1295 for the package abstraction being proposed

For anyone interested, I’ve opened a PR to explore an API that performs functionality similar to what @abelykh0 discusses and linked to in stackoverflow

I believe we can solve this in a better way similar to https://github.com/OfficeDev/Open-XML-SDK/issues/807#issuecomment-1377733580:

(1) Override IPackageFeature to identify malformed relationships (2) replace them with an identifier that we can track (3) Override IPackage/IPackagePart APIs to return the original bad URI (4) Override Save to rewrite the replaced identifier with the original (if nothing changed)

** If the package is read only, we can copy it behind the scenes and operate on a copy of it so that we can manage these illegal uris

** This will be easy for File/Stream overloads… not so easy for Package

@twsouthwick Thank you and I write the CompatiblePackage.cs which can ignore some error. And I publish our application with the CompatiblePackage.cs three months ago, and I find that more than half of the problems were fixed.

See https://github.com/dotnet-campus/DocumentFormat.OpenXml.Extensions/blob/96313c16095be677ea36650a9dd0bcbcd57cf168/src/DocumentFormat.OpenXml.Flatten/DocumentFormat.OpenXml.Flatten/Compatibilities/Packaging/CompatiblePackage.cs

I’m going to reopen this to see if we can fix it better for a v3.0.0. I’m thinking we will go with @lindexi suggestion of creating an abstraction of package we can control more. I believe we can then handle things a bit better.

Sorry, please delete this file, it cannot be posted on a public place.

Considering Excel opens such files with no warnings and even “Open XML SDK 2.5 Productivity Tool” reports no warnings, I would consider this issue critical.