runtime: ZipFile.ExtractToDirectory failing with "The filename, directory name, or volume label syntax is incorrect." (works in SharpZipLib)

Description

I have some code using System.IO.Compression:

ZipFile.ExtractToDirectory(archive, Path.Combine(directory.FullName, "groove-v1.0.0-midionly"))

Reproduction Steps

Put this in a .fsx file and run it with dotnet fsi:

#r "nuget: System.IO.Compression"
open System
open System.IO
open System.IO.Compression
open System.Net.Http
let directory = DirectoryInfo __SOURCE_DIRECTORY__
let archive = Path.Combine(__SOURCE_DIRECTORY__, "groove-v1.0.0-midionly.zip")
do
    let uri = "https://storage.googleapis.com/magentadata/datasets/groove/groove-v1.0.0-midionly.zip"
    use client = new HttpClient(Timeout = TimeSpan.FromMinutes 30)
    use resp = client.Send(new HttpRequestMessage(RequestUri=Uri uri))
    use stream = resp.Content.ReadAsStream()
    use file = File.OpenWrite(archive)
    stream.CopyTo(file)
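// SharpZipLib workaround (requires a reference to the SharpZipLib package):
//let fz = ICSharpCode.SharpZipLib.Zip.FastZip()
//fz.ExtractZip(archive, Path.Combine(directory.FullName, "groove-v1.0.0-midionly"), "")
// The System.IO.Compression call below fails on the groove dataset.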
ZipFile.ExtractToDirectory(archive, Path.Combine(directory.FullName, "groove-v1.0.0-midionly"))

I tried on macOS and it doesn't fail there, so the problem may be platform dependent.

Expected behavior

Extraction should complete without throwing an exception.

Actual behavior

System.IO.IOException: The filename, directory name, or volume label syntax is incorrect. : 'C:\dev\src\github.com\smoothdeveloper\zcore-midi-fs\demo\groove-v1.0.0-midionly\groove\drummer8\session2\Icon '
   at Microsoft.Win32.SafeHandles.SafeFileHandle.CreateFile(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options)
   at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize)
   at System.IO.Strategies.OSFileStreamStrategy..ctor(String path, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize)
   at System.IO.Compression.ZipFileExtensions.ExtractToFile(ZipArchiveEntry source, String destinationFileName, Boolean overwrite)
   at System.IO.Compression.ZipFileExtensions.ExtractToDirectory(ZipArchive source, String destinationDirectoryName, Boolean overwriteFiles)
   at System.IO.Compression.ZipFile.ExtractToDirectory(String sourceArchiveFileName, String destinationDirectoryName, Encoding entryNameEncoding, Boolean overwriteFiles)
   at <StartupCode$FSI_0003>.$FSI_0003.main@() in

Regression?

No response

Known Workarounds

I am using FastZip class from ICSharpCode.SharpZipLib.Zip to work around the issue.

Configuration

Welcome to F# Interactive for .NET Core in Visual Studio. To execute code, either
  1. Use 'Send to Interactive' (Alt-Enter or right-click) from an F# script. The F# Interactive process will
     use any global.json settings associated with that script.
  2. Press 'Enter' to start. The F# Interactive process will use default settings.
> 

Microsoft (R) F# Interactive version 12.0.2.0 for F# 6.0
Copyright (c) Microsoft Corporation. All Rights Reserved.
  • Windows 11 x64
  • VS2022 preview
  • dotnet 7 preview 2

Other information

No response

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 83 (81 by maintainers)

Most upvoted comments

@danmoseley I would like to help, but this is my first time. Could you point me in the right direction to start?

Newlines in paths are valid in POSIX (although weird). Looking at SharpZipLib, it does some cleaning of paths before writing them on Windows:

https://github.com/icsharpcode/SharpZipLib/blob/cc8dd78ed989888f6685da4cc009c529158738b4/src/ICSharpCode.SharpZipLib/Zip/WindowsNameTransform.cs#L29-L37
https://github.com/icsharpcode/SharpZipLib/blob/cc8dd78ed989888f6685da4cc009c529158738b4/src/ICSharpCode.SharpZipLib/Zip/WindowsNameTransform.cs#L210

In our code I see essentially no sanitization. I notice that this sanitization idea was discussed here https://github.com/dotnet/runtime/issues/15938#issuecomment-169192513 and rejected, but nobody felt strongly. The idea was that you could instead enumerate the zip entries manually and write them out with your own names. I think it is more useful for ExtractToDirectory to sanitize with underscore replacements, as the behavior of 7-Zip and SharpZipLib suggests. Anyone who didn't like the sanitization scheme could extract manually rather than using ExtractToDirectory.
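For anyone who needs the manual route in the meantime, here is a minimal F# sketch of enumerating the entries and writing them out under sanitized names. The helper names (sanitizeSegment, sanitizeEntryPath, extractSanitized) and the exact character set are assumptions made for illustration. They are not SharpZipLib's rules and not what ExtractToDirectory does.

```fsharp
open System
open System.IO
open System.IO.Compression

// Assumption: a hand-picked set of characters that Windows rejects in file names.
// Control characters (below ' ') are handled separately in sanitizeSegment.
let invalidNameChars = set [ '<'; '>'; ':'; '"'; '|'; '?'; '*' ]

// Hypothetical helper: replace each rejected character in one path segment with '_'.
let sanitizeSegment (segment: string) =
    segment
    |> String.map (fun c -> if c < ' ' || invalidNameChars.Contains c then '_' else c)

// Hypothetical helper: sanitize every segment of a zip entry name.
let sanitizeEntryPath (entryName: string) =
    entryName.Split([| '/'; '\\' |], StringSplitOptions.RemoveEmptyEntries)
    |> Array.map sanitizeSegment
    |> String.concat "/"

// Enumerate the entries manually and extract each one under its sanitized name.
let extractSanitized (archivePath: string) (destination: string) =
    let root = Path.TrimEndingDirectorySeparator(Path.GetFullPath destination)
    Directory.CreateDirectory root |> ignore
    use zip = ZipFile.OpenRead archivePath
    for entry in zip.Entries do
        let relative = sanitizeEntryPath entry.FullName
        if relative <> "" then
            let target = Path.GetFullPath(Path.Combine(root, relative))
            // Refuse entries that would land outside the destination (for example via "..").
            if not (target.StartsWith(root + string Path.DirectorySeparatorChar, StringComparison.Ordinal)) then
                invalidOp (sprintf "Entry '%s' would extract outside '%s'" entry.FullName root)
            if entry.FullName.EndsWith("/") || entry.FullName.EndsWith("\\") then
                // Directory entry: just create the folder.
                Directory.CreateDirectory target |> ignore
            else
                Directory.CreateDirectory(Path.GetDirectoryName target) |> ignore
                entry.ExtractToFile(target, true)
```

With these rules the entry from the report, whose name ends in a carriage return, maps to groove/drummer8/session2/Icon_. Two different entries can collapse to the same sanitized name; this sketch simply overwrites in that case.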

The work required is

  1. Add a test zip file to runtime-assets that contains files with the various characters that should be sanitized. You might need to use Linux, such as WSL, to create this zip; I'm not sure. (As a check, try extracting it with SharpZipLib or possibly 7-Zip and verify that it extracts correctly, with an underscore in place of each special character.)
  2. After that PR is merged, wait a few hours for a PR to appear like this one that updates this repo to consume it.
  3. Add tests similar to this one, but that succeed rather than throw. Verify that they fail.
  4. Make the tests pass by sanitizing, probably here https://github.com/dotnet/runtime/blob/09a38d4807b0aa5e3ad3e69a4bf5bb50c710cd76/src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs#L1090 using the same rules as SharpZipLib. You might need to fix some pre-existing tests that previously expected this to fail.
  5. Put up the changes as a PR into this repo.
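For step 1, one possible alternative to creating the files on a Linux file system is to write the entries directly with ZipArchive, so the awkward names never touch the disk. The F# sketch below assumes that ZipArchive.CreateEntry stores entry names verbatim without validating them. The entry names and the helpers (writeEntries, buildTestArchive, readEntryNames) are hypothetical and only illustrate the idea.

```fsharp
open System.IO
open System.IO.Compression

// Hypothetical entry names containing characters that Windows rejects in paths.
let awkwardNames = [ "dir/Icon\r"; "dir/a:b.txt"; "dir/what?.txt" ]

// Write one small entry per name into the target stream.
// Assumption: CreateEntry accepts these names as-is.
let writeEntries (target: Stream) (names: string list) =
    // leaveOpen = true so the caller can still read the stream afterwards.
    use zip = new ZipArchive(target, ZipArchiveMode.Create, true)
    for name in names do
        let entry = zip.CreateEntry(name)
        use writer = new StreamWriter(entry.Open())
        writer.Write("placeholder")

// Build the archive in memory and return its bytes.
let buildTestArchive (names: string list) : byte[] =
    use buffer = new MemoryStream()
    writeEntries buffer names   // disposing the ZipArchive finishes the archive
    buffer.ToArray()

// Reopen the bytes and list the stored entry names, to confirm they round-trip.
let readEntryNames (bytes: byte[]) =
    use buffer = new MemoryStream(bytes)
    use zip = new ZipArchive(buffer, ZipArchiveMode.Read)
    [ for entry in zip.Entries -> entry.FullName ]
```

The bytes could then be saved with File.WriteAllBytes and checked against SharpZipLib or 7-Zip as described in step 1.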