adm-zip: filenames with Unicode characters are corrupt

Similar to another issue, filenames with Unicode characters are unusable: 7-zip can neither read nor extract them.

For example:

Tal/A L'infini/Le Passé.txt -> Tal/A L'infini/Le Pass├⌐.tx

Snøfall.txt -> Sn├╕fall.tx


Most upvoted comments

I just tested this slightly modified example code and, to my surprise:

  1. If bit 11 is set but the mode (entry.header.made) is not written, GUI tools understand the filenames, but command-line tools don't.
  2. If bit 11 is not set and the mode is Unix, command-line tools understand the filenames, but GUI tools don't.
  3. If bit 11 is set and the mode is Unix, both GUI and command-line tools understand the filenames.
#!/usr/bin/env node

const AdmZip = require('adm-zip');

const zip = new AdmZip();

// add file directly
const content = "inner content of the file";
//zip.addFile("äää.txt", Buffer.from(content), "entry comment goes here");
zip.addFile("你好.txt", Buffer.from(content), "entry comment goes here");

zip.getEntries().forEach(entry => {
    entry.header.made = 0x314;     // "version made by": host 3 (Unix), spec version 2.0 (0x14); same as 788
    entry.header.flags |= 0x800;   // Set bit 11 - APP Note 4.4.4 Language encoding flag (EFS)
});

const willSendthis = zip.toBuffer();

zip.writeZip('./test-utf8.zip');

For GUI tools I used only GNOME Archive Manager. I also tested with Google Drive, but it detected the correct names on every try.
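To check which of the three cases a given archive falls into, the two fields can be read back from the central directory. The following is only a rough sketch: `inspectCentralHeaders` is a hypothetical helper, not part of adm-zip, and it finds headers by scanning for the `PK\x01\x02` signature, which is good enough for small test archives but could in principle match bytes inside compressed data.

```javascript
// Signature of a central directory file header: "PK\x01\x02".
const CENTRAL_SIG = Buffer.from([0x50, 0x4b, 0x01, 0x02]);

// Hypothetical helper: lists name, host OS and UTF-8 flag for each central header.
function inspectCentralHeaders(zipBuffer) {
  const results = [];
  let pos = zipBuffer.indexOf(CENTRAL_SIG);
  // The fixed part of a central directory header is 46 bytes long.
  while (pos !== -1 && pos + 46 <= zipBuffer.length) {
    const made = zipBuffer.readUInt16LE(pos + 4);     // version made by
    const flags = zipBuffer.readUInt16LE(pos + 8);    // general purpose bit flag
    const nameLen = zipBuffer.readUInt16LE(pos + 28); // file name length
    const name = zipBuffer.toString("utf8", pos + 46, pos + 46 + nameLen);
    results.push({
      name,
      hostOs: made >> 8,                 // 3 means Unix
      utf8Flag: (flags & 0x800) !== 0,   // bit 11 (EFS)
    });
    pos = zipBuffer.indexOf(CENTRAL_SIG, pos + 46 + nameLen);
  }
  return results;
}
```

With the script above, something like `inspectCentralHeaders(require('fs').readFileSync('./test-utf8.zip'))` should report `hostOs: 3` and `utf8Flag: true` for each entry if both settings were written.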

I have the same problem. Have you solved this?

This is not directly related to the OP, but it is along the same lines and is more of a heads-up. In a work project I am on, one of our developers used a method suggested by the editor's auto-complete called addLocalFolderPromise. I was not able to find that method documented, so it may be intended to be private or may have been deprecated at some point; I'm not sure. As we were using the non-promisified version, the developer thought the promisified version would be an improvement. This is fine as long as the filenames contain no Unicode characters. If a Unicode character such as ® is present in a filename, the folder still gets created, but the affected files are saved with those characters removed from their names entirely. That broke our app, which expected the linked filenames to be present in the zip directory. So a file named poster®.png will be zipped as poster.png. That comes from lines 357-360 of the adm-zip.js file:

p = p
  .normalize("NFD")
  .replace(/[\u0300-\u036f]/g, "")
  .replace(/[^\x20-\x7E]/g, ""); // accent fix
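To make the effect concrete, here is a small sketch that applies the same three calls to a few of the filenames mentioned in this thread. The wrapper name `stripNonAscii` is made up for illustration; only the chained calls come from the snippet above.

```javascript
// Hypothetical wrapper around the normalization chain quoted above.
function stripNonAscii(p) {
  return p
    .normalize("NFD")                  // split accented letters into base + combining mark
    .replace(/[\u0300-\u036f]/g, "")   // drop the combining marks
    .replace(/[^\x20-\x7E]/g, "");     // drop everything outside printable ASCII
}

console.log(stripNonAscii("poster\u00ae.png"));    // poster.png   (® removed)
console.log(stripNonAscii("Le Pass\u00e9.txt"));   // Le Passe.txt (é becomes e)
console.log(stripNonAscii("Sn\u00f8fall.txt"));    // Snfall.txt   (ø has no decomposition, so it is removed)
console.log(stripNonAscii("\u4f60\u597d.txt"));    // .txt         (both CJK characters removed)
```

Accented Latin letters survive as their base letter, but any character without an ASCII base is silently dropped, so distinct source files can end up with the same or an empty name.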

If this should be filed as a separate bug or issue, let me know and I'll be happy to provide further details and open it with a minimal reproduction.

I see there’s movement on this, fingers crossed you guys can fix it. Thank you for your efforts so far!

Good to see you guys are talking about Bit 11 as I was thinking along the same lines (see the comment under this article): https://lwn.net/Articles/729835/#:~:text=There are no specs for,the box%2C but for ZIP.

Nice hack, btw, but I am not sure it generates correct ZIP files.

This happens because adm-zip expects file names to be encoded as UTF-8 and writes them as UTF-8, but forgets to set that bit in the flags. So when other apps read zip files created by adm-zip, they don't know the file names are UTF-8 encoded and produce garbled results. It is usually not a problem as long as you stick to plain ASCII names.
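The garbled names in the original report are consistent with this explanation: when bit 11 is clear, the ZIP specification's fallback encoding is IBM code page 437, and reading the UTF-8 bytes of é or ø as CP437 gives exactly the box-drawing characters shown. A minimal sketch, assuming a hand-written partial CP437 table (`CP437_SAMPLE` and `decodeAsCp437` are illustrative names, and only the three bytes needed here are mapped):

```javascript
// Partial CP437 table: only the high bytes that occur in this example.
const CP437_SAMPLE = { 0xc3: "\u251c", 0xa9: "\u2310", 0xb8: "\u2555" }; // ├ ⌐ ╕

// Decode bytes the way a reader that assumes CP437 would (ASCII passes through).
function decodeAsCp437(bytes) {
  return Array.from(bytes, (b) =>
    b < 0x80 ? String.fromCharCode(b) : (CP437_SAMPLE[b] ?? "?")
  ).join("");
}

// File names as adm-zip writes them: UTF-8 bytes.
const garbledPasse = decodeAsCp437(Buffer.from("Pass\u00e9.txt", "utf8"));
const garbledSnofall = decodeAsCp437(Buffer.from("Sn\u00f8fall.txt", "utf8"));

console.log(garbledPasse);   // Pass├⌐.txt
console.log(garbledSnofall); // Sn├╕fall.txt
```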

Maybe try setting this bit in the flags instead.

...
zip.getEntries().forEach(entry => {
    (entry.header as any).flags |= 0x800;   // Set bit 11 - APP Note 4.4.4 Language encoding flag (EFS)
});
...

It should work.

Setting entry.header.made to 788 before writing the zip worked for me. With this, the Created OS is set to 03 'Unix' in the corresponding CENTRAL HEADER.
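For reference, 788 is the same value as the 0x314 used in the earlier comment. The "version made by" field packs the host system into the upper byte and the ZIP specification version into the lower byte, which a quick sketch can confirm (the variable names are only for illustration):

```javascript
const made = 788;                 // same as 0x0314

const hostOs = made >> 8;         // upper byte: 3, which the ZIP spec assigns to Unix
const specVersion = made & 0xff;  // lower byte: 20, i.e. spec version 2.0

console.log(made === 0x314, hostOs, specVersion);
```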

This workaround might help:

const zip = new AdmZip();
....
zip.addLocalFolder(...);
...
zip.getEntries().forEach(entry => {
   (entry.header as any).made = 788;
});
zip.writeZip(destinationPath);