node-downloader-helper: Missing extension when trying to download webpages

I just tried ‘downloading’ https://google.com using the following command:

new DownloaderHelper('https://google.com', '/home/downloads', {
      fileName: {name: 'test', ext: false},
      ...
    });

This downloads the page, but the file is named test. (with a trailing dot and without the html extension). On Windows, this results in a corrupt file that can’t be renamed or deleted.

I’m guessing this is because of ext: false and the fact that there normally is no extension when loading a webpage.
However, this should not happen, at least not with the trailing dot.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 18 (16 by maintainers)

Most upvoted comments

I am currently reading the extension data of the file after downloading to rename it with the correct extension.

Hey @Wetbikeboy2500 would you mind explaining how you do this over in #27? 😃

_Originally posted by @Chaphasilor in https://github.com/hgouveia/node-downloader-helper/issues/10#issuecomment-774210762_

It has been a while, but I can explain it. I really did not do anything groundbreaking. The first step is to isolate the file name. I then checked whether there was a dot in the name and, if so, took the substring from the dot to the end of the filename.
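
The substring approach described above could look roughly like the following sketch. The helper name extensionOf is hypothetical, it assumes the file name has already been isolated from any directory part, and it treats the last dot as the separator, which is an assumption about the original code.

```javascript
// Hypothetical helper: return the text after the last dot in a file name,
// or an empty string when there is no usable extension.
function extensionOf(fileName) {
  const dot = fileName.lastIndexOf('.');
  // No dot, a leading dot (".bashrc"), or a trailing dot ("test.")
  // all count as "no extension" here.
  if (dot <= 0 || dot === fileName.length - 1) {
    return '';
  }
  return fileName.substring(dot + 1);
}
```

With this sketch, a name such as test. or index yields an empty string, so the caller can decide on a default instead of ending up with a trailing dot.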

This issue has a few more things to consider, though. When you request a website like google.com via https://google.com, the browser will actually give you https://google.com/index.html. You can see this by inspecting the page source. This depends heavily on how the server is set up, and most pages seem to just return HTML without giving a name or extension, but that is okay because the type defaults to HTML since it is a URL. Other links embedded in a page also have corresponding tags and attributes, so the browser knows what type of content to expect and how it should try to treat it.

What I am getting at is that context matters, and you should know the general data type before you even have a URL. I was working with images from img tags, so I could assume .jpg or .png from the website’s standard.

The ext: false issues come from https://github.com/hgouveia/node-downloader-helper/blob/1f2ac3b9441a79c097e55ee73274976f6e251629/src/index.js#L692-#L693

When ext is false, it assumes there is a dot before the extension and then joins the name with whatever extension it finds. This can leave a trailing dot with no extension. In the other case, it pops the actual filename off the end of the list and uses that as the extension.
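
To make the two failure cases concrete, here is a minimal sketch. It is an assumed reconstruction of the behavior described above, not the library's actual code, and the function names naiveRename and safeRename are hypothetical.

```javascript
const path = require('path');

// Assumed reconstruction of the described behavior (NOT the library's code):
// whatever follows the last dot of the original name is used as the extension.
function naiveRename(newName, originalName) {
  const ext = originalName.split('.').pop();
  return `${newName}.${ext}`;
}

// Hypothetical fix: only append an extension when the original name has one.
function safeRename(newName, originalName) {
  const ext = path.extname(originalName); // includes the leading dot, or ''
  return `${newName}${ext}`;
}

console.log(naiveRename('test', ''));         // "test."      (trailing dot)
console.log(naiveRename('test', 'index'));    // "test.index" (name used as extension)
console.log(safeRename('test', ''));          // "test"
console.log(safeRename('test', 'page.html')); // "test.html"
```

Using path.extname avoids both cases because it returns an empty string when the name contains no extension.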

When it comes to external resources like the HTML of a website, I think it is more up to the developer to assume the extension and data type than up to the downloading tool they are using. The developer can also make assumptions about the assets on those pages, like SVGs, images, etc., and can then use mime-types to narrow down the data types of those specific assets after download.
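
One possible way to do that narrowing is to map the media type of a resource to an extension. The sketch below is self-contained and uses a tiny hand-written table; the helper name extensionFromContentType and the table contents are assumptions for illustration, and a real project would likely rely on a fuller list such as the one in the mime-types package.

```javascript
// Small illustrative table, not an exhaustive list of media types.
const EXTENSION_BY_TYPE = new Map([
  ['text/html', 'html'],
  ['image/png', 'png'],
  ['image/jpeg', 'jpg'],
  ['image/svg+xml', 'svg'],
]);

// Hypothetical helper: map a Content-Type header value to an extension,
// returning the fallback when the type is missing or unknown.
function extensionFromContentType(contentType, fallback = '') {
  if (!contentType) {
    return fallback;
  }
  // Drop parameters such as "; charset=UTF-8" and normalize case.
  const mediaType = contentType.split(';')[0].trim().toLowerCase();
  return EXTENSION_BY_TYPE.get(mediaType) || fallback;
}
```

The fallback parameter is where the developer's own knowledge of the context goes, for example 'jpg' when downloading from img tags.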

@Chaphasilor So basically, I have a function that gets the filename from the HTTP Content-Disposition header; if that is not available, it gets it from the URL. Because I am using path.basename, it is probably ignoring the extension. I would need to check this: https://github.com/hgouveia/node-downloader-helper/blob/master/src/index.js#L595

In theory, it is an easy fix.
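
The lookup order described in this comment (Content-Disposition first, then the URL) can be sketched as follows. This is not the library's function: the name fileNameFromResponse is hypothetical, the headers object is assumed to use lowercase keys as Node's http module provides them, and the parsing is simplified (it ignores the encoded filename* form).

```javascript
const path = require('path');

// Hypothetical helper: pick a file name from the Content-Disposition header,
// falling back to the last segment of the URL path.
function fileNameFromResponse(headers, url) {
  const disposition = headers['content-disposition'] || '';
  // Matches filename="report.pdf" or filename=report.pdf (quoted or bare).
  const match = /filename\s*=\s*(?:"([^"]*)"|([^;\s]+))/i.exec(disposition);
  if (match) {
    return match[1] !== undefined ? match[1] : match[2];
  }
  // For "https://google.com" the path is "/", so the basename is empty
  // and the caller has no extension to work with.
  return path.posix.basename(new URL(url).pathname);
}
```

For a URL like https://google.com the fallback returns an empty string, which is the situation where blindly appending a dot and an extension produces test. as the file name.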