vision: Downloading MNIST dataset with torchvision gives HTTP Error 403

šŸ› Bug

I’m getting a 403 error when I try to download the MNIST dataset with torchvision 0.4.2.

To Reproduce

../.local/lib/python3.6/site-packages/torchvision/datasets/mnist.py:68: in __init__
    self.download()
../.local/lib/python3.6/site-packages/torchvision/datasets/mnist.py:135: in download
    download_and_extract_archive(url, download_root=self.raw_folder, filename=filename)
../.local/lib/python3.6/site-packages/torchvision/datasets/utils.py:248: in download_and_extract_archive
    download_url(url, download_root, filename, md5)
../.local/lib/python3.6/site-packages/torchvision/datasets/utils.py:96: in download_url
    raise e
../.local/lib/python3.6/site-packages/torchvision/datasets/utils.py:84: in download_url
    reporthook=gen_bar_updater()
/usr/local/lib/python3.6/urllib/request.py:248: in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
/usr/local/lib/python3.6/urllib/request.py:223: in urlopen
    return opener.open(url, data, timeout)
/usr/local/lib/python3.6/urllib/request.py:532: in open
    response = meth(req, response)
/usr/local/lib/python3.6/urllib/request.py:642: in http_response
    'http', request, response, code, msg, hdrs)
/usr/local/lib/python3.6/urllib/request.py:570: in error
    return self._call_chain(*args)
/usr/local/lib/python3.6/urllib/request.py:504: in _call_chain
    result = func(*args)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <urllib.request.HTTPDefaultErrorHandler object at 0x7efbf9edaac8>
req = <urllib.request.Request object at 0x7efbf9eda8d0>
fp = <http.client.HTTPResponse object at 0x7efbf9edaf98>, code = 403
msg = 'Forbidden', hdrs = <http.client.HTTPMessage object at 0x7efbf9ea22b0>

    def http_error_default(self, req, fp, code, msg, hdrs):
>       raise HTTPError(req.full_url, code, msg, hdrs, fp)
E       urllib.error.HTTPError: HTTP Error 403: Forbidden

Environment

  • torch==1.3.1
  • torchvision==0.4.2

Additional context

https://app.circleci.com/jobs/github/PyTorchLightning/pytorch-lightning/6877

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 47 (8 by maintainers)


Most upvoted comments

@eduardo4jesus You could patch your model script at the top using:

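from six.moves import urllib  # on Python 3 only, `import urllib.request` works too

# Send a browser-like User-Agent instead of urllib's default "Python-urllib/x.y"
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
# Make this opener the default used by urlopen()/urlretrieve()
urllib.request.install_opener(opener)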

It will use that user agent for the entire script assuming the opener does not get overwritten somewhere else.
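The opener installed above stays in place for the rest of the process. If you only want the browser-like header while torchvision downloads, a minimal sketch is to scope it with a context manager. `user_agent_opener` is a hypothetical helper name, and it assumes no other custom opener was installed earlier: on exit it resets urllib to its default opener rather than restoring a previous one. It uses the standard-library `urllib.request` directly instead of `six.moves`.

```python
import contextlib
import urllib.request


@contextlib.contextmanager
def user_agent_opener(user_agent="Mozilla/5.0"):
    """Temporarily install a global urllib opener that sends `user_agent`.

    Assumption: no other custom opener was installed beforehand, because on
    exit this resets urllib to its default opener instead of restoring one.
    """
    opener = urllib.request.build_opener()
    # Replace the default ('User-agent', 'Python-urllib/x.y') header.
    opener.addheaders = [("User-agent", user_agent)]
    urllib.request.install_opener(opener)
    try:
        yield opener
    finally:
        # With None installed, urlopen() builds a fresh default opener on next use.
        urllib.request.install_opener(None)
```

For example, `with user_agent_opener(): torchvision.datasets.MNIST(root='./data', download=True)`. This only helps where the download goes through `urllib.request.urlopen`/`urlretrieve`, which is the path torchvision 0.4.2 takes according to the traceback above.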

copy this snippet at the top of your notebook, run it, and then just load your datasets as usual…

from six.moves import urllib
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)

I’ve just got the same problem. Waiting for an answer that doesn’t require changing code… (ROOKIE ALERT)

Clone this to your working dir: https://github.com/knamdar/data

Did anyone get urllib.error.HTTPError: HTTP Error 503: Service Unavailable last night?

FYI, we changed torchvision to try to fix this, and the fix should be present in the latest release (from yesterday); see https://github.com/pytorch/vision/pull/3499

Same on Colab:

import torchvision
import torchvision.transforms as transforms
root_dir = './data/MNIST/'
torchvision.datasets.MNIST(root=root_dir, download=True)

I get: HTTPError: HTTP Error 403: Forbidden

torchvision.__version__ -> 0.8.2+cu101

@BernardoOlisan @ChengguiSun the solution from @mlelarge works great, the following worked for me in a notebook

!wget www.di.ens.fr/~lelarge/MNIST.tar.gz
!tar -zxvf MNIST.tar.gz

from torchvision.datasets import MNIST
from torchvision import transforms

mnist_train = MNIST('./', download=False,
                    transform=transforms.Compose([
                        transforms.ToTensor(),
                    ]), train=True)
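Outside a notebook, the `!tar -zxvf` step can be done from Python with the standard `tarfile` module. A minimal sketch follows; `extract_tar_gz` is a hypothetical helper, and it assumes the archive unpacks into the `MNIST/` folder layout that `MNIST('./', download=False)` expects, as the comment above reports. It also rejects members whose paths would land outside the destination, and applies the `"data"` extraction filter when the running Python provides it.

```python
import os
import tarfile


def extract_tar_gz(archive_path, dest_dir="."):
    """Extract a .tar.gz archive into dest_dir (hypothetical helper).

    Refuses any member whose path would resolve outside dest_dir.
    """
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: use the safe "data" filter.
            tar.extractall(dest_root, filter="data")
        else:
            tar.extractall(dest_root)
    return dest_root


# Usage, after downloading the archive with the wget command above:
# extract_tar_gz("MNIST.tar.gz", ".")
```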

@eduardo4jesus You can explicitly add headers as stated above, something like:

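import urllib.request

# url, fpath and gen_bar_updater come from the surrounding download_url() code
opener = urllib.request.URLopener()  # deprecated since Python 3.3
opener.addheader('User-Agent', some_user_agent)
opener.retrieve(
    url, fpath,
    reporthook=gen_bar_updater()
)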

(This goes in place of the urlretrieve call, at line 81 onwards in vision/torchvision/datasets/utils.py.) It seems to be a quick workaround that works.
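Because `urllib.request.URLopener` has been deprecated since Python 3.3, a sketch of the same workaround with the non-deprecated `Request`/`urlopen` API may be preferable. `download_with_user_agent` is a hypothetical name, and its `progress(bytes_so_far, total)` callback is an assumption of this sketch: it is not a drop-in replacement for the `reporthook` from torchvision's `gen_bar_updater()`, which follows `urlretrieve`'s `(count, block_size, total_size)` convention.

```python
import urllib.request


def download_with_user_agent(url, fpath, user_agent="Mozilla/5.0",
                             chunk_size=32 * 1024, progress=None):
    """Download `url` to `fpath`, sending an explicit User-Agent header.

    If given, `progress` is called as progress(bytes_so_far, total), where
    total is None when the server sends no Content-Length.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response, open(fpath, "wb") as out:
        length = response.headers.get("Content-Length")
        total = int(length) if length is not None else None
        done = 0
        # Stream in chunks so large archives are never held fully in memory.
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            done += len(chunk)
            if progress is not None:
                progress(done, total)
    return fpath
```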

still happens

This should have been fixed now, there is no need to update torchvision.

All should be working as before, without any change on the user side.

This was fixed on the server hosting the original dataset (thanks @soumith !).

As such, I’m closing this issue but let us know if you still face this issue.

This should be fixed (again) in the next torchvision nightly, and the fix will be present in the next minor release of torchvision, which should be out soon.

See https://github.com/pytorch/vision/pull/3544 for more details

I think we should perhaps start a new issue (linking back to this one), since this one is closed and may get no attention. As far as I can see no one has opened one yet, so I will try to create one now, referring back here.

Is there any way to get a quick fix without using master? I am concerned about the changes I might have to make in my code to go from the version I am using (1.4.0) to master.

This is because the download links for MNIST at https://github.com/pytorch/vision/blob/master/torchvision/datasets/mnist.py#L33-L36 are hosted on yann.lecun.com, and that server has moved behind CloudFlare protection.

@fmassa we may need to mirror the files and change the URLs to point at something like the PyTorch S3 bucket.
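One way to implement the mirroring suggested above is to keep a list of base URLs and fall back to the next one when a download fails. This is a hypothetical sketch: the two hosts in `MIRRORS` are placeholders, not real MNIST mirrors, `fetch` stands in for whatever download function is used, and it is not necessarily how torchvision ended up fixing the issue.

```python
import urllib.error

# Hypothetical placeholder hosts, not real MNIST mirrors.
MIRRORS = [
    "https://primary.example.com/mnist/",
    "https://mirror.example.com/mnist/",
]


def fetch_from_mirrors(filename, mirrors, fetch):
    """Try each mirror in order and return the first successful result.

    `fetch(url)` is any callable that downloads `url` and raises
    urllib.error.URLError on failure (HTTPError, e.g. a 403, is a subclass).
    """
    errors = []
    for base in mirrors:
        url = base + filename
        try:
            return fetch(url)
        except urllib.error.URLError as exc:
            # Remember why this mirror failed, then try the next one.
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all mirrors failed:\n" + "\n".join(errors))
```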


Dude, this is insane, you saved my day, thanks a lot bro!!

Thanks @kesterlester! I have put MNIST on my webpage if you need it: wget www.di.ens.fr/~lelarge/MNIST.tar.gz

So could we make a hotfix somehow?

Thanks for reporting! I can reproduce the issue locally, and downloading from the browser works.

I don’t yet know what the root cause is though.
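A browser succeeding where torchvision fails suggests the two requests differ. One obvious difference is the User-Agent header: unless told otherwise, urllib identifies itself as `Python-urllib/<major>.<minor>`. The snippet below only prints that default. Whether the server's CloudFlare protection is what rejects it is an assumption, though it would be consistent with the browser-like User-Agent workaround reported in this thread. `default_user_agent` is a hypothetical helper name.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent a plain urllib opener sends by default."""
    opener = urllib.request.build_opener()
    # OpenerDirector starts with a single ('User-agent', 'Python-urllib/x.y') entry.
    return dict(opener.addheaders)["User-agent"]


print(default_user_agent())  # e.g. Python-urllib/3.6 on Python 3.6
```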