vision: Downloading MNIST dataset with torchvision gives HTTP Error 403

🐛 Bug

I’m getting a 403 error when I try to download MNIST dataset with torchvision 0.4.2.

To Reproduce

../.local/lib/python3.6/site-packages/torchvision/datasets/mnist.py:68: in __init__
    self.download()
../.local/lib/python3.6/site-packages/torchvision/datasets/mnist.py:135: in download
    download_and_extract_archive(url, download_root=self.raw_folder, filename=filename)
../.local/lib/python3.6/site-packages/torchvision/datasets/utils.py:248: in download_and_extract_archive
    download_url(url, download_root, filename, md5)
../.local/lib/python3.6/site-packages/torchvision/datasets/utils.py:96: in download_url
    raise e
../.local/lib/python3.6/site-packages/torchvision/datasets/utils.py:84: in download_url
    reporthook=gen_bar_updater()
/usr/local/lib/python3.6/urllib/request.py:248: in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
/usr/local/lib/python3.6/urllib/request.py:223: in urlopen
    return opener.open(url, data, timeout)
/usr/local/lib/python3.6/urllib/request.py:532: in open
    response = meth(req, response)
/usr/local/lib/python3.6/urllib/request.py:642: in http_response
    'http', request, response, code, msg, hdrs)
/usr/local/lib/python3.6/urllib/request.py:570: in error
    return self._call_chain(*args)
/usr/local/lib/python3.6/urllib/request.py:504: in _call_chain
    result = func(*args)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <urllib.request.HTTPDefaultErrorHandler object at 0x7efbf9edaac8>
req = <urllib.request.Request object at 0x7efbf9eda8d0>
fp = <http.client.HTTPResponse object at 0x7efbf9edaf98>, code = 403
msg = 'Forbidden', hdrs = <http.client.HTTPMessage object at 0x7efbf9ea22b0>

    def http_error_default(self, req, fp, code, msg, hdrs):
>       raise HTTPError(req.full_url, code, msg, hdrs, fp)
E       urllib.error.HTTPError: HTTP Error 403: Forbidden

Environment

torch==1.3.1
torchvision==0.4.2

Additional context

https://app.circleci.com/jobs/github/PyTorchLightning/pytorch-lightning/6877

About this issue

State: closed
Created 4 years ago
Comments: 47 (8 by maintainers)

Commits related to this issue

fix for #1938 — committed to ptrblck/vision by ptrblck 4 years ago
Patch pytorch_mnist.py until torchvision issue gets resolved See https://github.com/pytorch/vision/issues/1938 Signed-off-by: Nicolas V Castet <nvcastet@us.ibm.com> — committed to nvcastet/horovod by nvcastet 4 years ago
Notebook: Add (hack) HTTP headers to download MNIST Source: https://github.com/pytorch/vision/issues/1938#issuecomment-594623431. — committed to rodrigobdz/neural-processes by rodrigobdz 3 years ago
Update source code (#14) * Python: Add plot.py and mnist.py * Python: Delete duplicated functions from utils.py Functions were refactored to plot.py and mnist.py. * Notebook: Merge contents ... — committed to rodrigobdz/neural-processes by rodrigobdz 3 years ago
Fix 403 error when downloading Mnist dataset in Pytorch Lighting example - adds a work around for the issue described in https://github.com/pytorch/vision/issues/1938 Signed-off-by: Janusz Lisiecki ... — committed to JanuszL/DALI by JanuszL 3 years ago
issues/1938 solved # https://github.com/pytorch/vision/issues/1938#issuecomment-789986996 — committed to jorditorresBCN/PyTorch-vs-TensorFlow by jorditorresBCN 3 years ago
Fix 403 error when downloading Mnist dataset in Pytorch Lighting example (#2759) - adds a work around for the issue described in https://github.com/pytorch/vision/issues/1938 Signed-off-by: Janusz... — committed to NVIDIA/DALI by JanuszL 3 years ago
Update requirements.txt for examples Summary: Update requirements.txt to address ongoing issues with our integrations tests. There are two: a. `ModuleNotFoundError: No module named 'requests'` ([Cir... — committed to ffuuugor/opacus by deleted user 3 years ago
Update requirements.txt for examples (#148) Summary: Pull Request resolved: https://github.com/pytorch/opacus/pull/148 Update requirements.txt to address ongoing issues with our integrations tests. ... — committed to pytorch/opacus by deleted user 3 years ago

Most upvoted comments

@eduardo4jesus You could patch your model script at the top using:

from six.moves import urllib
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)

It will use that user agent for every urllib request in the script, assuming the opener does not get overwritten somewhere else.

+82

nvcastet on Mar 4, 2020
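On Python 3, six.moves.urllib resolves to the standard-library urllib package, so the snippet above can also be written without the six dependency. A minimal sketch, assuming Python 3; the helper name install_user_agent_opener is hypothetical:

```python
import urllib.request


def install_user_agent_opener(user_agent="Mozilla/5.0"):
    """Install a global urllib opener that sends a browser-like User-agent.

    Hypothetical helper. After it runs, urllib.request.urlopen() and
    urllib.request.urlretrieve() (which torchvision 0.4.x calls to download
    MNIST, per the traceback in this issue) go through this opener.
    """
    opener = urllib.request.build_opener()
    # Replace the default ('User-agent', 'Python-urllib/3.x') header, which
    # the MNIST host appears to have been rejecting with 403 Forbidden.
    opener.addheaders = [("User-agent", user_agent)]
    urllib.request.install_opener(opener)
    return opener
```

Call it once at the top of the notebook or script, then load the dataset as usual, for example torchvision.datasets.MNIST(root='./data', download=True). Because the opener is global, any later call to urllib.request.install_opener() elsewhere will replace it.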

copy this snippet at the top of your notebook, run it, and then just load your datasets as usual…

from six.moves import urllib
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)

+51

andresgtn on Mar 3, 2021

+14

madhavajay on Mar 5, 2021

I’ve just hit the same problem. I’m waiting for an answer that doesn’t involve changing code… (ROOKIE ALERT)

Clone this to your working dir: https://github.com/knamdar/data

+12

knamdar on Mar 5, 2020

Did anyone get urllib.error.HTTPError: HTTP Error 503: Service Unavailable last night?

+10

junpuf on Mar 11, 2021
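Unlike a 403, a 503 signals a temporary server-side condition, so retrying with a growing delay may be enough. A minimal sketch, assuming the download call lets urllib's HTTPError propagate, as torchvision 0.4.2 does in the traceback at the top of this issue; retry_on_http_error and its parameters are hypothetical:

```python
import time
import urllib.error


def retry_on_http_error(fetch, retries=3, delay=5.0, retry_codes=(503,)):
    """Call fetch() and retry when it raises an HTTPError with a retryable code.

    `fetch` is any zero-argument callable that performs the download.
    The wait doubles after each failed attempt (exponential backoff).
    """
    for attempt in range(retries + 1):
        try:
            return fetch()
        except urllib.error.HTTPError as err:
            # Codes outside retry_codes (such as 403) are raised immediately:
            # resending the identical request is unlikely to change the answer.
            if err.code not in retry_codes or attempt == retries:
                raise
            time.sleep(delay * (2 ** attempt))
```

For example: retry_on_http_error(lambda: torchvision.datasets.MNIST(root='./data', download=True)). After the last retry the original HTTPError is re-raised unchanged.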

FYI, we changed torchvision to try to fix this. The fix (https://github.com/pytorch/vision/pull/3499) should be present in the latest release, published yesterday.

fmassa on Mar 5, 2021

Same on Colab:

import torchvision
import torchvision.transforms as transforms
root_dir = './data/MNIST/'
torchvision.datasets.MNIST(root=root_dir, download=True)

I get: HTTPError: HTTP Error 403: Forbidden

torchvision.__version__ -> 0.8.2+cu101

mlelarge on Mar 3, 2021

@BernardoOlisan @ChengguiSun the solution from @mlelarge works great; the following worked for me in a notebook:

!wget www.di.ens.fr/~lelarge/MNIST.tar.gz
!tar -zxvf MNIST.tar.gz

from torchvision.datasets import MNIST
from torchvision import transforms

mnist_train = MNIST('./', download=False,
                    transform=transforms.Compose([
                        transforms.ToTensor(),
                    ]), train=True)

alisterburt on Mar 12, 2021
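In environments without shell access or without wget, the same two steps can be done from Python. A minimal sketch using only the standard library; download and extract_tar_gz are hypothetical helper names, and it assumes, as the MNIST('./', download=False, ...) call implies, that the archive unpacks into an MNIST/ folder under the chosen root:

```python
import pathlib
import tarfile
import urllib.request


def download(url, archive_path):
    """Fetch `url` into `archive_path` (the equivalent of `!wget`)."""
    urllib.request.urlretrieve(url, archive_path)
    return archive_path


def extract_tar_gz(archive_path, dest="."):
    """Unpack a .tar.gz archive into `dest` (the equivalent of `!tar -zxvf`)."""
    dest = pathlib.Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: reject absolute paths,
            # "../" escapes and other unsafe archive members.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
        return sorted(tar.getnames())


# Usage (the mirror is a personal page and may no longer be online):
# download("http://www.di.ens.fr/~lelarge/MNIST.tar.gz", "MNIST.tar.gz")
# extract_tar_gz("MNIST.tar.gz", ".")
```

After extraction, MNIST('./', download=False, ...) should find the files without touching the network.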

@eduardo4jesus You can explicitly add headers as stated above, something like:

opener = urllib.request.URLopener()
opener.addheader('User-Agent', some_user_agent)
opener.retrieve(
    url, fpath,
    reporthook=gen_bar_updater()
)

(This replaces the urlretrieve call at line 81 and onwards in vision/torchvision/datasets/utils.py.) It seems to be a quick workaround that works.

mvelebit on Mar 4, 2020
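urllib.request.URLopener, used in the patch above, has been deprecated since Python 3.3. A minimal sketch of the same idea that sends the header on a single urllib.request.Request and leaves torchvision's files untouched; the helper names are hypothetical, and unlike torchvision's download_url it shows no progress bar and does no MD5 check:

```python
import shutil
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Attach a User-Agent header to one request (no global opener)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download_with_user_agent(url, fpath, user_agent="Mozilla/5.0", chunk_size=1 << 20):
    """Stream `url` into `fpath`, sending `user_agent` with this request only."""
    with urllib.request.urlopen(build_request(url, user_agent)) as response, \
            open(fpath, "wb") as out:
        # Copy in chunks so large archives are not held in memory at once.
        shutil.copyfileobj(response, out, chunk_size)
    return fpath
```

Because the header lives on the request rather than on a global opener, other urllib users in the same process keep their default behaviour.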

Thank you, @mlelarge and @alisterburt!

HugoSchmutz on Mar 12, 2021

You can download it from my webpage: https://gist.github.com/mlelarge/60ddefa9e16bc06f7f4fc7bff769bdb1

mlelarge on Mar 11, 2021

still happens

ghost on Mar 3, 2021

This should have been fixed now, there is no need to update torchvision.

All should be working as before, without any change on the user side.

This was fixed on the server hosting the original dataset (thanks, @soumith!).

As such, I’m closing this issue but let us know if you still face this issue.

fmassa on Mar 5, 2020

This should be fixed (again) in the next torchvision nightly, and the fix will be present in the next minor release of torchvision, which should be out soon.

See https://github.com/pytorch/vision/pull/3544 for more details

fmassa on Mar 12, 2021

I think perhaps we should start a new issue (that links back to this one) as this one is closed and may get no attention. So far as I can see no one has made a new one yet, so I will try to create one now, referring back here.

kesterlester on Mar 3, 2021

Is there a quick fix that does not require using master? I am concerned about the changes I would have to make to my code to go from the version I am using (1.4.0) to master.

eduardo4jesus on Mar 4, 2020

This is because the download links for MNIST at https://github.com/pytorch/vision/blob/master/torchvision/datasets/mnist.py#L33-L36 are hosted on yann.lecun.com, and that server has moved under Cloudflare protection.

@fmassa we may need to mirror the files and change the URLs to point to, for example, the PyTorch S3 bucket.

soumith on Mar 4, 2020
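The mirroring suggestion amounts to giving the downloader more than one place to fetch each file from. A minimal sketch of a mirror fallback, assuming every mirror serves the files under the same names; download_from_mirrors and the mirror list in the usage comment are hypothetical, not torchvision's API:

```python
import shutil
import urllib.error
import urllib.request


def download_from_mirrors(filename, mirrors, fpath):
    """Try each base URL in `mirrors` in order; save the first success to `fpath`.

    Returns the URL that worked. Raises RuntimeError listing every failure
    if no mirror could serve the file.
    """
    errors = []
    for base in mirrors:
        url = base + filename
        try:
            with urllib.request.urlopen(url) as response, open(fpath, "wb") as out:
                shutil.copyfileobj(response, out)
            return url
        except urllib.error.URLError as err:  # also catches HTTPError (e.g. 403)
            errors.append(f"{url}: {err}")
    raise RuntimeError("all mirrors failed:\n" + "\n".join(errors))


# Usage (second base URL is a placeholder; substitute a real mirror):
# download_from_mirrors("train-images-idx3-ubyte.gz",
#                       ["http://yann.lecun.com/exdb/mnist/",
#                        "https://example.com/mnist-mirror/"],
#                       "train-images-idx3-ubyte.gz")
```

Collecting every error before giving up makes it visible whether all mirrors answered 403 (likely a client-side issue such as the user agent) or were simply unreachable.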

copy this snippet at the top of your notebook, run it, and then just load your datasets as usual…

from six.moves import urllib
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)

Dude, this is insane, you saved my day! Thank you a lot, bro!!

BernardoOlisan on Mar 6, 2021

Thanks @kesterlester! I put MNIST on my webpage if you need it: wget www.di.ens.fr/~lelarge/MNIST.tar.gz

mlelarge on Mar 3, 2021

So could we make a hotfix somehow?

Borda on Mar 4, 2020

Thanks for reporting! I can reproduce the issue locally, and downloading from the browser works.

I don’t yet know what the root cause is though.

fmassa on Mar 4, 2020