spaCy: Timeout Downloading Models

How to reproduce the behaviour

My GitHub Action downloads models as follows:

python -m spacy download en_core_web_lg

But it sometimes fails with a timeout error:

ERROR: Could not install packages due to an OSError: HTTPSConnectionPool(host='objects.githubusercontent.com', port=443): Max retries exceeded with url: /github-production-release-asset-2e65be/84940268/ee782580-63d4-11eb-9a2f-4a14ddffedbb?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20211103%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20211103T074829Z&X-Amz-Expires=300&X-Amz-Signature=4a4170665e395bcd6d5c55886d9fdc8d982870ee5954f34ef0d681b9ded628a2&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=84940268&response-content-disposition=attachment%3B%20filename%3Den_core_web_lg-3.0.0-py3-none-any.whl&response-content-type=application%2Foctet-stream (Caused by ReadTimeoutError("HTTPSConnectionPool(host='objects.githubusercontent.com', port=443): Read timed out. (read timeout=15)"))

Not sure whether this is related to https://github.com/explosion/spaCy/issues/5260

Your Environment

  • Operating System: GitHub runners (Ubuntu, Windows, and Mac)
  • Python Version Used: 3.7
  • spaCy Version Used: 3.0.0
  • Environment Information: pip

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 7
  • Comments: 42 (16 by maintainers)

Most upvoted comments

For everyone catching up: this is definitely still an issue. The connection times out intermittently. The workaround I have included is --timeout 60 (works with pip and poetry). It looks like an issue with GitHub rather than with the project.
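A minimal sketch of that workaround, wrapped in a retry loop for CI. It assumes that `spacy download` forwards extra arguments to pip, so that `--timeout 60` reaches `pip install`. The helper names, retry count, and back-off are illustrative and not part of spaCy or of the comment above.

```python
import subprocess
import sys
import time


def build_download_cmd(model, timeout=60):
    """Build the `spacy download` command with a longer pip read timeout."""
    # Assumption: arguments after the model name are passed through to pip.
    return [sys.executable, "-m", "spacy", "download", model,
            "--timeout", str(timeout)]


def download_with_retries(model, attempts=3, timeout=60, wait=5.0,
                          runner=subprocess.run):
    """Run the download up to `attempts` times; return True on success."""
    cmd = build_download_cmd(model, timeout)
    for attempt in range(1, attempts + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return True
        if attempt < attempts:
            # Simple linear back-off before the next attempt.
            time.sleep(wait * attempt)
    return False
```

Calling `download_with_retries("en_core_web_lg")` from a build script would try the download up to three times. Since the timeouts are intermittent, a retry may succeed where a single attempt fails, but this does not fix the underlying server issue.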

Please note that we are aware of this. Basically, we need GitHub to fix it and there’s not much we can do. We have contacted them through support and are waiting on a solution. Sorry for the inconvenience until it’s fixed…

I ran into the exact same issue, which is problematic since I build Docker containers regularly. To resolve this, I downloaded the model tar.gz from https://spacy.io/models/en and then used twine to upload it to a private Nexus-hosted PyPI repo:

pip3 install twine
python -m twine upload -r nexus de_core_news_md-3.1.0.tar.gz

In the Dockerfile, I replaced

RUN python3 -m spacy download en_core_web_lg

to instead use:

RUN pip3 install http://nexus.private.com:8081/repository/pypi-hosted/packages/en-core-web-lg/3.1.0/en_core_web_lg-3.1.0.tar.gz

and that gives me a stable Docker build.

GitHub appears to have identified and resolved the issue that caused this; see here for details.

I’m going to go ahead and close this, but if anyone continues to have the issue please let us know.

@omri374 As Adriane mentioned above, HuggingFace doesn’t have older models.

Thanks for the hints! For the version we need, de_core_news_lg 2.3.0, there seems to be no wheel file, and probably nothing on the HuggingFace Hub either. A direct pip install of the tar.gz URL also did not work (it gave the connection error from the original post). What did work was downloading the model as a tar.gz and then installing it from the local path with pip install.
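The "download first, then install from a local path" approach can be sketched as follows. The release URL pattern on the explosion/spacy-models repository is an assumption here (check the model's release page for the actual link), and the helper names and retry settings are illustrative.

```python
import shutil
import subprocess
import sys
import time
import urllib.request

# Assumed URL pattern for model archives; verify it against the release page.
RELEASE_URL = ("https://github.com/explosion/spacy-models/releases/download/"
               "{name}-{version}/{name}-{version}.tar.gz")


def model_archive_url(name, version):
    """Return the (assumed) download URL for a model archive."""
    return RELEASE_URL.format(name=name, version=version)


def fetch(url, dest, attempts=3, timeout=60):
    """Download url to dest, retrying on network errors and timeouts."""
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp, \
                    open(dest, "wb") as out:
                shutil.copyfileobj(resp, out)
            return dest
        except OSError:
            # URLError and socket timeouts are both OSError subclasses.
            if attempt == attempts:
                raise
            time.sleep(5 * attempt)


def install_local(path):
    """Install a previously downloaded archive with pip."""
    subprocess.run([sys.executable, "-m", "pip", "install", path], check=True)
```

For the case above, this would be `install_local(fetch(model_archive_url("de_core_news_lg", "2.3.0"), "de_core_news_lg-2.3.0.tar.gz"))`. Keeping the download separate from the install means a failed transfer can be retried without re-running pip.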

Workaround instructions:

  1. Go to the spaCy org on HuggingFace Hub and find your model
  2. Click on the “use in spaCy” button in the top right
  3. The URL will be in the first line, with a pip install command

The URL will look like this:

https://huggingface.co/spacy/<MODEL_NAME>/resolve/main/<MODEL_NAME>-any-py3-none-any.whl
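Filling in the template is mechanical. A small sketch, assuming the URL shape shown above still holds for your model (the function name is hypothetical):

```python
HF_WHEEL_TEMPLATE = ("https://huggingface.co/spacy/{model}/resolve/main/"
                     "{model}-any-py3-none-any.whl")


def hf_wheel_url(model):
    """Fill the model name into the HuggingFace Hub wheel URL template."""
    return HF_WHEEL_TEMPLATE.format(model=model)


# Print the pip command for one model.
print("pip install " + hf_wheel_url("en_core_web_lg"))
```

As noted elsewhere in this thread, older model versions are not available on the Hub, so this only helps for current models.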

No worries! I was able to reproduce the same error, so it looks like something is up with GitHub’s servers. I’ll see if we can do something about it.

For me, it’s still not working and fails with the same error as in the original post. We use the model de_core_news_lg and need an old version of it (2.3.0).

Environment:

  • Operating System: Mac (local) and Ubuntu (from Docker)
  • Python Version Used: 3.9
  • spaCy Version Used: 3.0.0
  • command: python -m spacy download de_core_news_lg