spaCy: Timeout Downloading Models
How to reproduce the behaviour
My GitHub Action downloads models as follows:
python -m spacy download en_core_web_lg
But it sometimes fails with timeout errors:
ERROR: Could not install packages due to an OSError: HTTPSConnectionPool(host='objects.githubusercontent.com', port=443): Max retries exceeded with url: /github-production-release-asset-2e65be/84940268/ee782580-63d4-11eb-9a2f-4a14ddffedbb?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20211103%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20211103T074829Z&X-Amz-Expires=300&X-Amz-Signature=4a4170665e395bcd6d5c55886d9fdc8d982870ee5954f34ef0d681b9ded628a2&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=84940268&response-content-disposition=attachment%3B%20filename%3Den_core_web_lg-3.0.0-py3-none-any.whl&response-content-type=application%2Foctet-stream (Caused by ReadTimeoutError("HTTPSConnectionPool(host='objects.githubusercontent.com', port=443): Read timed out. (read timeout=15)"))
Not sure if this is related to this https://github.com/explosion/spaCy/issues/5260
Your Environment
- Operating System: GitHub Runners (Ubuntu, Windows, and Mac)
- Python Version Used: 3.7
- spaCy Version Used: 3.0.0
- Environment Information: pip
About this issue
- State: closed
- Created 3 years ago
- Reactions: 7
- Comments: 42 (16 by maintainers)
For everyone catching up: this is definitely still an issue. The connection is timing out intermittently.
--timeout 60 is the workaround I have included (works with pip and poetry). This looks to be an issue with GitHub rather than with the project.
Please note that we are aware that GitHub needs to fix this and that there's not much we can do. We have contacted them through support and are waiting on a solution. Sorry for the inconvenience until it's fixed.
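Because the failures are intermittent, the --timeout workaround mentioned above can be combined with a simple retry. This is a minimal sketch, assuming a bash-like shell. `retry` is a hypothetical helper written for this example and is not part of pip or spaCy. The commented invocation assumes that `spacy download` forwards extra arguments such as --timeout to pip.

```shell
# Hypothetical helper: run a command up to MAX_ATTEMPTS times,
# sleeping DELAY seconds between failed attempts.
# Usage: retry MAX_ATTEMPTS DELAY command [args...]
retry() {
  local max_attempts="$1"; shift
  local delay="$1"; shift
  local attempt=1
  while true; do
    # Success: stop retrying.
    if "$@"; then
      return 0
    fi
    # Out of attempts: report and fail.
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Example invocation (not run here; assumes pip args are passed through):
#   retry 3 10 python -m spacy download en_core_web_lg --timeout 60
# pip also reads the timeout from the environment:
#   export PIP_DEFAULT_TIMEOUT=60
```

A longer timeout only helps with slow responses, so the retry is there to cover the case where a single attempt still fails.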
I ran into the exact same issue, which is a problem since I build Docker containers regularly. To resolve this, I downloaded the model tar.gz from https://spacy.io/models/en and then used twine to upload it to a private Nexus-hosted PyPI repo:
pip3 install twine
python -m twine upload -r nexus de_core_news_md-3.1.0.tar.gz
In the Dockerfile, I replaced
RUN python3 -m spacy download en_core_web_lg
with:
RUN pip3 install http://nexus.private.com:8081/repository/pypi-hosted/packages/en-core-web-lg/3.1.0/en_core_web_lg-3.1.0.tar.gz
and that gives me a stable Docker build.
GitHub appears to have identified and resolved the issue that caused this, see here for details.
I’m going to go ahead and close this, but if anyone continues to have the issue please let us know.
@omri374 As Adriane mentioned above, HuggingFace doesn’t have older models.
Thanks for the hints! For the version we need, de_core_news_lg 2.3.0, there seems to be no wheel file, and probably nothing on the HuggingFace hub either. A direct pip install of the tar.gz file also did not work (I got the connection error from the original post). What worked was downloading the model as a tar.gz and then installing it from the local path via pip install.
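The download-then-install-locally approach described above can be sketched as follows. This is a sketch under assumptions, not the commenter's exact commands: `install_model_from_url` is a hypothetical helper, and the URL you pass in is a placeholder, since the thread does not give the actual download link. It assumes `curl` and `python` are on the PATH.

```shell
# Hypothetical helper: download a model archive to a local directory,
# then install it from the local file instead of from the network.
# Usage: install_model_from_url URL [DEST_DIR]
install_model_from_url() {
  local url="$1"
  local dest_dir="${2:-.}"
  # Name the local file after the last path segment of the URL.
  local file="$dest_dir/$(basename "$url")"
  mkdir -p "$dest_dir" || return 1
  # -f: fail on HTTP errors, -L: follow redirects,
  # --retry 3: retry transient failures a few times.
  curl -fL --retry 3 -o "$file" "$url" || return 1
  # Install from the local path, so pip does not download the archive itself.
  python -m pip install "$file"
}

# Example invocation (not run here; MODEL_URL is a placeholder you must set):
#   install_model_from_url "$MODEL_URL" ./models
```

Keeping the downloaded archive in a cache directory or in the repository also means later builds do not depend on the download succeeding at all.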
Workaround instructions:
pip install command
The URL will look like this:
No worries! I was able to reproduce the same error, so it looks like something is up with GitHub's servers. I'll see if we can do something about it.
For me, it's still not working; I get the same error as in the original post. We use the model de_core_news_lg and need an old version of it (2.3.0). Environment: