galaxy: ansible-galaxy collection install timeout

Bug Report

SUMMARY

We’ve seen ERROR! Unexpected Exception, this is probably a bug: ('The read operation timed out',) (10 minute time out) quite a few times. Size of the collection doesn’t seem to be related.

Is there any logging on Galaxy to see how common this is?

ansible-galaxy -vvv collection install fortinet.fortios
01:49 Downloading https://galaxy.ansible.com/download/fortinet-fortios-1.0.7.tar.gz to /root/.ansible/tmp/ansible-local-666KgfAMW/tmpXSNpnv
# Note 10 minutes have passed
01:59 ERROR! Unexpected Exception, this is probably a bug: ('The read operation timed out',)

STEPS TO REPRODUCE
EXPECTED RESULTS
ACTUAL RESULTS

https://app.shippable.com/github/ansible-collections/community.general/runs/164/3/console

01:40 + ansible-galaxy -vvv collection install fortinet.fortios
01:43 [WARNING]: You are running the development version of Ansible. You should only
01:43 run Ansible from "devel" if you are modifying the Ansible engine, or trying out
01:43 features under development. This is a rapidly changing source of code and can
01:43 become unstable at any point.
01:43 [DEPRECATION WARNING]: Setting verbosity before the arg sub command is 
01:43 deprecated, set the verbosity after the sub command. This feature will be 
01:43 removed in version 2.13. Deprecation warnings can be disabled by setting 
01:43 deprecation_warnings=False in ansible.cfg.
01:43 ansible-galaxy 2.10.0.dev0
01:43   config file = None
01:43   configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
01:43   ansible python module location = /root/venv/lib/python2.7/site-packages/ansible
01:43   executable location = /root/venv/bin/ansible-galaxy
01:43   python version = 2.7.15+ (default, Feb  9 2019, 11:33:22) [GCC 5.4.0 20160609]
01:43 No config file found; using defaults
01:43 Found installed collection ansible.posix:0.1.1 at '/root/.ansible/ansible_collections/ansible/posix'
01:43 Found installed collection ansible.netcommon:0.0.2 at '/root/.ansible/ansible_collections/ansible/netcommon'
01:43 Found installed collection community.crypto:0.1.0 at '/root/.ansible/ansible_collections/community/crypto'
01:43 Found installed collection community.kubernetes:0.10.0 at '/root/.ansible/ansible_collections/community/kubernetes'
01:43 [WARNING]: Collection at '/root/.ansible/ansible_collections/community/general'
01:43 does not have a MANIFEST.json file, cannot detect version.
01:43 Found installed collection community.general:* at '/root/.ansible/ansible_collections/community/general'
01:43 Found installed collection f5networks.f5_modules:1.2.1 at '/root/.ansible/ansible_collections/f5networks/f5_modules'
01:43 Found installed collection cisco.intersight:1.0.3 at '/root/.ansible/ansible_collections/cisco/intersight'
01:43 Found installed collection cisco.mso:0.0.4 at '/root/.ansible/ansible_collections/cisco/mso'
01:43 Found installed collection check_point.mgmt:1.0.4 at '/root/.ansible/ansible_collections/check_point/mgmt'
01:43 Found installed collection ovirt.ovirt_collection:1.0.1 at '/root/.ansible/ansible_collections/ovirt/ovirt_collection'
01:43 Process install dependency map
01:43 Processing requirement collection 'fortinet.fortios'
01:43 Opened /root/.ansible/galaxy_token
01:45 Collection 'fortinet.fortios' obtained from server default https://galaxy.ansible.com/api/
01:49 Starting collection install process
01:49 Installing 'fortinet.fortios:1.0.7' to '/root/.ansible/ansible_collections/fortinet/fortios'
01:49 Downloading https://galaxy.ansible.com/download/fortinet-fortios-1.0.7.tar.gz to /root/.ansible/tmp/ansible-local-666KgfAMW/tmpXSNpnv
01:59 ERROR! Unexpected Exception, this is probably a bug: ('The read operation timed out',)
01:59 the full traceback was:
01:59 
01:59 Traceback (most recent call last):
01:59   File "/root/venv/bin/ansible-galaxy", line 123, in <module>
01:59     exit_code = cli.run()
01:59   File "/root/venv/lib/python2.7/site-packages/ansible/cli/galaxy.py", line 479, in run
01:59     context.CLIARGS['func']()
01:59   File "/root/venv/lib/python2.7/site-packages/ansible/cli/galaxy.py", line 990, in execute_install
01:59     no_deps, force, force_deps, context.CLIARGS['allow_pre_release'])
01:59   File "/root/venv/lib/python2.7/site-packages/ansible/galaxy/collection.py", line 601, in install_collections
01:59     collection.install(output_path, b_temp_path)
01:59   File "/root/venv/lib/python2.7/site-packages/ansible/galaxy/collection.py", line 203, in install
01:59     self.b_path = self.download(b_temp_path)
01:59   File "/root/venv/lib/python2.7/site-packages/ansible/galaxy/collection.py", line 188, in download
01:59     headers=headers)
01:59   File "/root/venv/lib/python2.7/site-packages/ansible/galaxy/collection.py", line 1105, in _download_file
01:59     unredirected_headers=['Authorization'], http_agent=user_agent())
01:59   File "/root/venv/lib/python2.7/site-packages/ansible/module_utils/urls.py", line 1383, in open_url
01:59     unredirected_headers=unredirected_headers)
01:59   File "/root/venv/lib/python2.7/site-packages/ansible/module_utils/urls.py", line 1288, in open
01:59     return urllib_request.urlopen(request, None, timeout)
01:59   File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
01:59     return opener.open(url, data, timeout)
01:59   File "/usr/lib/python2.7/urllib2.py", line 429, in open
01:59     response = self._open(req, data)
01:59   File "/usr/lib/python2.7/urllib2.py", line 447, in _open
01:59     '_open', req)
01:59   File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
01:59     result = func(*args)
01:59   File "/root/venv/lib/python2.7/site-packages/ansible/module_utils/urls.py", line 448, in https_open
01:59     req
01:59   File "/usr/lib/python2.7/urllib2.py", line 1201, in do_open
01:59     r = h.getresponse(buffering=True)
01:59   File "/usr/lib/python2.7/httplib.py", line 1121, in getresponse
01:59     response.begin()
01:59   File "/usr/lib/python2.7/httplib.py", line 438, in begin
01:59     version, status, reason = self._read_status()
01:59   File "/usr/lib/python2.7/httplib.py", line 394, in _read_status
01:59     line = self.fp.readline(_MAXLINE + 1)
01:59   File "/usr/lib/python2.7/socket.py", line 480, in readline
01:59     data = self._sock.recv(self._rbufsize)
01:59   File "/usr/lib/python2.7/ssl.py", line 772, in recv
01:59     return self.read(buflen)
01:59   File "/usr/lib/python2.7/ssl.py", line 659, in read
01:59     v = self._sslobj.read(len)
01:59 SSLError: ('The read operation timed out',)

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 28
  • Comments: 67 (3 by maintainers)

Most upvoted comments

Right now (and earlier today), timeouts seem to happen a lot more.

Any updates on this issue? Is this problem more common with any specific version of Ansible?

I worked around this, as someone already noted here, by bypassing the Galaxy API and installing directly from Git. Since then I have had no problems with timeouts. (Once they confirm the problem is fixed, I’ll revert to the API.)

# galaxy collection
collections:
  - name: https://opendev.org/openstack/ansible-collections-openstack
    type: git
  - name: https://github.com/ansible-collections/community.general
    type: git
  - name: https://github.com/ansible-collections/community.mysql
    type: git
  - name: https://github.com/ansible-collections/community.proxysql
    type: git

Hitting this more frequently in the last week or so as well.

Services behind galaxy.ansible.com were restarted about an hour ago. Also some worker restart thresholds have been increased.

This was the workaround I used to bypass this

    - name: Install ansible galaxy collections
      ansible.builtin.command:
        cmd: ansible-galaxy collection install "{{ item }}"
        # creates takes a single path or glob (no shell brace expansion), so derive it
        # per item, e.g. community.mysql -> community/mysql
        creates: "$HOME/.ansible/collections/ansible_collections/{{ item | replace('.', '/') }}"
      loop:
        - community.mysql
        - community.general
        - community.hashi_vault
        - community.docker
        - ansible.posix
        - community.mongodb
      register: install_ansible_collections
      # Retry up to 10 times, waiting 5 seconds (the default delay) between attempts
      retries: 10
      delay: 5
      until: install_ansible_collections.rc == 0

I see this a lot in openstack-ansible CI and it’s likely the most frequent cause of change-unrelated job failures currently.

Update from the Ansible side, it appears that someone is scraping galaxy.ansible.com on the hour (every hour) which is causing an increased load and other requests to time out. We are adding some logging in API service to log that from HTTP headers to help identify.

Looks like --timeout 60 fixed it for me. Still, a timeout option should not be required to make the command work properly. This seems to be a server-side problem that cannot be fixed properly on the client side, only worked around by appending a timeout option.

A customer also reported this issue. I proposed the modification above to increase the timeout, and it resolved their problem.

Should we raise an RFE against ansible/ansible? It would help if customers could configure a timeout value in ansible.cfg or somewhere similar.

I’m seeing this quite a bit in github actions: https://github.com/cognifloyd/community.mongodb/runs/1130183174?check_suite_focus=true

I’ve added some retry logic, but that only partially works. It looks like ansible-galaxy has a hard-coded 20 second timeout.

https://github.com/ansible/ansible/blob/fa1fb2d13bdf948dc319be57e8465a9ef48c7fe3/lib/ansible/galaxy/api.py#L195-L197

I’ll go mention it in #ansible-galaxy

This appears to still be an issue. Happening for us weekly on various collections

“ERROR! Unknown error when attempting to call Galaxy at ‘https://galaxy.ansible.com/api/v3/collections/vyos/vyos/versions/4.0.2/’: The read operation timed out”

@mickaelvieira 2 minutes seems a bit high for one request and unlikely to succeed. Ansible should really get their galaxy servers in check, or at least try to fix this in code.

For anyone who is having this issue, increasing the timeout might help:

ansible-galaxy collection install --timeout 120 --verbose -r requirements.yml

Is there a canonical solution for this bug yet? Seeing:

  • Hard-patch the galaxy module timeout
  • Custom module python3 -m pip install https://github.com/WATonomous/ansible/archive/galaxy_timeout.tar.gz
  • Adding retry loops to container specs
  • Bypassing Galaxy to Git

🤷

I’d say so, since it was an API issue. IMHO this is resolved for now: I had no issues at the end of last week, but I haven’t read any official announcement. Ansible only confirmed the problem and has given no update since then.

An infinite retry loop with no back-off/delay or limit will presumably only make the situation worse.
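
For comparison, a bounded retry with capped exponential back-off and a little jitter could look roughly like the sketch below. It is only an illustration, not anything shipped with Ansible: the helper names `backoff_delays` and `install_with_backoff`, the defaults (5 attempts, 2-second base, 60-second cap) and the injectable `run`/`sleep` parameters are all assumptions made for the example. The only thing taken from this thread is the `ansible-galaxy collection install --timeout N` command line.

```python
import random
import subprocess
import time


def backoff_delays(attempts, base=2.0, cap=60.0):
    """Return the wait in seconds before each retry: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts - 1)]


def install_with_backoff(collection, attempts=5, timeout=60,
                         run=subprocess.call, sleep=time.sleep):
    """Run `ansible-galaxy collection install` with a bounded number of retries.

    Returns True on the first zero exit code, False once every attempt has failed.
    `run` and `sleep` are injectable (hypothetical parameters) so the loop can be
    exercised without touching the network.
    """
    cmd = ["ansible-galaxy", "collection", "install",
           "--timeout", str(timeout), collection]
    delays = backoff_delays(attempts)
    for attempt in range(attempts):
        if run(cmd) == 0:
            return True
        if attempt < len(delays):
            # Up to 1 s of random jitter keeps parallel CI jobs from retrying in lockstep.
            sleep(delays[attempt] + random.uniform(0, 1))
    return False


if __name__ == "__main__":
    ok = install_with_backoff("community.general")
    raise SystemExit(0 if ok else 1)
```

With these defaults a job gives up after five attempts and waits roughly 2, 4, 8 and 16 seconds between them, so a struggling server sees fewer requests from each client rather than more.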

Well, yes and no. Your 1 sec just adds to the timeout. It’s not like it’s permanently hammering the API. In fact I used to have a sleep 1 there first but dropped it.

At this point any attempt is making the situation worse. That’s why I completely dropped Galaxy for now and install via Git.

If anyone needs an example, this is my quick and dirty solution:

RUN mkdir -p /usr/share/ansible/ansible_collections/community \
             /usr/share/ansible/ansible_collections/ansible \
             /usr/share/ansible/ansible_collections/amazon && \
    cd /usr/share/ansible/ansible_collections/community && \
    git clone https://github.com/ansible-collections/community.molecule.git molecule && \
    git clone https://github.com/ansible-collections/community.windows.git windows && \
    cd windows && git checkout -q v1.3.0 && cd .. && \
    git clone https://github.com/ansible-collections/community.aws.git aws && \
    cd aws && git checkout -q 1.5.0 && cd .. && \
    cd /usr/share/ansible/ansible_collections/ansible && \
    git clone https://github.com/ansible-collections/ansible.windows.git windows && \
    cd windows && git checkout -q 1.9.0 && cd .. && \
    cd /usr/share/ansible/ansible_collections/amazon && \
    git clone https://github.com/ansible-collections/amazon.aws.git aws && \
    cd aws && git checkout -q 3.1.1

I’m running into this right now. I am getting a Cloudflare-branded 504, which means the origin server (Galaxy) gave a gateway timeout.

I’ve been trying to install community.general since yesterday evening. I managed to install part of the dependencies straight away, but then:

$ ansible-galaxy collection install -r requirements.yml
Process install dependency map
Starting collection install process
Skipping 'ansible.netcommon' as it is already installed
Skipping 'google.cloud' as it is already installed
Installing 'community.general:1.3.3' to '/home/me/.ansible/collections/ansible_collections/community/general'
ERROR! Unexpected Exception, this is probably a bug: ('The read operation timed out',)

After adding -vvv and fetching the actual package URL with wget, it looks like there is a redirect to S3, which answers after some delay.

What worked for me was to change the default 10-second timeout to 30 seconds in open_url here: https://github.com/ansible/ansible/blob/7f0eb7ad799e531a8fbe5cc4f46046a4b1aeb093/lib/ansible/module_utils/urls.py#L1524.

Isn’t 10 seconds a little too optimistic?
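
To judge what timeout is realistic, it can help to measure how long the first hop takes to answer. The sketch below is one possible way to do that with the Python standard library. The function name `probe` and its return shape are made up for this example, and it does not reproduce what ansible-galaxy does internally. It sends a single GET, does not follow redirects, and reports the status, any `Location` header and the elapsed time up to the response headers.

```python
import http.client
import socket
import time
from urllib.parse import urlsplit


def probe(url, timeout=10.0):
    """Time one GET up to the response headers, without following redirects.

    Returns (status, location, seconds). status is None if the socket
    timeout expired before the headers arrived.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    start = time.monotonic()
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location"), time.monotonic() - start
    except socket.timeout:
        # Same failure mode as "The read operation timed out" in the reports above.
        return None, None, time.monotonic() - start
    finally:
        conn.close()
```

Calling `probe()` on the download URL printed by `-vvv`, and then on the `Location` it returns, would show which hop is slow. That in turn suggests whether a larger `--timeout` is likely to help.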

Luckily all these dependencies will be gone for community.general 2.0.0 😃

Is there a status page or API?

And is there any workaround? Maybe a sed to change the hard-coded value?