salt: [BUG] - request timeouts when using salt 3005.x and 3004.x on RHEL8 and RHEL9
Description
This is similar to the issue described in https://github.com/saltstack/salt/issues/62881.
Our configuration management setup runs a state to rename the OS when salt-cloud finishes cloning a VM.
In 3004.2 and 3005.x we notice that the state.apply of the startup state on Linux and Windows minions (running minion 3005.x) ALWAYS ends in a message timeout. Any subsequent state.apply also shows the same message timeout. It does occasionally work, but that is very rare.
I have been debugging this for several months now across various combinations of OS versions.
I have tried 3005 on RHEL9, RHEL8 and Debian 11 as the master, and all of them end in the same problem. It looks like something is being missed or some other issue is being overlooked.
I even tried the 3006-rc version on the master, and it shows the same problem.
Setup
The error is this:
The minion function caused an exception: Traceback (most recent call last):
File "salt/minion.py", line 1935, in _thread_return
return_data = minion_instance._execute_job_function(
File "salt/minion.py", line 1894, in _execute_job_function
return_data = self.executors[fname](opts, data, func, args, kwargs)
File "salt/loader/lazy.py", line 149, in __call__
return self.loader.run(run_func, *args, **kwargs)
…
File "<string>", line 4, in raise_exc_info
File "salt/ext/tornado/gen.py", line 1064, in run
yielded = self.gen.throw(*exc_info)
File "salt/transport/zeromq.py", line 624, in send
recv = yield future
File "salt/ext/tornado/gen.py", line 1056, in run
value = future.result()
File "salt/ext/tornado/concurrent.py", line 249, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
salt.exceptions.SaltReqTimeoutError: Message timed out
Please be as specific as possible and give set-up details.
- [X] on-prem machine
- [X] VM (VMware vSphere 6.7 and 7.0)
- [X] VM running on a cloud service, please be explicit and add details
- [X] classic packaging
- [X] onedir packaging
Steps to Reproduce the behavior
I can attach the logs whenever you need them. The VMs are up and running now if you need them. I can even set up a fresh environment to reproduce this and share data if needed.
Expected behavior
We have a salt 3003.x master and minion setup in the same environment and things run without this problem. I cannot understand what has been changed to make this behave this way. I need help to find out what it is.
Screenshots
I will attach screenshots if needed and provide any other info you may need.
Versions Report
salt --versions-report
# salt-cloud --versions-report
Salt Version:
  Salt: 3006.0
Python Version:
  Python: 3.10.11 (main, Apr 14 2023, 05:57:16) [GCC 11.2.0]
Dependency Versions:
  Apache Libcloud: Not Installed
  cffi: 1.14.6
  cherrypy: 18.6.1
  dateutil: 2.8.1
  docker-py: Not Installed
  gitdb: Not Installed
  gitpython: Not Installed
  Jinja2: 3.1.2
  libgit2: 1.6.3
  looseversion: 1.0.2
  M2Crypto: Not Installed
  Mako: Not Installed
  msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
  packaging: 22.0
  pycparser: 2.21
  pycrypto: 3.17
  pycryptodome: 3.9.8
  pygit2: 1.12.0
  python-gnupg: 0.4.8
  PyYAML: 5.4.1
  PyZMQ: 23.2.0
  relenv: 0.11.2
  smmap: Not Installed
  timelib: 0.2.4
  Tornado: 4.5.3
  ZMQ: 4.3.4
System Versions:
  dist: debian 12 bookworm
  locale: utf-8
  machine: x86_64
  release: 6.1.0-7-amd64
  system: Linux
  version: Debian GNU/Linux 12 bookworm
salt-master --versions-report
Salt Version:
  Salt: 3006.0
Python Version:
  Python: 3.10.11 (main, Apr 14 2023, 05:57:16) [GCC 11.2.0]
Dependency Versions:
  cffi: 1.14.6
  cherrypy: 18.6.1
  dateutil: 2.8.1
  docker-py: Not Installed
  gitdb: Not Installed
  gitpython: Not Installed
  Jinja2: 3.1.2
  libgit2: 1.6.3
  looseversion: 1.0.2
  M2Crypto: Not Installed
  Mako: Not Installed
  msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
  packaging: 22.0
  pycparser: 2.21
  pycrypto: 3.17
  pycryptodome: 3.9.8
  pygit2: 1.12.0
  python-gnupg: 0.4.8
  PyYAML: 5.4.1
  PyZMQ: 23.2.0
  relenv: 0.11.2
  smmap: Not Installed
  timelib: 0.2.4
  Tornado: 4.5.3
  ZMQ: 4.3.4
System Versions:
  dist: debian 12 bookworm
  locale: utf-8
  machine: x86_64
  release: 6.1.0-7-amd64
  system: Linux
  version: Debian GNU/Linux 12 bookworm
Additional context
Referring back to https://github.com/saltstack/salt/issues/62881, I want to know which gitfs issue the user https://github.com/RobinWlund fixed. I did ask, but never got a reply. If you have an idea of what it was, please let me know.
Thanks for helping
About this issue
- State: closed
- Created a year ago
- Comments: 40 (20 by maintainers)
Oh, how I miss May Day 😃 (Irish, with a name like Murphy 😃). Tuesday is fine, enjoy the long weekend.