salt: [BUG] Some 3006.6 minions not compatible with 3006.7 master
Description After some new VMs this week got 3006.7 from the repository out-of-the-box, which did not work with Salt master 3006.6, I had to upgrade the salt-master to 3006.7. But now some 3006.6 minions are no longer reachable.
They log every 10 seconds:
2024-02-22 16:50:10,665 [salt.crypt :823 ][ERROR ][37572] The Salt Master has rejected this minion's public key.
To repair this issue, delete the public key for this minion on the Salt Master.
The Salt Minion will attempt to re-authenicate.
(sic)
Restarting the minion did not help.
So neither updating master first nor minion first resulted in a stable upgrade experience.
After upgrading the minion to 3006.7, reloading systemd and restarting the minion, the connection works.
Setup (Please provide relevant configs and/or SLS files (be sure to remove sensitive info). There is no general set-up of Salt.)
Please be as specific as possible and give set-up details.
- on-prem machine
- VM (Virtualbox, KVM, etc. please specify)
- VM running on a cloud service, please be explicit and add details
- container (Kubernetes, Docker, containerd, etc. please specify)
- or a combination, please be explicit
- jails if it is FreeBSD
- classic packaging
- onedir packaging
- used bootstrap to install
Steps to Reproduce the behavior (Include debug logs if possible and relevant)
Expected behavior I would hope for minion-master compatibility within any minor-version mix, and at least one step difference in major version. (Of course with only the features both support).
Versions Report
salt --versions-report
(Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)
Minion:
Salt Version:
Salt: 3006.6
Python Version:
Python: 3.10.13 (main, Nov 15 2023, 04:34:27) [GCC 11.2.0]
Dependency Versions:
cffi: 1.14.6
cherrypy: 18.6.1
dateutil: 2.8.1
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 3.1.3
libgit2: Not Installed
looseversion: 1.0.2
M2Crypto: Not Installed
Mako: Not Installed
msgpack: 1.0.2
msgpack-pure: Not Installed
mysql-python: Not Installed
packaging: 22.0
pycparser: 2.21
pycrypto: Not Installed
pycryptodome: 3.19.1
pygit2: Not Installed
python-gnupg: 0.4.8
PyYAML: 6.0.1
PyZMQ: 23.2.0
relenv: 0.14.2
smmap: Not Installed
timelib: 0.2.4
Tornado: 4.5.3
ZMQ: 4.3.4
System Versions:
dist: centos 7.9.2009 Core
locale: utf-8
machine: x86_64
release: 3.10.0-1160.108.1.el7.x86_64
system: Linux
version: CentOS Linux 7.9.2009 Core
Master:
Python Version:
Python: 3.10.13 (main, Feb 19 2024, 03:31:20) [GCC 11.2.0]
Dependency Versions:
cffi: 1.14.6
cherrypy: unknown
dateutil: 2.8.1
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 3.1.3
libgit2: 1.3.0
looseversion: 1.0.2
M2Crypto: Not Installed
Mako: Not Installed
msgpack: 1.0.2
msgpack-pure: Not Installed
mysql-python: Not Installed
packaging: 22.0
pycparser: 2.21
pycrypto: Not Installed
pycryptodome: 3.19.1
pygit2: 1.7.0
python-gnupg: 0.4.8
PyYAML: 6.0.1
PyZMQ: 23.2.0
relenv: 0.15.1
smmap: Not Installed
timelib: 0.2.4
Tornado: 4.5.3
ZMQ: 4.3.4
System Versions:
dist: centos 7.9.2009 Core
locale: utf-8
machine: x86_64
release: 3.10.0-1160.105.1.el7.x86_64
system: Linux
version: CentOS Linux 7.9.2009 Core
About this issue
- State: closed
- Created 4 months ago
- Reactions: 4
- Comments: 19 (10 by maintainers)
Thank you @MartinEmrich for opening this, we were bitten by the same issue today when upgrading two syndics supporting a compute cluster (bare metal). We had upgraded our upstream master already but only took it to v3006.6, so we didn’t see the problem come up then.
Thank you also @darkpixel for the fix, that resolved the issue for us! We were able to clean up all of our minions containing newlines in their minion.pub’s and minion.pem’s and restart the salt-minion service, which cleared the problem for the affected hosts.
We dug a little deeper and have some info which may be useful in hunting down the root cause:
Hopefully something in there is useful!
@MartinEmrich If you edit your minion.pub and minion.pem using vi -b minion.pub and then run :set noeol and then :wq and start the salt-minion service again, it will come back online until this gets fixed.
quick and dirty
minions:
sudo truncate -s -1 /etc/salt/pki/minion/minion.pem && sudo truncate -s -1 /etc/salt/pki/minion/minion.pub && sudo systemctl restart salt-minion
master:
sudo salt-key -A --include-denied
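If the minion still fails, check minion.pem and minion.pub for a missing “-” at the end of the last line (truncate -s -1 removes the final byte even when it is not a newline).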
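For anyone worried about cutting off the last “-”, here is a minimal Python sketch of the same cleanup that only touches a file when it really ends in a newline. The function name strip_trailing_newlines is made up for this example and is not part of Salt; it assumes the key files live under /etc/salt/pki/minion as in the commands above.

```python
from pathlib import Path


def strip_trailing_newlines(path):
    """Remove trailing CR/LF bytes from a key file.

    Hypothetical helper, not part of Salt. Returns True if the file
    was rewritten, False if it already ended without a newline.
    """
    key_file = Path(path)
    data = key_file.read_bytes()
    cleaned = data.rstrip(b"\r\n")  # strips only newline bytes, never a "-"
    if cleaned == data:
        return False  # nothing to strip, leave the file untouched
    key_file.write_bytes(cleaned)
    return True
```

Run as root, this could be called on /etc/salt/pki/minion/minion.pem and /etc/salt/pki/minion/minion.pub, followed by a restart of the salt-minion service. Unlike truncate -s -1, calling it twice is harmless.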
Don’t worry about this too much, I’m not really that attached to the solution 😉
I don’t think you’ll break any conventions, since clean_key was already applied to the incoming minion’s key before (you can walk through the commit history to see that).
Adding salt.crypt.clean_key on the right side of this condition worked for me.
Thank you for your feedback @kiniou, I’ve created a draft PR based on your solution. If the code is not shot down, I’ll credit you where it is appropriate (I’ll have to find out). Also, please let me know if I’m breaking convention by using your solution in my PR. I’ll retract it if that is the case.
This is likely the best path forward for anyone looking for a quick fix that doesn’t require changes to minions.
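To illustrate why normalizing both sides of the comparison helps, here is a rough Python sketch. It assumes the master compares the stored key text with the incoming one, and that the cleanup strips surrounding whitespace and unifies line endings; normalize_key and keys_match are hypothetical stand-ins written for this example, not Salt's actual code, and the key body is a dummy placeholder.

```python
def normalize_key(pem_text):
    """Hypothetical stand-in for a key cleanup step (not Salt's code):
    strip outer whitespace and use plain \\n line endings."""
    return "\n".join(pem_text.strip().splitlines())


def keys_match(stored, incoming):
    """Compare two PEM strings after normalizing BOTH sides."""
    return normalize_key(stored) == normalize_key(incoming)


# Dummy key text; the body line is a placeholder, not a real key.
stored = (
    "-----BEGIN PUBLIC KEY-----\n"
    "PLACEHOLDERKEYBODY\n"
    "-----END PUBLIC KEY-----"
)
incoming = stored + "\n"  # same key, but with a trailing newline

raw_equal = stored == incoming                    # byte-for-byte comparison
normalized_equal = keys_match(stored, incoming)   # comparison after cleanup
```

If only one side is cleaned, the strings can still differ by a trailing newline, which would fit the symptoms in this thread; cleaning both sides makes the comparison independent of how the key file was written.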
@darkpixel thanks for the hint, but as I have to SSH into every host now anyways, I will just upgrade the minions to 3006.7. For the few I tried so far, that fixed it.
I wish that were the case across my infrastructure. I’ve gone as far as removing the keys from the Minion and Master and starting the salt-minion service again. The “new” Minion key lands in the Denied list 10 seconds after it hits the Unaccepted list, with zero interaction in those 10 seconds.
Additionally, their SHA256 fingerprints match on both systems. Minion still gets auto-rejected unless it’s running 3006.7 like the Master.
Even more odd, I’ve got a couple of 3005 clients that are still connected. All my 3006 clients and 3004 clients failed immediately. My 3006 clients were upgraded to 3006.7, keys replaced, and they’re working now. My 3004 clients cannot be upgraded for the next four months. I’ve been working on them all day with no luck. Removing the key (Which was not preseeded and these are not Windows or Mac machines) does not help. I have verified sha256 fingerprints (In binary and non-binary modes) on both Minion and Master, they’re identical.
@darkpixel @twangboy is away for a few days. I will take a look at this soon.