salt: [BUG] gpg renderer failing to decrypt some encrypted pillars
Description
I'm having an issue where only some of the pillars from the master are actually decrypted. The GPG keys are set up properly, and Salt can decrypt files on disk containing the same encrypted contents as the pillars when I run gpg.decrypt against them. I have seen no relevant log entries after setting the master to debug log level, so I would appreciate some help in troubleshooting the gpg renderer.
This issue started after I upgraded to Salt 3002, but reverting to previous versions that had been running just fine has not fixed it, so the timing may be coincidental. I also posted to the NTC Slack, but not much came of that.
Setup
example_pillar1.sls:
#!yaml|gpg
my_secret: |
    -----BEGIN PGP MESSAGE-----
    <snipped>
    -----END PGP MESSAGE-----
example_pillar2.sls:
#!yaml|gpg
my_other_secret: |
    -----BEGIN PGP MESSAGE-----
    <snipped>
    -----END PGP MESSAGE-----
Steps to Reproduce the behavior
Just run the master/minion as usual. Both of the pillar files are rendered, but only one is decrypted. Both are encrypted with the same key, and both of the encrypted contents WILL decrypt if placed in a file on disk and Salt is run with the gpg.decrypt module against that file.
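To narrow down which pillar keys are affected, a small helper like the one below could walk the rendered pillar data and report every value that still contains PGP armor. This is only a sketch: `find_undecrypted` is a hypothetical name, not part of Salt, and the sample values are placeholders.

```python
PGP_HEADER = "-----BEGIN PGP MESSAGE-----"


def find_undecrypted(data, path=()):
    """Return colon-joined key paths whose values still contain PGP armor."""
    hits = []
    if isinstance(data, dict):
        for key, value in data.items():
            hits.extend(find_undecrypted(value, path + (str(key),)))
    elif isinstance(data, (list, tuple)):
        for index, value in enumerate(data):
            hits.extend(find_undecrypted(value, path + (str(index),)))
    elif isinstance(data, bytes):
        if PGP_HEADER.encode() in data:
            hits.append(":".join(path))
    elif isinstance(data, str):
        if PGP_HEADER in data:
            hits.append(":".join(path))
    return hits


# Placeholder data standing in for rendered pillar output.
sample = {
    "my_secret": "decrypted-placeholder",
    "my_other_secret": PGP_HEADER + "\n<snipped>",
}
print(find_undecrypted(sample))
```

The input could come from something like `salt-call pillar.items --out=json` loaded with `json.load`; the returned paths then show exactly which keys came through as ciphertext.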
Expected behavior
All secrets encrypted with the same key should be decrypted if any of them are, right?
Also, I would like to add some logging around this if we find something is failing silently. If you can just point me in the right direction to be able to diagnose what is actually going on then I can probably send up a PR.
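On the logging point, here is a minimal sketch of what surfacing gpg's own output could look like. It is not Salt's renderer code: `decrypt_with_logging` and the logger name are hypothetical, and the `run` parameter exists only so the function can be exercised without a real keyring. The gpg options used (`--homedir`, `--batch`, `--no-tty`, `--status-fd`, `--decrypt`) are standard gpg flags.

```python
import logging
import subprocess

log = logging.getLogger("gpg_debug")  # hypothetical logger name


def decrypt_with_logging(cipher, homedir, run=subprocess.run):
    """Decrypt armored text with gpg, logging stderr instead of discarding it.

    Returns the plaintext, or None if gpg exited non-zero or produced no output.
    """
    cmd = [
        "gpg", "--homedir", homedir,
        "--batch", "--no-tty",
        "--status-fd", "2",  # send machine-readable status lines to stderr
        "--decrypt",
    ]
    proc = run(
        cmd,
        input=cipher.encode(),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,
    )
    stderr = proc.stderr.decode(errors="replace")
    if proc.returncode != 0 or not proc.stdout:
        log.error("gpg failed (exit %s): %s", proc.returncode, stderr)
        return None
    log.debug("gpg status output: %s", stderr)
    return proc.stdout.decode()
```

Calling it with the master's key directory as `homedir` and one of the failing ciphertexts should at least show whether gpg itself reports an error when the decryption does not happen.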
Screenshots
N/A
Versions Report
Running in a conda env. I have tried 3002.x, 3001.x, and 2019.2.x. I also upgraded from Fedora 31 to Fedora 32 with no change.
(salt) ~ # ❯❯❯ salt --versions-report
Salt Version:
Salt: 3002.2
Dependency Versions:
cffi: 1.14.4
cherrypy: Not Installed
dateutil: 2.8.1
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 2.11.2
libgit2: 1.1.0
M2Crypto: Not Installed
Mako: Not Installed
msgpack: 0.6.2
msgpack-pure: Not Installed
mysql-python: Not Installed
pycparser: 2.20
pycrypto: 2.6.1
pycryptodome: 3.9.9
pygit2: 1.4.0
Python: 3.8.5 (default, Sep 4 2020, 07:30:14)
python-gnupg: 0.4.6
PyYAML: 3.13
PyZMQ: 20.0.0
smmap: Not Installed
timelib: Not Installed
Tornado: 4.5.3
ZMQ: 4.3.3
System Versions:
dist: fedora 32
locale: utf-8
machine: x86_64
release: 5.9.16-100.fc32.x86_64
system: Linux
version: Fedora 32
Additional context
I have also wiped my /var/cache/salt/* directories just in case something cached was causing the issue, with no change.
About this issue
- State: open
- Created 3 years ago
- Reactions: 1
- Comments: 48 (32 by maintainers)
Exactly. Luckily, I had found it while testing in my home lab before I saw it in prod…
In my mind the expected result from a gpg decryption failure should be an immediate halt. I don’t want data trashed in prod, ever.
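A rough sketch of that "halt immediately" behaviour, assuming a wrapper around whatever actually performs the decryption. `decrypt_or_halt`, `PillarDecryptionError`, and the `decrypt` callable are all hypothetical names for illustration, not Salt APIs. Since the symptom in this thread is ciphertext ending up in the pillar, the sketch treats output that still contains PGP armor as a failure.

```python
PGP_HEADER = "-----BEGIN PGP MESSAGE-----"


class PillarDecryptionError(RuntimeError):
    """Raised when a value that should have been decrypted was not."""


def decrypt_or_halt(cipher, decrypt):
    """Call decrypt(cipher) and raise instead of passing ciphertext along.

    `decrypt` is any callable that returns plaintext, or None on failure.
    """
    plain = decrypt(cipher)
    if plain is None or PGP_HEADER in plain:
        raise PillarDecryptionError(
            "gpg decryption failed; refusing to pass ciphertext on"
        )
    return plain
```

The design choice is that a failed decryption becomes an exception the caller cannot ignore, rather than a value that silently flows into states.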
The failures seem to be in all saltenvs, but the initial highstates work fine, and not all highstates have failures.
It's the only thing in ext_pillar (we aren't using external pillars since we ditched gitfs).
I have been trying to debug this as best I can but cannot tell if what I am seeing is a salt issue or a GPG issue or a systemd issue.
I have enabled verbose logging to gpg-agent and have been seeing this:
We see this when the pillar issues occur. The sockets should be:
It looks like systemd is assigning sockets (?!) and that is somehow causing the agent to error out.
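One way to check which socket path gpg itself expects for a given home directory is to ask `gpgconf`, whose `--list-dirs` output is a list of `name:value` lines with percent-escaped values. The sketch below parses that output; `parse_list_dirs` and `agent_socket` are hypothetical helper names, and the `run` parameter is there only to make the function testable without gpg installed.

```python
import subprocess
from urllib.parse import unquote


def parse_list_dirs(output):
    """Parse `gpgconf --list-dirs` output into a {name: path} dict."""
    dirs = {}
    for line in output.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            # gpgconf percent-escapes special characters such as ':'.
            dirs[name] = unquote(value)
    return dirs


def agent_socket(homedir, run=subprocess.run):
    """Return the agent socket path gpgconf reports for `homedir`, if any."""
    proc = run(
        ["gpgconf", "--homedir", homedir, "--list-dirs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )
    return parse_list_dirs(proc.stdout.decode()).get("agent-socket")
```

Comparing the result for the master's key directory against the socket paths in the gpg-agent log could help show whether the agent is listening somewhere other than where gpg looks for it.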
We aren't using ext_pillar, but I can test adding a reference to see if it has an effect.
@raddessi All good. Just trying to get to the scenario where the bug is surfacing 😃 So far I’m up to 20 pillar files with a single GPG encrypted pillar item in each file and haven’t seen the issue manifest yet. Do you have multiple encrypted values in the files?
Thanks. That's good information to go on. When I tested, I did not have the pillar files stored in Git.
Hello @raddessi
I could only get it working by reading from files. My workaround at the moment is.