salt: [BUG] gpg renderer failing to decrypt some encrypted pillars

Description

I'm having an issue where, even though a set of gpg keys is set up properly and salt can decrypt files on disk containing the same encrypted contents as the pillars via gpg.decrypt, only some of the pillars from the master are actually decrypted. I have seen no relevant log entries after setting the master to debug log level, so I would appreciate some help troubleshooting the gpg renderer.

This issue started after I upgraded to salt 3002, but reverting to the previous versions I had been running just fine has not fixed it, so that may be coincidental. I had posted about this to the NTC Slack, but not much came of that.

Setup

example_pillar1.sls:

#!yaml|gpg
my_secret: | 
    -----BEGIN PGP MESSAGE-----

    <snipped>
    -----END PGP MESSAGE-----

example_pillar2.sls:

#!yaml|gpg
my_other_secret: | 
    -----BEGIN PGP MESSAGE-----

    <snipped>
    -----END PGP MESSAGE-----

Steps to Reproduce the behavior

Just run the master/minion as usual. Both of the pillar files are rendered, but only one is decrypted. Both are encrypted with the same key, and both encrypted contents WILL decrypt if placed in a file on disk and salt is run with the gpg.decrypt module against that file.
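For reference, a rough sketch of that on-disk check (the paths and minion id are placeholders, and the gnupghome argument stands in for whatever gpg_keydir is configured on the master):

# put the same ciphertext that appears in the pillar into a scratch file
cat > /tmp/my_secret.gpg <<'EOF'
-----BEGIN PGP MESSAGE-----

<snipped>
-----END PGP MESSAGE-----
EOF

# decrypting the ciphertext directly with the gpg execution module works
salt-call gpg.decrypt filename=/tmp/my_secret.gpg gnupghome=/etc/salt/gpgkeys

# but the rendered pillar still contains the raw PGP message for one of the two keys
salt 'some-minion' pillar.item my_secret my_other_secret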

Expected behavior

All secrets encrypted with the same key should be decrypted if any of them are, right?

Also, I would like to add some logging around this if we find that something is failing silently. If you can point me in the right direction to diagnose what is actually going on, I can probably send up a PR.
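In the meantime, this is roughly how I have been poking at it (standard debugging steps, nothing conclusive yet): run the master in the foreground with debug logging and force a pillar render, then watch for anything from the gpg renderer. If I am reading salt/renderers/gpg.py correctly, a failed decryption should log a warning and return the ciphertext unchanged rather than raising; the unchanged ciphertext matches what I am seeing, though I have not caught any warning yet.

# terminal 1: run the master in the foreground with debug (or trace) logging
salt-master -l debug

# terminal 2: force a pillar render for one minion and inspect the result
salt-run pillar.show_pillar some-minion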

Screenshots

n/a

Versions Report

Running in a conda env; I have tried 3002.x, 3001.x, and 2019.2.x. I also upgraded from Fedora 31 to Fedora 32 with no change.

(salt) ~ # ❯❯❯ salt --versions-report
Salt Version:
          Salt: 3002.2
 
Dependency Versions:
          cffi: 1.14.4
      cherrypy: Not Installed
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 2.11.2
       libgit2: 1.1.0
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 0.6.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: 2.20
      pycrypto: 2.6.1
  pycryptodome: 3.9.9
        pygit2: 1.4.0
        Python: 3.8.5 (default, Sep  4 2020, 07:30:14)
  python-gnupg: 0.4.6
        PyYAML: 3.13
         PyZMQ: 20.0.0
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.3
 
System Versions:
          dist: fedora 32 
        locale: utf-8
       machine: x86_64
       release: 5.9.16-100.fc32.x86_64
        system: Linux
       version: Fedora 32 
 

Additional context

I have also wiped my /var/cache/salt/* directories just in case something cached was causing the issue, with no change.


Most upvoted comments

Exactly. I had luckily found it in testing in my home lab before I saw it in prod…

In my mind the expected result from a gpg decryption failure should be an immediate halt. I don’t want data trashed in prod, ever.

The failures seem to be in all saltenvs, but the initial highstates work fine, and not all highstates have failures.

it’s the only thing in ext_pillar (we aren’t using external pillars since we ditched using gitfs)

I have been trying to debug this as best I can but cannot tell if what I am seeing is a salt issue or a GPG issue or a systemd issue.

I have enabled verbose logging to gpg-agent and have been seeing this:

2022-03-18 22:39:55 gpg-agent[27529] gpg-agent (GnuPG) 2.2.4 stopped
2022-03-18 22:59:56 gpg-agent[32363] listening on socket '/run/user/0/gnupg/d.cxkjysnknrwpofuut7xcejuo/S.gpg-agent'
2022-03-18 22:59:56 gpg-agent[32363] listening on socket '/run/user/0/gnupg/d.cxkjysnknrwpofuut7xcejuo/S.gpg-agent.extra'
2022-03-18 22:59:56 gpg-agent[32363] listening on socket '/run/user/0/gnupg/d.cxkjysnknrwpofuut7xcejuo/S.gpg-agent.browser'
2022-03-18 22:59:56 gpg-agent[32363] listening on socket '/run/user/0/gnupg/d.cxkjysnknrwpofuut7xcejuo/S.gpg-agent.ssh'
2022-03-18 22:59:56 gpg-agent[32366] socket file has been removed - shutting down

We see this when the pillar issues occur. The sockets should be:

/etc/salt/gpgkeys/S.gpg-agent
/etc/salt/gpgkeys/S.gpg-agent.browser
/etc/salt/gpgkeys/S.gpg-agent.extra
/etc/salt/gpgkeys/S.gpg-agent.ssh

It looks like systemd is assigning sockets (?!) and that is somehow causing the agent to error out.
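If that theory holds, one thing we plan to try (an assumption on our part, not a confirmed fix, and the unit names may differ by distro) is to stop systemd from socket-activating a competing agent for that account, then kill any stray agent so the next decryption starts one against the salt keydir:

systemctl --global mask gpg-agent.socket gpg-agent-extra.socket gpg-agent-browser.socket gpg-agent-ssh.socket
gpgconf --kill gpg-agent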

We aren’t using ext_pillar but I can test adding a reference to see if it has an effect.

@raddessi All good. Just trying to get to the scenario where the bug is surfacing 😃 So far I’m up to 20 pillar files with a single GPG encrypted pillar item in each file and haven’t seen the issue manifest yet. Do you have multiple encrypted values in the files?

Thanks. That’s good information to go on. When I tested, I did not have the pillar files stored in Git.

Hello @raddessi

I could only get it working by reading from files. My workaround at the moment is:

#!jinja|yaml|gpg
{% import_text "ssh_secret.gpg" as a_secret %}

ssh-secret3: {{ a_secret|yaml_encode }}
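I sanity-check it with something like the following (the minion id is just an example), which should show the decrypted value coming through:

salt 'some-minion' pillar.get ssh-secret3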