salt: Using mine.get while targeting grain returns data from dead minions

Hello,

I’m currently running 2014.7.2. After removing some minions that were VMs managed by Foreman with the foreman-salt plugin (and hence deleting their keys), the data I get from the mine while targeting by grain still comes from the minions that are now gone.

I’ve tried salt-run cache.clear_all tgt='*', salt '*' saltutil.clear_cache, and salt '*' mine.flush, but nothing seems to change the outcome.

What I’m trying to run is salt-call mine.get 'kernel:Linux' backend_ip_addr grain in a template so it looks like {% for host, ips in salt['mine.get']('kernel:Linux', 'backend_ip_addr', 'grain').items() %}.

If I do salt-call mine.get '*' backend_ip_addr, I get data without the dead minions. Weirdly enough, if I do something like salt-call mine.get 'os:Ubuntu' backend_ip_addr grain, I only get data from the same host as the minion I’m running this on, even though there are many more minions running Ubuntu.

I’ve tried to target minions using pillar and I get either nothing or results that don’t make sense…

Thank you.

About this issue

  • State: closed
  • Created 9 years ago
  • Reactions: 2
  • Comments: 45 (13 by maintainers)

Most upvoted comments

What I ended up having to do was go into /var/cache/salt/master and delete the whole directory of every dead minion, then run salt '*' saltutil.clear_cache. This finally seemed to work, and the data I receive is now correct.

As far as I can see, flush_mine_on_destroy must be supported by the salt-cloud driver; for now I only see it implemented for nova and the drivers based on libcloud.

I’ve done more to debug issue #35439 since I last wrote, and I believe this issue and that one are the same thing. I wanted to give some information to other ops professionals out there who might read this, and also workarounds.

tl;dr:

  • Grain targeting in all its forms is broken, and this is particularly painful when using the mine. See #35439 for more information. If you want this to change, comment on that ticket.
  • There are three workarounds that I know about:
    • Use glob targeting. It always works.
    • Use saltutil.cmd in an orchestration, if orchestration is an option.
    • Use Consul instead of the mine.

Grain Targeting is Broken

Deeply situated in the logic of salt is a function dealing with how to target minions based on grains. When salt does this, it consults the minion data cache. By default, the minion data cache is found in /var/cache/salt/master/minions/. If there is an entry for a particular minion in the cache, salt uses it to determine if the minion should be targeted. If there is not an entry for the minion, salt assumes the minion should be targeted.
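The situation is easy to reproduce at the filesystem level. Here is a minimal sketch using a scratch directory standing in for the real cache (the minion IDs live-web01 and dead-web02 are made up for illustration):

```shell
# Simulate the master-side minion data cache in a scratch directory.
# On a real master this lives at /var/cache/salt/master/minions/ and
# holds one directory per minion ID, containing data.p (cached grains
# and pillar) and mine.p (cached mine data).
cache=$(mktemp -d)
mkdir -p "$cache/live-web01" "$cache/dead-web02"
touch "$cache/live-web01/data.p" "$cache/live-web01/mine.p"
touch "$cache/dead-web02/data.p" "$cache/dead-web02/mine.p"

# Deleting a minion's key does not touch this cache, so the entry for
# dead-web02 lingers and keeps being consulted during grain targeting:
ls "$cache"
```

Because the stale directory is still present, a grain match against its cached data.p still succeeds, which is how a dead minion ends up in mine.get results.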

The last sentence is what fundamentally messes things up for mine users. It is actually a fairly safe assumption in the case where you are using grain targeting to run something like state.apply, since when a minion is targeted which shouldn’t be targeted (that is, the grain used in targeting isn’t set on that minion), it simply ignores the request and you get “Minion did not return” when the call returns.

However, the same logic is used for figuring out what entries are or even should be in the mine. On mine.get calls, say in the jinja of a state file, this causes minions which shouldn’t be returned by mine.get to be returned, causing mayhem. I’m hazier on the details here, but I’m sure this is what is happening.

So in reality, there is no problem with the mine code; it’s deeper than that. It’s in the grain targeting code in salt.

If you want this fixed, please comment on issue #35439 . It’s where I documented all of my debugging work, and where I found out why this is happening.

Workaround: Use Glob Targeting

You might not be able to do this, but if you are, it’s easily the best way to get around this issue. For example, instead of using this:

salt['mine.get'](tgt='role:webserver', fun='network.ip_addrs', expr_form='grain')

You may want to use this instead:

salt['mine.get'](tgt='*webserver*', fun='network.ip_addrs')

This seems to work under all sorts of conditions. It’s also a bit sad 😦, since grain targeting is awesome.

Workaround: Salt Orchestration

@almoore pointed this one out 😃

One way to get around using the mine entirely is to use saltutil.cmd in conjunction with inline pillars in an orchestration.

As an example, the following salt orchestration provides network IP address information of other minions as a pillar, instead of using the mine to accomplish the same thing:

{% set ips = salt['saltutil.cmd'](tgt='role:webserver', expr_form='grain', fun='network.ip_addrs') %}
apply_state:
  salt.state:
    - tgt: 'role:webserver'
    - expr_form: 'grain'
    - pillar:
        previously_mined_data:
          ips:
{%- for name, ip in ips.items() %}
            {{name}}: {{ip['ret'][0]}}
{%- endfor %}
    - sls:
      - applied.state

This workaround has the advantage of getting the “mined data” immediately before the state is called.

It has the disadvantage that it can’t be called using salt-call; this orchestration must be run from the master.

Workaround: Consul

You can use Consul as a makeshift mine. You would create one state to populate consul with “mine data”, and one state to consume the mine data. All the minions that would need to put data into consul get the “populate” state applied, and all the minions that would need to consume it have states run on them which contain the appropriate salt['consul.get']() calls.

The advantage is that you can use this as a drop-in replacement for the mine. Since consul is populated using a state call, it should be safe to use grain targeting using this option. salt-calls should work as well as calling the states from the master.

The disadvantage is that it’s a bit complicated to set up. That said, I have set up a POC, and I know it works.

Or just this:

rm /var/cache/salt/master/minions/*/mine.p && \
   salt '*' mine.update
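A safe way to see what the rm step of that one-liner does is to run it against a scratch tree mirroring the cache layout (hypothetical minion IDs; on a real master you would then run salt '*' mine.update so that live minions, and only live minions, repopulate their mine data):

```shell
# Scratch tree mimicking /var/cache/salt/master/minions/.
cache=$(mktemp -d)
mkdir -p "$cache/minions/web01" "$cache/minions/gone02"
touch "$cache/minions/web01/mine.p" "$cache/minions/gone02/mine.p"

# The cleanup step: delete every cached mine database, stale or not.
rm "$cache"/minions/*/mine.p

# Afterwards no mine.p remains anywhere; on a real master
# `salt '*' mine.update` recreates it only for minions that still
# respond, so the dead entries stay empty.
ls "$cache/minions/web01" "$cache/minions/gone02"
```

Note that the directories themselves (including those of dead minions) are left in place; only the mine data is wiped, which is why the follow-up mine.update matters.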

I’ve been using this solution in an orchestrate state for 3 months already without any issues.

salt '*' mine.flush removes the /var/cache/salt/master/minions/<node>/mine.p file but only for minions that are ‘alive’. The removed minions still have their directory containing mine.p and data.p. Neither salt-run cache.clear_all tgt='*' nor salt '*' saltutil.clear_cache appears to do anything.

My resolution steps; not all of these steps may be necessary, but I wasn’t taking chances:

service salt-master stop
salt '*' mine.flush
rm -rf /var/cache/salt/master/minions/<offending minion dir>
service salt-master start
salt '*' mine.update