salt: File states seem slower in 2016.3, especially on first cache retrieval

I upgraded to 2016.3 in my VMs and file states seem significantly slower on the first pass (10-20 times slower). However, on subsequent state.apply calls, things seem fine again.

For every file that the minion has to fetch from the master, I see about 30-40 occurances of these log lines in the minion log:

2016-05-27 12:52:34,313 [   DEBUG] [17955] [salt.transport.zeromq] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'orch.test.infra.vagrant.local', 'tcp://10.0.1.99:4506', 'aes')
2016-05-27 12:52:34,313 [   DEBUG] [17955] [salt.crypt] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'orch.test.infra.vagrant.local', 'tcp://10.0.1.99:4506')
2016-05-27 12:52:34,319 [   DEBUG] [17955] [salt.crypt] Re-using SAuth for ('/etc/salt/pki/minion', 'orch.test.infra.vagrant.local', 'tcp://10.0.1.99:4506')

Followed by:

2016-05-27 12:52:35,930 [   DEBUG] [17955] [salt.fileclient] In saltenv 'base', looking at rel_path '|salt/minion/configs/master.conf' to resolve 'salt://|salt/minion/configs/master.conf'
2016-05-27 12:52:35,933 [   DEBUG] [17955] [salt.fileclient] In saltenv 'base', ** considering ** path '/var/cache/salt/minion/files/base/|salt/minion/configs/master.conf' to resolve 'salt://|salt/minion/configs/master.conf'
2016-05-27 12:52:35,933 [   DEBUG] [17955] [salt.fileclient] Fetching file from saltenv 'base', ** attempting ** 'salt://|salt/minion/configs/master.conf'
2016-05-27 12:52:35,933 [   DEBUG] [17955] [salt.fileclient] No dest file found 

Then some more of the zeromq-related log lines, then this:

2016-05-27 12:52:37,632 [    INFO] [17955] [salt.fileclient] Fetching file from saltenv 'base', ** done ** '|salt/minion/configs/master.conf'
2016-05-27 12:52:37,633 [   DEBUG] [17955] [salt.utils.jinja] Jinja search path: '['/var/cache/salt/minion/files/base']'

I had a look at logs from before the upgrade and there were nowhere near the same amounts of log lines regarding re-auth.

vagrant@orch:~$ salt-call --versions
Salt Version:
           Salt: 2016.3.0

Dependency Versions:
           cffi: 0.8.6
       cherrypy: 3.2.3
       dateutil: 2.2
          gitdb: 0.5.4
      gitpython: 0.3.2 RC1
          ioflo: Not Installed
         Jinja2: 2.7.3
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.2
   mysql-python: 1.2.3
      pycparser: 2.10
       pycrypto: 2.6.1
         pygit2: Not Installed
         Python: 2.7.9 (default, Mar  1 2015, 12:57:24)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 14.4.0
           RAET: Not Installed
          smmap: 0.8.2
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.0.5

System Versions:
           dist: debian 8.4 
        machine: x86_64
        release: 3.16.0-4-amd64
         system: Linux
        version: debian 8.4

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Reactions: 2
  • Comments: 19 (17 by maintainers)

Most upvoted comments

oohh, i see! as this ticket was marked as fixed, closed and then the 2016.3.1 was released, we all assume this version had the fix.

I tested with 2016.3.1 and applied the fix patch…and it worked! Still seems slower than 2015.8, but now is a lot faster than it was without the fix.

Thanks, i will wait for the 2016.3.2 release

Just upgraded again to 2016.3.1 and still takes a long time to process…

the salt-minion keeps using 100% of the CPU until it finishes the highstate.

salt-minion -l garbage i get this log showing always a AsyncZeroMQReqChannel and each 1~2 seconds i finally get a normal state being applied:

[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506')
[TRACE   ] Inserted key into loop_instance_map id 140156465323904 for key ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506', 'aes') and process 12483
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506')
[TRACE   ] Inserted key into loop_instance_map id 140156183639104 for key ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506', 'aes') and process 10959
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506')
[TRACE   ] Inserted key into loop_instance_map id 140156185678768 for key ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506', 'aes') and process 10959
[DEBUG   ] In saltenv 'base', looking at rel_path 'templates/CONFIGURATION/nsswitch.conf' to resolve 'salt://templates/CONFIGURATION/nsswitch.conf'
[DEBUG   ] In saltenv 'base', ** considering ** path '/var/cache/salt/minion/files/base/templates/CONFIGURATION/nsswitch.conf' to resolve 'salt://templates/CONFIGURATION/nsswitch.conf'
[INFO    ] Fetching file from saltenv 'base', ** skipped ** latest already in cache 'salt://templates/CONFIGURATION/nsswitch.conf'
[DEBUG   ] Jinja search path: ['/var/cache/salt/minion/files/base']
[INFO    ] File /etc/nsswitch.conf is in the correct state
[INFO    ] Completed state [/etc/nsswitch.conf] at time 20:25:29.541349 duration_in_ms=5715.604
[INFO    ] Running state [/etc/rsyslog.conf] at time 20:25:29.541611
[INFO    ] Executing state file.managed for /etc/rsyslog.conf
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506')
[TRACE   ] Inserted key into loop_instance_map id 140156182034624 for key ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506', 'aes') and process 10959
[TRACE   ] ret_val = None
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506')
[TRACE   ] Inserted key into loop_instance_map id 140156182063512 for key ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506', 'aes') and process 10959
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506')
[TRACE   ] Inserted key into loop_instance_map id 140156185516080 for key ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506', 'aes') and process 10959
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506')
[TRACE   ] Inserted key into loop_instance_map id 140156183189480 for key ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506', 'aes') and process 10959
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506')
[TRACE   ] Inserted key into loop_instance_map id 140156157189872 for key ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506', 'aes') and process 10959
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506')
[TRACE   ] Inserted key into loop_instance_map id 140156181971744 for key ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506', 'aes') and process 10959
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506')
[TRACE   ] Inserted key into loop_instance_map id 140156184754224 for key ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506', 'aes') and process 10959
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506')
[TRACE   ] Inserted key into loop_instance_map id 140156183288000 for key ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506', 'aes') and process 10959
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'jumiaA13', 'tcp://192.168.106.1:4506')

Doing a strace -enetwork -f shows in the loop this:

[pid  7894] socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 48
[pid  7894] connect(48, {sa_family=AF_INET, sin_port=htons(4506), sin_addr=inet_addr("192.168.106.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid  7894] getsockopt(48, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
[pid  7894] setsockopt(48, SOL_TCP, TCP_NODELAY, [1], 4) = 0
[pid  7894] setsockopt(48, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
[pid  7894] setsockopt(48, SOL_TCP, TCP_KEEPIDLE, [300], 4) = 0
[pid  7894] getpeername(48, {sa_family=AF_INET, sin_port=htons(4506), sin_addr=inet_addr("192.168.106.1")}, [16]) = 0
[pid  7894] recvfrom(48, "\377\0\0\0\0\0\0\0\1\177", 12, 0, NULL, NULL) = 10
[pid  7894] recvfrom(48, 0x7f5340000ef2, 2, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
[pid  7894] sendto(48, "\377\0\0\0\0\0\0\0\1\177\3", 11, 0, NULL, 0) = 11
[pid  7894] recvfrom(48, "\3\0", 2, 0, NULL, NULL) = 2
[pid  7894] recvfrom(48, "NULL\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 52, 0, NULL, NULL) = 52
[pid  7894] recvfrom(48, 0x7f5340003200, 8192, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
[pid  7894] sendto(48, "\0NULL\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 53, 0, NULL, 0) = 53
[pid  7894] sendto(48, "\4&\5READY\vSocket-Type\0\0\0\3REQ\10Identity\0\0\0\0", 40, 0, NULL, 0) = 40
[pid  7894] recvfrom(48, "\4)\5READY\vSocket-Type\0\0\0\6ROUTER\10Identity\0\0\0\0", 8192, 0, NULL, NULL) = 43
[pid  7893] +++ exited with 0 +++
[pid  7894] +++ exited with 0 +++

So this query is being done several times per second and with the encryption, it eats all the CPU of the machine

Fixed by #33896

Looks like I have the same problem. salt.fileclient is super slow. Some examples for the log:

2016-06-09 09:14:00,996 [salt.fileclient  ][INFO    ][17677] Fetching file from saltenv 'base', ** done ** '|vim/bundles/python-mode/pymode/libs2/rope/base/taskhandle.py'
2016-06-09 09:14:04,187 [salt.fileclient  ][INFO    ][17677] Fetching file from saltenv 'base', ** done ** '|vim/bundles/python-mode/pymode/libs2/rope/base/utils.py'

It took ~4 seconds to copy a file with 2891 bytes!!