I'm testing Salt. I have a simple test setup of 3 VirtualBox VMs, with salt-master running on one of the machines and salt-minion running on the other 2 VMs.

I can start either of the salt-minion VMs on its own and it will connect to the master and receive commands. If I start both minion VMs, they will both connect for a short period of time, and then one will drop and show as not connected on the master.

Actually, I don't even need to have more than one minion VM active. With just the master VM and 1 salt-minion VM, the minion will still disconnect.

I can restart the salt-minion and it will reconnect to the master and receive commands again... for a couple of minutes at least. Eventually, it will show as disconnected on the master. Running the salt-minion in debug mode doesn't appear to show anything that explains why it's showing as disconnected on the master.

What could be causing this?

Edit:

The OS I'm using is Ubuntu 14.04. The master and minion environments are the same except for the salt-master package. Running --versions-report on master and minion gives the following versions:

              Salt: 2015.5.3
            Python: 2.7.6 (default, Mar 22 2014, 22:59:56)
            Jinja2: 2.7.2
          M2Crypto: 0.21.1
    msgpack-python: 0.3.0
      msgpack-pure: Not Installed
          pycrypto: 2.6.1
           libnacl: Not Installed
            PyYAML: 3.10
             ioflo: Not Installed
             PyZMQ: 14.0.1
              RAET: Not Installed
               ZMQ: 4.0.4
              Mako: Not Installed
           Tornado: Not Installed
    Debian source package: 2015.5.3+ds-1trusty1
Brendan Abel
  • Did you ever figure this out? I'm having a similar problem.. – kodybrown Nov 02 '16 at 21:24
  • @wasatchwizard Not really. I was just doing a trial of all the different deployment tools. – Brendan Abel Nov 02 '16 at 22:02
  • @wasatchwizard did you ever figure this out? I have the same issue: 10 minions staying perfectly and permanently connected, and 1 minion constantly reconnecting within a minute, until it is detected 15 minutes later (it times out after 16~16.5 minutes consistently). – Niels Keurentjes Jul 06 '17 at 07:50
  • @NielsKeurentjes I had the same issue as you're having now, it seems. In my case, something went wonky during the minion installation on those couple devices - I'm guessing. Because, I fixed it by uninstalling the minion and forcing a re-install/update of all of its dependencies (such as Python), then re-installed it. I'm sorry, I don't actually know the cause though.. – kodybrown Jul 07 '17 at 14:30
  • @wasatchwizard thanks for the response. I managed to reproduce it consistently between any 2 Azure hosts, and none hosted elsewhere. Seems to be a curiosity in Azure internal networking in my case. I stabilized it by running `salt '*' test.ping` every minute (although I should make that a `salt -G 'cloud:Azure' test.ping` ;) – Niels Keurentjes Jul 09 '17 at 20:42

1 Answer

Connectivity issues are usually caused by an old ZMQ library (older than 4.x) and/or the Salt version. Please run salt --versions-report on the master and salt-call --versions-report on the minion to see which versions you are using. You should be running:

Salt: 2015.5.3
...
ZMQ: 4.0.5

You should also try to reproduce the issue with a simple vagrant-salt demo. Note that you will need to change the Salt version in the Vagrantfile to "2015.5.3".

You haven't specified which OS or Salt version you are using, but there is an ongoing issue with the zmq package used by Salt that causes slow connections and drops. It is highly recommended to upgrade the zmq package (this SLS file is for Red Hat-based systems):

{% if grains['os'] in ('RedHat', 'CentOS', 'Fedora') %}
  {% if grains['os'] == 'Fedora' %}
    {% set repotype = 'fedora' %}
  {% else %}
    {% set repotype = 'epel' %}
  {% endif %}
saltstack-zeromq4:
  pkgrepo.managed:
    - humanname: Copr repo for zeromq4 owned by saltstack
    - baseurl: http://copr-be.cloud.fedoraproject.org/results/saltstack/zeromq4/{{ repotype }}-$releasever-$basearch/
    - gpgcheck: 0
    - skip_if_unavailable: True
    - enabled: 1
{% endif %}

{% if grains['os'] in ('RedHat', 'CentOS', 'Fedora') %}
update_zmq:
  pkg:
    - latest
    - pkgs:
      - zeromq
      - python-zmq
    - order: last
  cmd:
    - wait
    - name: echo service salt-minion restart | at now + 1 minute
    - watch:
      - pkg: update_zmq
{% endif %}

Another "hack" is to ping the machines every minute or so. Just add this state to the SLS applied to the minion running on the salt-master machine:

"salt '*' test.ping > /dev/null":
  cron.present:
    - user: root
    - minute: '*/1'

You can also ping the master from the minion by setting the master_alive_interval option in the minion config file.
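As a minimal sketch of that last option, the fragment below would go in the minion config file. It assumes the default location /etc/salt/minion, and the 30-second value is only an illustrative choice, not a recommended setting. The option takes an interval in seconds.

```yaml
# /etc/salt/minion (assumed default path)
# Have the minion check every 30 seconds that its master is still
# reachable. The value 30 is an example, tune it for your network.
master_alive_interval: 30
```

The salt-minion service needs to be restarted for the change to take effect.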

SimSimY