rancher: [network] Host [192.168.x.y] is not able to connect to the following ports: [192.168.x.y:2379]. Please check network policies and firewall rules

Having destroyed a Rancher 2 test cluster and completely cleaned all the in-house bare-metal/VM nodes, I’m seeing the following message when provisioning the first node with the etcd and control plane roles:

[network] Host [192.168.x.y] is not able to connect to the following ports: [192.168.x.y:2379]. Please check network policies and firewall rules.

This error started appearing in Rancher v2.0.3 - prior to that version, I did not have that issue.

The cluster nodes are running RancherOS v1.4.0.

This issue makes it impossible for me to set up a new cluster, so if someone has a clue what is going on, I would highly appreciate some feedback.

Thanks


Useful Info
Versions: Rancher v2.0.4, UI v2.0.53
Access: local admin
Route: authenticated.cluster.nodes.index

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 1
  • Comments: 15 (4 by maintainers)

Most upvoted comments

Instead of cleaning the host, I decided to reinstall RancherOS on it - that solved the issue. Then, comparing the state of the freshly installed node with one that I cleaned and rebooted, I found that some more folders needed to be removed.

After also removing /opt/rke, provisioning appears to work. /opt/rke holds the certs, so stale certs left there would explain the “tls: bad certificate” errors.

I found that without a reboot before removing the folders, some of them could not be removed (“device or resource busy”), so the steps should be, in this order:

  • Clean docker on the host
  • Reboot the host
  • Clean folders on the host
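
A rough sketch of that order as two shell functions, in case it helps. It is not a copy of the attached scripts: the function names and the CLEANUP_DIRS variable are made up for illustration, and the directory list is simply the directories named in this thread plus /opt/rke. Both functions are destructive and the directory cleanup needs root.

```shell
#!/bin/sh
# Sketch only. Assumed directory list: the paths mentioned in this thread.
CLEANUP_DIRS="${CLEANUP_DIRS:-/var/lib/etcd /etc/kubernetes /etc/cni /opt/cni /var/lib/cni /var/run/calico /opt/rke}"

# Step 1: remove every container, image and volume on the host.
clean_docker() {
  containers=$(docker ps -qa)
  if [ -n "$containers" ]; then docker rm -f $containers; fi
  images=$(docker images -a -q)
  if [ -n "$images" ]; then docker rmi $images; fi
  volumes=$(docker volume ls -q)
  if [ -n "$volumes" ]; then docker volume rm -f $volumes; fi
}

# Step 3: remove leftover state directories (run as root, after the reboot).
clean_dirs() {
  for dir in $CLEANUP_DIRS; do
    if [ -e "$dir" ]; then
      echo "Removing $dir"
      # A failure here usually means the path is still mounted or in use.
      rm -rf "$dir" || echo "Could not remove $dir" >&2
    fi
  done
}
```

Usage would be: run clean_docker, reboot the host (step 2), then run clean_dirs as root.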

I’ve attached my two scripts for cleaning the hosts.

I don’t recall seeing it mentioned in the Rancher docs that a node should be clean before including it in the cluster - maybe this quite important information could be included.

rancher_clean-dirs.sh rancher_clean-docker.sh

Thanks for all your help, @superseb

Yes, I did a complete clean of the nodes, according to a forum posting I found (I changed it slightly because I found it didn’t clean out everything):

# Remove all containers, images and volumes
docker rm -f $(docker ps -qa)
docker rmi $(docker images -a -q)
docker volume rm -f $(docker volume ls -q)
# Open a root shell for the directory cleanup
sudo -s
cleanupdirs="/var/lib/etcd /etc/kubernetes /etc/cni /opt/cni /var/lib/cni /var/run/calico"
for dir in $cleanupdirs; do
  echo "Removing $dir"
  rm -rf "$dir"
done
# Leave the root shell
exit

Also, to rule out any conflicts with previous versions of Rancher, I re-provisioned the system running the rancher-server and installed v2.0.4 on it.

After creating a new cluster and adding one etcd / control node to it, the rancher/rancher container log (attached) shows several “remote error: tls: bad certificate” messages.

rancher-server_docker-ce-17.log
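
For anyone wanting to check their own log for the same symptom, a small sketch that counts the matching lines in a saved copy of the rancher/rancher container log. The function name is made up for illustration; the search string is the message quoted above.

```shell
# Count lines containing the TLS error in a saved container log.
# $1 is the path to the log file (any saved copy of the container output).
count_bad_cert() {
  grep -c 'remote error: tls: bad certificate' "$1"
}
```

For example: count_bad_cert rancher-server_docker-ce-17.log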