openshift-ansible: could not start DNS, unable to read config file: open /etc/origin/node/resolv.conf: no such file or directory

Description

After an initial installation, uninstall OpenShift and then reinstall it:

ansible-playbook /data/openshift-ansible/playbooks/adhoc/uninstall.yml

ansible-playbook /data/openshift-ansible/playbooks/byo/config.yml

Version

ansible 2.3.2.0

openshift-ansible: master as of 2017-10-19 (commit ca6581dbd5bf06152ad8a321e1fb45911a91cce4)

Ansible log

TASK [openshift_manage_node : Wait for Node Registration] **************************************************************************************************************************************************************************************
Thursday 19 October 2017  21:32:38 +0800 (0:00:00.078)       0:03:00.870 ******
FAILED - RETRYING: Wait for Node Registration (50 retries left).
ok: [master -> master]
FAILED - RETRYING: Wait for Node Registration (49 retries left).
FAILED - RETRYING: Wait for Node Registration (48 retries left).
FAILED - RETRYING: Wait for Node Registration (47 retries left).
FAILED - RETRYING: Wait for Node Registration (46 retries left).
FAILED - RETRYING: Wait for Node Registration (45 retries left).
FAILED - RETRYING: Wait for Node Registration (44 retries left).
FAILED - RETRYING: Wait for Node Registration (43 retries left).
FAILED - RETRYING: Wait for Node Registration (42 retries left).
FAILED - RETRYING: Wait for Node Registration (41 retries left).
FAILED - RETRYING: Wait for Node Registration (40 retries left).

Message log (node1)

Oct 19 21:24:19 node1 systemd: origin-node.service holdoff time over, scheduling restart.
Oct 19 21:24:19 node1 systemd: Starting OpenShift Node...
Oct 19 21:24:19 node1 dnsmasq[4965]: setting upstream servers from DBus
Oct 19 21:24:19 node1 dnsmasq[4965]: using nameserver 127.0.0.1#53 for domain in-addr.arpa
Oct 19 21:24:19 node1 dnsmasq[4965]: using nameserver 127.0.0.1#53 for domain cluster.local
Oct 19 21:24:20 node1 origin-node: I1019 21:24:20.297680   17564 start_node.go:251] Reading node configuration from /etc/origin/node/node-config.yaml
Oct 19 21:24:20 node1 origin-node: I1019 21:24:20.406336   17564 node.go:123] Initializing SDN node of type "redhat/openshift-ovs-subnet" with configured hostname "node1" (IP ""), iptables sync period "30s"
Oct 19 21:24:20 node1 origin-node: I1019 21:24:20.416313   17564 docker.go:364] Connecting to docker on unix:///var/run/docker.sock
Oct 19 21:24:20 node1 origin-node: I1019 21:24:20.416379   17564 docker.go:384] Start docker client with request timeout=2m0s
Oct 19 21:24:20 node1 origin-node: W1019 21:24:20.418569   17564 cni.go:157] Unable to update cni config: No networks found in /etc/cni/net.d
Oct 19 21:24:20 node1 origin-node: F1019 21:24:20.438965   17564 start_node.go:140] could not start DNS, unable to read config file: open /etc/origin/node/resolv.conf: no such file or directory
Oct 19 21:24:20 node1 systemd: origin-node.service: main process exited, code=exited, status=255/n/a
Oct 19 21:24:20 node1 dnsmasq[4965]: setting upstream servers from DBus
Oct 19 21:24:20 node1 systemd: Failed to start OpenShift Node.
Oct 19 21:24:20 node1 systemd: Unit origin-node.service entered failed state.
Oct 19 21:24:20 node1 systemd: origin-node.service failed.

Temporary workaround

Remove the content added by 99-origin-dns from /etc/resolv.conf, then manually create /etc/origin/node/resolv.conf:

echo 'nameserver 192.168.1.142' > /etc/origin/node/resolv.conf
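
A slightly more defensive form of the one-liner above, as a sketch only: the function name ensure_node_resolv is hypothetical, and 192.168.1.142 is the reporter's DNS server, not a default. It writes one nameserver line per argument and leaves an existing, non-empty file untouched, so it will not overwrite a file that the NetworkManager dispatcher script has already generated.

```shell
#!/bin/sh
# Hypothetical helper: create a node resolv.conf only if it is missing or empty.
ensure_node_resolv() {
    target="$1"; shift
    # Do not clobber a file that already has content (e.g. one written by
    # the NetworkManager dispatcher script).
    [ -s "$target" ] && return 0
    mkdir -p "$(dirname "$target")"
    : > "$target"
    # One "nameserver" line per remaining argument.
    for ns in "$@"; do
        printf 'nameserver %s\n' "$ns" >> "$target"
    done
}

# Usage on the node (as root); replace the IP with your own DNS server:
#   ensure_node_resolv /etc/origin/node/resolv.conf 192.168.1.142
#   systemctl restart origin-node
```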

Ansible log after the workaround

TASK [openshift_manage_node : Wait for Node Registration] **************************************************************************************************************************************************************************************
Thursday 19 October 2017  21:32:38 +0800 (0:00:00.078)       0:03:00.870 ******
FAILED - RETRYING: Wait for Node Registration (50 retries left).
ok: [master -> master]
FAILED - RETRYING: Wait for Node Registration (49 retries left).
FAILED - RETRYING: Wait for Node Registration (48 retries left).
FAILED - RETRYING: Wait for Node Registration (47 retries left).
FAILED - RETRYING: Wait for Node Registration (46 retries left).
FAILED - RETRYING: Wait for Node Registration (45 retries left).
FAILED - RETRYING: Wait for Node Registration (44 retries left).
FAILED - RETRYING: Wait for Node Registration (43 retries left).
FAILED - RETRYING: Wait for Node Registration (42 retries left).
FAILED - RETRYING: Wait for Node Registration (41 retries left).
FAILED - RETRYING: Wait for Node Registration (40 retries left).
ok: [node1 -> master]

With /etc/origin/node/resolv.conf in place, origin-node starts normally.

About this issue

State: closed
Created 7 years ago
Comments: 15 (13 by maintainers)

Commits related to this issue

Default to /etc/resolv.conf instead of /etc/origin/node/resolv.conf. There is no task that currently sets up /etc/origin/node/resolv.conf, but the service is configured to load that file when it starts... (committed to dmsimard/openshift-ansible by deleted user 7 years ago)

Most upvoted comments

I found the problem with @jfchevrette (Thanks JF!).

The issue is that our environment configures eth0 in /etc/sysconfig/network-scripts/ifcfg-eth0 to explicitly not use NetworkManager for that interface:

# Automatically generated, do not edit
DEVICE=eth0
BOOTPROTO=dhcp
HWADDR=fa:16:3e:b1:98:77
ONBOOT=yes
NM_CONTROLLED=no
TYPE=Ethernet

This means that the interface is not controlled by NetworkManager, so restarting NetworkManager does not bring that interface up and the dispatcher script does not run for it. Simply commenting out NM_CONTROLLED=no in the ifcfg-eth0 file and restarting NetworkManager created /etc/origin/node/resolv.conf properly.

I think a proper “fix” in openshift-ansible would be to add a check that verifies whether the interface appears in the output of “nmcli con” and, if it does not, fails with a friendly message. I’ll send a PR for that.

dmsimard on Oct 27, 2017
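
The NM_CONTROLLED=no setting described in this comment can be spotted on a node before running the playbook. The following is a minimal sketch, not part of openshift-ansible: the function name is_nm_controlled is hypothetical, and it only inspects the ifcfg file, so a return value of 0 means the file does not opt out of NetworkManager, not that NetworkManager is guaranteed to manage the interface.

```shell
#!/bin/sh
# Hypothetical helper: inspect an ifcfg file for an NM_CONTROLLED=no opt-out.
# Returns 0 if the file does not opt out, 1 if it sets NM_CONTROLLED=no,
# and 2 if the file cannot be read.
is_nm_controlled() {
    ifcfg_file="$1"
    [ -r "$ifcfg_file" ] || return 2
    # Match an uncommented NM_CONTROLLED=no line, optionally double-quoted,
    # case-insensitively (so NM_CONTROLLED=NO is caught too).
    if grep -Eiq '^[[:space:]]*NM_CONTROLLED="?no"?[[:space:]]*$' "$ifcfg_file"; then
        return 1
    fi
    return 0
}

# Usage on a node:
#   is_nm_controlled /etc/sysconfig/network-scripts/ifcfg-eth0 ||
#       echo "eth0 may not be managed by NetworkManager" >&2
```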

@dmsimard, my preference is the check.

michaelgugino on Oct 31, 2017
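
The check proposed by dmsimard could look roughly like the following sketch. It is not taken from the actual PR; iface_listed and check_iface_managed are hypothetical names, and it assumes the interface appears in the DEVICE column of terse connection output (nmcli -t -f DEVICE connection show) only when NetworkManager has a connection active on it. Keeping the parsing in a separate function lets it be tested without NetworkManager installed.

```shell
#!/bin/sh
# Sketch of the proposed "nmcli con" check; function names are hypothetical.

# Succeeds if the interface named in $1 appears as a whole line on stdin
# (one device name per line, as in terse nmcli output).
iface_listed() {
    grep -qx -- "$1"
}

# Fails with a readable message if $1 is not listed by "nmcli con",
# i.e. NetworkManager does not manage it and the dispatcher script won't run.
check_iface_managed() {
    iface="$1"
    if ! nmcli -t -f DEVICE connection show | iface_listed "$iface"; then
        echo "ERROR: interface '$iface' is not listed by 'nmcli con'." >&2
        echo "Check NM_CONTROLLED in its ifcfg file; otherwise" >&2
        echo "/etc/origin/node/resolv.conf will not be generated." >&2
        return 1
    fi
}

# Usage on a node:
#   check_iface_managed eth0 || exit 1
```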