openshift-ansible: OpenShift Origin installation fails if a cloud provider (AWS) is used in the inventory file

Description

When I use AWS as the cloud provider for my OpenShift cluster, the installation fails while trying to start the node service on each node. If I don't use any cloud provider, the installation succeeds.

Version

I am using the RPM installation.

- ansible 2.3.2.0
- oc v3.6.0+c4dd4cf
- kubernetes v1.6.1+5115d708d7
- features: Basic-Auth GSSAPI Kerberos SPNEGO
Steps To Reproduce
  1. Copy the inventory file shown below (under Additional Information) to /etc/ansible/hosts
  2. Run ansible-playbook -i /etc/ansible/hosts /root/openshift-ansible/playbooks/byo/config.yml -vvv
Expected Results

The node service should start successfully, and oc get nodes should show every node as Ready rather than NotReady.
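For reference, a healthy run would end with all nodes Ready, along the lines of this illustrative (hypothetical) output:

NAME                    STATUS    AGE       VERSION
osnode01.bdteam.local   Ready     2h        v1.6.1+5115d708d7
osnode02.bdteam.local   Ready     2h        v1.6.1+5115d708d7

Instead, the playbook run fails with: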

fatal: [osnode04.bdteam.local]: FAILED! => {
    "attempts": 3,
    "changed": false,
    "failed": true,
    "invocation": {
        "module_args": {
            "daemon_reload": false,
            "enabled": null,
            "masked": null,
            "name": "origin-node",
            "no_block": false,
            "state": "restarted",
            "user": false
        }
    },
    "msg": "Unable to restart service origin-node: Job for origin-node.service failed because the control process exited with error code. See \"systemctl status origin-node.service\" and \"journalctl -xe\" for details.\n"
}

RUNNING HANDLER [openshift_node : reload systemd units] ************************************************************************************************************************************************************************************************************************
META: ran handlers
	to retry, use: --limit @/root/openshift-ansible/playbooks/byo/config.retry

PLAY RECAP *********************************************************************************************************************************************************************************************************************************************************************
localhost                  : ok=21   changed=0    unreachable=0    failed=0
openshift-etcd.bdteam.local : ok=97   changed=34   unreachable=0    failed=0
osmaster01.bdteam.local    : ok=365  changed=110  unreachable=0    failed=1
osmaster02.bdteam.local    : ok=314  changed=95   unreachable=0    failed=1
osnode01.bdteam.local      : ok=146  changed=38   unreachable=0    failed=1
osnode02.bdteam.local      : ok=146  changed=38   unreachable=0    failed=1
osnode03.bdteam.local      : ok=146  changed=38   unreachable=0    failed=1
osnode04.bdteam.local      : ok=146  changed=38   unreachable=0    failed=1


INSTALLER STATUS ***************************************************************************************************************************************************************************************************************************************************************
Initialization             : Complete
etcd Install               : Complete
NFS Install                : Not Started
Load balancer Install      : Not Started
Master Install             : Complete
Master Additional Install  : Complete
Node Install               : In Progress
	This phase can be restarted by running: playbooks/byo/openshift-node/config.yml
GlusterFS Install          : Not Started
Hosted Install             : Not Started
Metrics Install            : Not Started
Logging Install            : Not Started
Service Catalog Install    : Not Started



Failure summary:


  1. Hosts:    osmaster01.bdteam.local, osmaster02.bdteam.local, osnode01.bdteam.local, osnode02.bdteam.local, osnode03.bdteam.local, osnode04.bdteam.local
     Play:     Configure nodes
     Task:     restart node
     Message:  Unable to restart service origin-node: Job for origin-node.service failed because the control process exited with error code. See "systemctl status origin-node.service" and "journalctl -xe" for details.

[root@osmaster01 ~]# packet_write_wait: Connection to 10.X.X.X port 22: Broken pipe
Observed Results

The node service fails to restart on all of the nodes and masters.

[root@osmaster01 centos]# oc get nodes
NAME                                        STATUS     AGE       VERSION
ip-10-30-1-200.us-west-1.compute.internal   NotReady   2h        v1.6.1+5115d708d7
ip-10-30-1-27.us-west-1.compute.internal    NotReady   2h        v1.6.1+5115d708d7
ip-10-30-1-43.us-west-1.compute.internal    NotReady   2h        v1.6.1+5115d708d7
ip-10-30-2-109.us-west-1.compute.internal   NotReady   2h        v1.6.1+5115d708d7
ip-10-30-2-182.us-west-1.compute.internal   NotReady   2h        v1.6.1+5115d708d7
ip-10-30-2-251.us-west-1.compute.internal   NotReady   2h        v1.6.1+5115d708d7

Output of kubectl describe node

[root@osmaster01 centos]# kubectl describe node ip-10-30-2-251.us-west-1.compute.internal
Name:			ip-10-30-2-251.us-west-1.compute.internal
Role:
Labels:			beta.kubernetes.io/arch=amd64
			beta.kubernetes.io/instance-type=m4.xlarge
			beta.kubernetes.io/os=linux
			failure-domain.beta.kubernetes.io/region=us-west-1
			failure-domain.beta.kubernetes.io/zone=us-west-1a
			kubernetes.io/hostname=osnode04.bdteam.local
			region=primary
			zone=west
Annotations:		volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:			<none>
CreationTimestamp:	Fri, 06 Oct 2017 18:10:56 +0000
Phase:
Conditions:
  Type			Status	LastHeartbeatTime			LastTransitionTime			Reason				Message
  ----			------	-----------------			------------------			------				-------
  OutOfDisk 		False 	Fri, 06 Oct 2017 20:37:50 +0000 	Fri, 06 Oct 2017 18:10:56 +0000 	KubeletHasSufficientDisk 	kubelet has sufficient disk space available
  MemoryPressure 	False 	Fri, 06 Oct 2017 20:37:50 +0000 	Fri, 06 Oct 2017 18:10:56 +0000 	KubeletHasSufficientMemory 	kubelet has sufficient memory available
  DiskPressure 		False 	Fri, 06 Oct 2017 20:37:50 +0000 	Fri, 06 Oct 2017 18:10:56 +0000 	KubeletHasNoDiskPressure 	kubelet has no disk pressure
  Ready 		False 	Fri, 06 Oct 2017 20:37:50 +0000 	Fri, 06 Oct 2017 18:10:56 +0000 	KubeletNotReady 		runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Addresses:		10.30.2.251,10.30.2.251,ip-10-30-2-251.bdteam.local,osnode04.bdteam.local
Capacity:
 cpu:		4
 memory:	16266720Ki
 pods:		40
Allocatable:
 cpu:		4
 memory:	16164320Ki
 pods:		40
System Info:
 Machine ID:			8bd05758fdfc1903174c9fcaf82b71ca
 System UUID:			EC2798A7-3C88-0538-2A95-D28F2BCCDF96
 Boot ID:			5d7f71a8-95f8-4ed6-a7ba-07977e2dc926
 Kernel Version:		3.10.0-693.2.2.el7.x86_64
 OS Image:			CentOS Linux 7 (Core)
 Operating System:		linux
 Architecture:			amd64
 Container Runtime Version:	docker://1.12.6
 Kubelet Version:		v1.6.1+5115d708d7
 Kube-Proxy Version:		v1.6.1+5115d708d7
ExternalID:			i-08ae279780695c5f7
Non-terminated Pods:		(0 in total)
  Namespace			Name		CPU Requests	CPU Limits	Memory Requests	Memory Limits
  ---------			----		------------	----------	---------------	-------------
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests	CPU Limits	Memory Requests	Memory Limits
  ------------	----------	---------------	-------------
  0 (0%)	0 (0%)		0 (0%)		0 (0%)
Events:
  FirstSeen	LastSeen	Count	From							SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----							-------------	--------	------			-------
  1h		1h		1	kubelet, ip-10-30-2-251.us-west-1.compute.internal			Warning		ImageGCFailed		unable to find data for container /
  1h		1h		1	kubelet, ip-10-30-2-251.us-west-1.compute.internal			Normal		NodeHasSufficientDisk	Node ip-10-30-2-251.us-west-1.compute.internal status is now: NodeHasSufficientDisk
  1h		1h		1	kubelet, ip-10-30-2-251.us-west-1.compute.internal			Normal		NodeHasSufficientMemory	Node ip-10-30-2-251.us-west-1.compute.internal status is now: NodeHasSufficientMemory
  1h		1h		1	kubelet, ip-10-30-2-251.us-west-1.compute.internal			Normal		NodeHasNoDiskPressure	Node ip-10-30-2-251.us-west-1.compute.internal status is now: NodeHasNoDiskPressure
  1h		1h		1	kubelet, ip-10-30-2-251.us-west-1.compute.internal			Normal		Starting		Starting kubelet.

If I don't use any cloud provider in my Ansible inventory, the installation works fine, but I need to resolve this for AWS (or any other cloud provider).
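For comparison, the working (no cloud provider) case is the same inventory with the cloud-provider variables removed, i.e. without these lines from [OSEv3:vars]:

openshift_cloudprovider_kind=aws
openshift_cloudprovider_aws_access_key=XXXXXXX
openshift_cloudprovider_aws_secret_key=XXXXXXXXX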

Output of systemctl status for the node service on one of the nodes

[root@osnode01 centos]# systemctl status origin-node.service
● origin-node.service - OpenShift Node
   Loaded: loaded (/etc/systemd/system/origin-node.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/origin-node.service.d
           └─openshift-sdn-ovs.conf
   Active: activating (start) since Fri 2017-10-06 20:39:54 UTC; 8s ago
     Docs: https://github.com/openshift/origin
  Process: 56362 ExecStopPost=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string: (code=exited, status=0/SUCCESS)
  Process: 56360 ExecStopPost=/usr/bin/rm /etc/dnsmasq.d/node-dnsmasq.conf (code=exited, status=0/SUCCESS)
  Process: 56368 ExecStartPre=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string:/in-addr.arpa/127.0.0.1,/cluster.local/127.0.0.1 (code=exited, status=0/SUCCESS)
  Process: 56365 ExecStartPre=/usr/bin/cp /etc/origin/node/node-dnsmasq.conf /etc/dnsmasq.d/ (code=exited, status=0/SUCCESS)
 Main PID: 56370 (openshift)
   Memory: 42.2M
   CGroup: /system.slice/origin-node.service
           ├─56370 /usr/bin/openshift start node --config=/etc/origin/node/node-config.yaml --loglevel=2
           └─56415 journalctl -k -f

Oct 06 20:39:59 osnode01.bdteam.local origin-node[56370]: W1006 20:39:59.546363   56370 pod_container_deletor.go:77] Container "2f4c53551f7b6e654cc1de1159d44856f81b6d16f4ed5d1eb580c9cb3a9bc575" not found in pod's containers
Oct 06 20:39:59 osnode01.bdteam.local origin-node[56370]: W1006 20:39:59.546425   56370 pod_container_deletor.go:77] Container "851f6503d78acd135e3a4b87009d4163a808856f14757f6123c1cf625123504d" not found in pod's containers
Oct 06 20:39:59 osnode01.bdteam.local origin-node[56370]: W1006 20:39:59.546448   56370 pod_container_deletor.go:77] Container "88a45a9147f05a0bd9e05ed712069f10b4cea6c2af3ccd0eb1601166f3ccf679" not found in pod's containers
Oct 06 20:39:59 osnode01.bdteam.local origin-node[56370]: W1006 20:39:59.546460   56370 pod_container_deletor.go:77] Container "a3ef9c2922877e2f25bd4814fd1f4e371fd98a19ad36b54371fd0b1bc51e255b" not found in pod's containers
Oct 06 20:39:59 osnode01.bdteam.local origin-node[56370]: W1006 20:39:59.546472   56370 pod_container_deletor.go:77] Container "c5102f50c2e01a2100e1dcb025096967e31134c43ffdb1655827b908e5b29f77" not found in pod's containers
Oct 06 20:39:59 osnode01.bdteam.local origin-node[56370]: W1006 20:39:59.546483   56370 pod_container_deletor.go:77] Container "d68f9392b34c6410e6154c95febcfb55dac109725750ae5c20671c39279c9730" not found in pod's containers
Oct 06 20:39:59 osnode01.bdteam.local origin-node[56370]: W1006 20:39:59.546494   56370 pod_container_deletor.go:77] Container "eb04adc0b544c64e20ac3c847e03de048f7c7a26ce4d4a6b46282817d0df8e10" not found in pod's containers
Oct 06 20:39:59 osnode01.bdteam.local origin-node[56370]: W1006 20:39:59.710842   56370 cni.go:157] Unable to update cni config: No networks found in /etc/cni/net.d
Oct 06 20:39:59 osnode01.bdteam.local origin-node[56370]: E1006 20:39:59.710981   56370 kubelet.go:2072] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Oct 06 20:40:00 osnode01.bdteam.local origin-node[56370]: W1006 20:40:00.816290   56370 sdn_controller.go:38] Could not find an allocated subnet for node: osnode01.bdteam.local, Waiting...
[root@osnode01 centos]#

Log output from one of the nodes (/var/log/messages)

Oct  6 20:41:15 osnode01 NetworkManager[18586]: <info>  [1507322475.5434] dhcp4 (eth0):   address 10.30.1.43
Oct  6 20:41:15 osnode01 NetworkManager[18586]: <info>  [1507322475.5434] dhcp4 (eth0):   plen 24 (255.255.255.0)
Oct  6 20:41:15 osnode01 NetworkManager[18586]: <info>  [1507322475.5434] dhcp4 (eth0):   gateway 10.30.1.1
Oct  6 20:41:15 osnode01 NetworkManager[18586]: <info>  [1507322475.5434] dhcp4 (eth0):   lease time 3600
Oct  6 20:41:15 osnode01 NetworkManager[18586]: <info>  [1507322475.5434] dhcp4 (eth0):   hostname 'ip-10-30-1-43'
Oct  6 20:41:15 osnode01 NetworkManager[18586]: <info>  [1507322475.5435] dhcp4 (eth0):   nameserver '10.21.0.251'
Oct  6 20:41:15 osnode01 NetworkManager[18586]: <info>  [1507322475.5435] dhcp4 (eth0):   domain name 'bdteam.local'
Oct  6 20:41:15 osnode01 NetworkManager[18586]: <info>  [1507322475.5435] dhcp4 (eth0): state changed bound -> bound
Oct  6 20:41:15 osnode01 dbus-daemon: dbus[632]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Oct  6 20:41:15 osnode01 dbus[632]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Oct  6 20:41:15 osnode01 systemd: Starting Network Manager Script Dispatcher Service...
Oct  6 20:41:15 osnode01 dhclient[18622]: bound to 10.30.1.43 -- renewal in 1686 seconds.
Oct  6 20:41:15 osnode01 dbus[632]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Oct  6 20:41:15 osnode01 dbus-daemon: dbus[632]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Oct  6 20:41:15 osnode01 systemd: Started Network Manager Script Dispatcher Service.
Oct  6 20:41:15 osnode01 nm-dispatcher: req:1 'dhcp4-change' [eth0]: new request (6 scripts)
Oct  6 20:41:15 osnode01 nm-dispatcher: req:1 'dhcp4-change' [eth0]: start running ordered scripts...
Oct  6 20:41:15 osnode01 nm-dispatcher: + cd /etc/sysconfig/network-scripts
Oct  6 20:41:15 osnode01 nm-dispatcher: + . ./network-functions
Oct  6 20:41:15 osnode01 nm-dispatcher: ++ PATH=/sbin:/usr/sbin:/bin:/usr/bin
Oct  6 20:41:15 osnode01 nm-dispatcher: ++ export PATH
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ hostname
Oct  6 20:41:15 osnode01 nm-dispatcher: ++ HOSTNAME=osnode01.bdteam.local
Oct  6 20:41:15 osnode01 nm-dispatcher: ++ '[' -z '' ']'
Oct  6 20:41:15 osnode01 nm-dispatcher: ++ . /etc/init.d/functions
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ TEXTDOMAIN=initscripts
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ umask 022
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ PATH=/sbin:/usr/sbin:/bin:/usr/bin
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ export PATH
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ '[' 56720 -ne 1 -a -z '' ']'
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ '[' -d /run/systemd/system ']'
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ case "$0" in
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ '[' -z '' ']'
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ COLUMNS=80
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ '[' -z '' ']'
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ '[' -c /dev/stderr -a -r /dev/stderr ']'
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ CONSOLETYPE=serial
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ '[' -z '' ']'
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ '[' -z '' ']'
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ '[' -f /etc/sysconfig/i18n -o -f /etc/locale.conf ']'
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ . /etc/profile.d/lang.sh
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ unset LANGSH_SOURCED
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ '[' -z '' ']'
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ '[' -f /etc/sysconfig/init ']'
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ . /etc/sysconfig/init
Oct  6 20:41:15 osnode01 nm-dispatcher: ++++ BOOTUP=color
Oct  6 20:41:15 osnode01 nm-dispatcher: ++++ RES_COL=60
Oct  6 20:41:15 osnode01 nm-dispatcher: ++++ MOVE_TO_COL='echo -en \033[60G'
Oct  6 20:41:15 osnode01 nm-dispatcher: ++++ SETCOLOR_SUCCESS='echo -en \033[0;32m'
Oct  6 20:41:15 osnode01 nm-dispatcher: ++++ SETCOLOR_FAILURE='echo -en \033[0;31m'
Oct  6 20:41:15 osnode01 nm-dispatcher: ++++ SETCOLOR_WARNING='echo -en \033[0;33m'
Oct  6 20:41:15 osnode01 nm-dispatcher: ++++ SETCOLOR_NORMAL='echo -en \033[0;39m'
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ '[' serial = serial ']'
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ BOOTUP=serial
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ MOVE_TO_COL=
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ SETCOLOR_SUCCESS=
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ SETCOLOR_FAILURE=
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ SETCOLOR_WARNING=
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ SETCOLOR_NORMAL=
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ __sed_discard_ignored_files='/\(~\|\.bak\|\.orig\|\.rpmnew\|\.rpmorig\|\.rpmsave\)$/d'
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ '[' '' = 1 ']'
Oct  6 20:41:15 osnode01 nm-dispatcher: ++++ cat /proc/cmdline
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ strstr 'BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.2.2.el7.x86_64 root=UUID=29342a0b-e20f-4676-9ecf-dfdf02ef6683 ro console=tty0 console=ttyS0,115200n8 crashkernel=auto console=ttyS0,115200 LANG=en_US.UTF-8' rc.debug
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ '[' 'BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.2.2.el7.x86_64 root=UUID=29342a0b-e20f-4676-9ecf-dfdf02ef6683 ro console=tty0 console=ttyS0,115200n8 crashkernel=auto console=ttyS0,115200 LANG=en_US.UTF-8' = 'BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.2.2.el7.x86_64 root=UUID=29342a0b-e20f-4676-9ecf-dfdf02ef6683 ro console=tty0 console=ttyS0,115200n8 crashkernel=auto console=ttyS0,115200 LANG=en_US.UTF-8' ']'
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ return 1
Oct  6 20:41:15 osnode01 nm-dispatcher: +++ return 0
Oct  6 20:41:15 osnode01 nm-dispatcher: + '[' -f ../network ']'
Oct  6 20:41:15 osnode01 nm-dispatcher: + . ../network
Oct  6 20:41:15 osnode01 nm-dispatcher: ++ NETWORKING=yes
Oct  6 20:41:15 osnode01 nm-dispatcher: ++ NOZEROCONF=yes
Oct  6 20:41:15 osnode01 nm-dispatcher: + [[ dhcp4-change =~ ^(up|dhcp4-change|dhcp6-change)$ ]]
Oct  6 20:41:15 osnode01 nm-dispatcher: + NEEDS_RESTART=0
Oct  6 20:41:15 osnode01 nm-dispatcher: + UPSTREAM_DNS=/etc/dnsmasq.d/origin-upstream-dns.conf
Oct  6 20:41:15 osnode01 nm-dispatcher: ++ mktemp
Oct  6 20:41:15 osnode01 nm-dispatcher: + UPSTREAM_DNS_TMP=/tmp/tmp.5DzdaQo1tn
Oct  6 20:41:15 osnode01 nm-dispatcher: ++ mktemp
Oct  6 20:41:15 osnode01 nm-dispatcher: + UPSTREAM_DNS_TMP_SORTED=/tmp/tmp.Ie4FFsjAgL
Oct  6 20:41:15 osnode01 nm-dispatcher: ++ mktemp
Oct  6 20:41:15 osnode01 nm-dispatcher: + CURRENT_UPSTREAM_DNS_SORTED=/tmp/tmp.0ZlG7MgcgO
Oct  6 20:41:15 osnode01 nm-dispatcher: ++ mktemp
Oct  6 20:41:15 osnode01 nm-dispatcher: + NEW_RESOLV_CONF=/tmp/tmp.293w7YIsqD
Oct  6 20:41:15 osnode01 nm-dispatcher: ++ mktemp
Oct  6 20:41:15 osnode01 nm-dispatcher: + NEW_NODE_RESOLV_CONF=/tmp/tmp.D9exxlKVYt
Oct  6 20:41:15 osnode01 nm-dispatcher: ++ /sbin/ip route list match 0.0.0.0/0
Oct  6 20:41:15 osnode01 nm-dispatcher: ++ awk '{print $3 }'
Oct  6 20:41:15 osnode01 nm-dispatcher: + def_route=10.30.1.1
Oct  6 20:41:15 osnode01 nm-dispatcher: ++ /sbin/ip route get to 10.30.1.1
Oct  6 20:41:15 osnode01 nm-dispatcher: ++ awk '{print $3}'
Oct  6 20:41:15 osnode01 nm-dispatcher: + def_route_int=eth0
Oct  6 20:41:15 osnode01 nm-dispatcher: ++ /sbin/ip route get to 10.30.1.1
Oct  6 20:41:15 osnode01 nm-dispatcher: ++ awk '{print $5}'
Oct  6 20:41:15 osnode01 nm-dispatcher: + def_route_ip=10.30.1.43
Oct  6 20:41:15 osnode01 nm-dispatcher: + [[ eth0 == eth0 ]]
Oct  6 20:41:15 osnode01 nm-dispatcher: + '[' '!' -f /etc/dnsmasq.d/origin-dns.conf ']'
Oct  6 20:41:15 osnode01 nm-dispatcher: + grep -q 99-origin-dns.sh /etc/resolv.conf
Oct  6 20:41:15 osnode01 nm-dispatcher: ++ systemctl -q is-active dnsmasq.service
Oct  6 20:41:15 osnode01 nm-dispatcher: + '[' 0 -eq 1 ']'
Oct  6 20:41:15 osnode01 nm-dispatcher: ++ systemctl -q is-active dnsmasq.service
Oct  6 20:41:15 osnode01 nm-dispatcher: + grep -q 99-origin-dns.sh /etc/resolv.conf
Oct  6 20:41:15 osnode01 nm-dispatcher: + sed -e '/^nameserver.*$/d' /etc/resolv.conf
Oct  6 20:41:15 osnode01 nm-dispatcher: + echo 'nameserver 10.30.1.43'
Oct  6 20:41:15 osnode01 nm-dispatcher: + grep -q 'search.*cluster.local' /tmp/tmp.293w7YIsqD
Oct  6 20:41:15 osnode01 nm-dispatcher: + grep -qw search /tmp/tmp.293w7YIsqD
Oct  6 20:41:15 osnode01 nm-dispatcher: + cp -Z /tmp/tmp.293w7YIsqD /etc/resolv.conf
Oct  6 20:41:15 osnode01 nm-dispatcher: + rm -f /tmp/tmp.5DzdaQo1tn /tmp/tmp.Ie4FFsjAgL /tmp/tmp.0ZlG7MgcgO /tmp/tmp.293w7YIsqD
Oct  6 20:41:18 osnode01 origin-node: I1006 20:41:18.210035   56657 aws.go:936] Could not determine public DNS from AWS metadata.
Oct  6 20:41:18 osnode01 origin-node: W1006 20:41:18.246426   56657 cni.go:157] Unable to update cni config: No networks found in /etc/cni/net.d
Oct  6 20:41:18 osnode01 origin-node: E1006 20:41:18.246581   56657 kubelet.go:2072] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Oct  6 20:41:20 osnode01 origin-node: W1006 20:41:20.737092   56657 sdn_controller.go:38] Could not find an allocated subnet for node: osnode01.bdteam.local, Waiting...
Oct  6 20:41:20 osnode01 origin-node: F1006 20:41:20.737146   56657 node.go:309] error: SDN node startup failed: failed to get subnet for this host: osnode01.bdteam.local, error: timed out waiting for the condition
Oct  6 20:41:20 osnode01 systemd: origin-node.service: main process exited, code=exited, status=255/n/a
Oct  6 20:41:20 osnode01 dnsmasq[18837]: setting upstream servers from DBus
Oct  6 20:41:20 osnode01 dnsmasq[18837]: using nameserver 10.21.0.251#53
Oct  6 20:41:20 osnode01 dbus-daemon: dbus[632]: [system] Rejected send message, 0 matched rules; type="method_return", sender=":1.7943" (uid=0 pid=18837 comm="/usr/sbin/dnsmasq -k ") interface="(unset)" member="(unset)" error name="(unset)" requested_reply="0" destination=":1.9458" (uid=0 pid=56795 comm="/usr/bin/dbus-send --system --dest=uk.org.thekelle")
Oct  6 20:41:20 osnode01 dbus[632]: [system] Rejected send message, 0 matched rules; type="method_return", sender=":1.7943" (uid=0 pid=18837 comm="/usr/sbin/dnsmasq -k ") interface="(unset)" member="(unset)" error name="(unset)" requested_reply="0" destination=":1.9458" (uid=0 pid=56795 comm="/usr/bin/dbus-send --system --dest=uk.org.thekelle")
Oct  6 20:41:20 osnode01 systemd: Failed to start OpenShift Node.
Oct  6 20:41:20 osnode01 systemd: Unit origin-node.service entered failed state.
Oct  6 20:41:20 osnode01 systemd: origin-node.service failed.
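The fatal line above is the mismatch: the AWS cloud provider registers the node under its EC2 private DNS name, while the SDN waits for a subnet allocated to the inventory hostname (osnode01.bdteam.local). A minimal sketch for confirming this (assuming a reachable master and the standard 3.x node config path):

# On a master: the names under which node subnets were actually allocated
oc get hostsubnets
oc get nodes

# On the failing node: the name the node service announces comes from nodeName
grep nodeName /etc/origin/node/node-config.yaml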
Additional Information
  • CentOS Linux release 7.4.1708 (Core)
  • The inventory file is shown below
[OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
ansible_ssh_user=root
openshift_deployment_type=origin
openshift_master_cluster_method=native
openshift_master_cluster_hostname=osmasterelb.bdteam.local
openshift_master_cluster_public_hostname=osmasterelb.bdteam.local
openshift_clock_enabled=true
openshift_master_default_subdomain=apps.bdteam.local
openshift_cloudprovider_kind=aws
openshift_cloudprovider_aws_access_key=XXXXXXX
openshift_cloudprovider_aws_secret_key=XXXXXXXXX

# host group for masters
[masters]
osmaster01.bdteam.local openshift_hostname=osmaster01.bdteam.local
osmaster02.bdteam.local openshift_hostname=osmaster02.bdteam.local

[etcd]
openshift-etcd.bdteam.local openshift_hostname=openshift-etcd.bdteam.local

[nodes]
osmaster01.bdteam.local openshift_hostname=osmaster01.bdteam.local
osmaster02.bdteam.local openshift_hostname=osmaster02.bdteam.local
osnode01.bdteam.local openshift_node_labels="{'region': 'infra', 'zone': 'west'}"  openshift_hostname=osnode01.bdteam.local
osnode03.bdteam.local openshift_node_labels="{'region': 'infra', 'zone': 'west'}" openshift_hostname=osnode03.bdteam.local
osnode02.bdteam.local openshift_node_labels="{'region': 'primary', 'zone': 'west'}" openshift_hostname=osnode02.bdteam.local
osnode04.bdteam.local openshift_node_labels="{'region': 'primary', 'zone': 'west'}" openshift_hostname=osnode04.bdteam.local
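For context on what the cloud-provider variables do: with openshift_cloudprovider_kind=aws, the installer writes an AWS cloud config and points the kubelet at it. A rough sketch of the rendered files (paths and keys as documented for OpenShift 3.6; the zone value here is taken from the node description above):

# /etc/origin/cloudprovider/aws.conf
[Global]
Zone = us-west-1a

# excerpt from /etc/origin/node/node-config.yaml
kubeletArguments:
  cloud-config:
  - /etc/origin/cloudprovider/aws.conf
  cloud-provider:
  - aws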

Most upvoted comments

Hello, I can confirm that the problem exists with OpenShift Origin v3.6 and openshift-ansible at git tag openshift-ansible-3.6.173.0.9-1 on Amazon Web Services (AWS). The problem occurs when you have custom host names or a custom domain configured, e.g. mymaster1.example.internal and so on.

The AWS cloud provider works only when the hostnames/domain in your Ansible inventory file are the same as displayed in the AWS instance Private DNS field (in the EC2 instance description), e.g.:

  • Private DNS: ip-10-212-31-117.eu-west-1.compute.internal
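In practice that means the inventory host entries use those names; a hypothetical example:

[nodes]
ip-10-212-31-117.eu-west-1.compute.internal openshift_node_labels="{'region': 'primary', 'zone': 'west'}"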

To do so, you must have the VPC DHCP options configured with an empty domain-name, e.g.:

{ "DhcpOptions": [ { "DhcpConfigurations": [ { "Values": [ { "Value": "AmazonProvidedDNS" } ], "Key": "domain-name-servers" } ], "DhcpOptionsId": "dopt-<lkjlkfdj>" } ] }

The hostname in CentOS Linux must be the same as above: ip-10-212-31-117.eu-west-1.compute.internal.

The following commands must also return ip-10-212-31-117.eu-west-1.compute.internal:
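For example, something along these lines (the exact commands are assumed here, using the standard hostname tools and the EC2 metadata endpoint):

hostname
hostname -f
curl -s http://169.254.169.254/latest/meta-data/local-hostname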

A similar problem is also mentioned in this issue: https://github.com/kubernetes/kubernetes/issues/11543

I'm looking forward to a fix or workaround that allows custom domains and hostnames with the AWS cloud provider.

Regards, Pawel

Same error here since the end of August with RHEL 7.4 + OpenShift Enterprise on AWS (see also #5691).

Nodes are registered in the cluster with the AWS DNS domain suffix (<az>.compute.internal) instead of the public_dns_domain we provide. Forcing the hostname via 'openshift_hostname' in the inventory file doesn't help.

Support request opened with Red Hat (case 01937377), still waiting for resolution.

$ oc get nodes
NAME                                         STATUS     AGE       VERSION
ip-10-0-132-148.eu-west-1.compute.internal   NotReady   15d       v1.6.1+5115d708d7
ip-10-0-132-201.eu-west-1.compute.internal   NotReady   15d       v1.6.1+5115d708d7
ip-10-0-132-38.eu-west-1.compute.internal    NotReady   15d       v1.6.1+5115d708d7
ip-10-0-133-100.eu-west-1.compute.internal   NotReady   15d       v1.6.1+5115d708d7
ip-10-0-133-173.eu-west-1.compute.internal   NotReady   15d       v1.6.1+5115d708d7
ip-10-0-134-180.eu-west-1.compute.internal   NotReady   15d       v1.6.1+5115d708d7
ip-10-0-134-31.eu-west-1.compute.internal    NotReady   15d       v1.6.1+5115d708d7

The same applies to the host subnets.

$ oc get hostsubnets
NAME                                         HOST                                         HOST IP        SUBNET
ip-10-0-132-148.eu-west-1.compute.internal   ip-10-0-132-148.eu-west-1.compute.internal   10.0.132.148   172.16.14.0/23
ip-10-0-132-201.eu-west-1.compute.internal   ip-10-0-132-201.eu-west-1.compute.internal   10.0.132.201   172.16.10.0/23
ip-10-0-132-38.eu-west-1.compute.internal    ip-10-0-132-38.eu-west-1.compute.internal    10.0.132.38    172.16.0.0/23
ip-10-0-133-100.eu-west-1.compute.internal   ip-10-0-133-100.eu-west-1.compute.internal   10.0.133.100   172.16.12.0/23
ip-10-0-133-173.eu-west-1.compute.internal   ip-10-0-133-173.eu-west-1.compute.internal   10.0.133.173   172.16.16.0/23
ip-10-0-134-180.eu-west-1.compute.internal   ip-10-0-134-180.eu-west-1.compute.internal   10.0.134.180   172.16.6.0/23
ip-10-0-134-31.eu-west-1.compute.internal    ip-10-0-134-31.eu-west-1.compute.internal    10.0.134.31    172.16.8.0/23