openshift-ansible: upgrade.yml playbook fails to upgrade from 3.10.0 to 3.10.z

Description

Upgrading from v3.10.0 to the latest v3.10.z using the upgrade.yml Ansible playbook fails at the "openshift_service_catalog : Verify that the catalog api server is running" task.

Version

Please put the following version information in the code block indicated below.

  • Your ansible version per ansible --version
ansible 2.6.5
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Jul 13 2018, 13:06:57) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]

If you’re operating from a git clone:

  • The output of git describe
openshift-ansible-3.10.67-1

I am using the release-3.10 git branch here.

The output of oc version:

oc v3.10.0+0c4577e-1
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://inst1.mydomain.com:8443
openshift v3.10.0+2084755-68
kubernetes v1.10.0+b81c8f8

Steps To Reproduce
  1. Install 3.10.0 (see the sketch below)
  2. Upgrade to 3.10.z
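
Step 1 corresponds roughly to the stock install playbooks from the same openshift-ansible checkout; this is only a sketch, and the inventory path is an assumption (step 2 is the exact upgrade command given under Expected Results below):

# Hypothetical install run for step 1; adjust the inventory path to your setup.
cd /root/openshift-ansible
ansible-playbook -i /etc/ansible/hosts playbooks/prerequisites.yml
ansible-playbook -i /etc/ansible/hosts playbooks/deploy_cluster.yml
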
Expected Results

I expect the upgrade to succeed, including all Ansible tasks.

For the upgrade I use the following command:

ansible-playbook playbooks/byo/openshift-cluster/upgrades/v3_10/upgrade.yml
Observed Results

After around 25 minutes the upgrade fails with the following detailed output:

TASK [openshift_service_catalog : Verify that the catalog api server is running] ***********************************************************************************************************************************************************************************************
Tuesday 30 October 2018  23:36:42 +0100 (0:00:00.329)       0:14:37.320 ******* 
FAILED - RETRYING: Verify that the catalog api server is running (60 retries left).
FAILED - RETRYING: Verify that the catalog api server is running (59 retries left).
FAILED - RETRYING: Verify that the catalog api server is running (58 retries left).
[... the same message repeats while the retry counter counts down ...]
FAILED - RETRYING: Verify that the catalog api server is running (2 retries left).
FAILED - RETRYING: Verify that the catalog api server is running (1 retries left).
fatal: [inst1.mydomain.com]: FAILED! => {"attempts": 60, "changed": false, "cmd": ["curl", "-k", "https://apiserver.kube-service-catalog.svc/healthz"], "delta": "0:00:00.283071", "end": "2018-10-30 23:47:16.335563", "msg": "non-zero return code", "rc": 6, "start": "2018-10-30 23:47:16.052492", "stderr": "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (6) Could not resolve host: apiserver.kube-service-catalog.svc; Unknown error", "stderr_lines": ["  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current", "                                 Dload  Upload   Total   Spent    Left  Speed", "", "  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (6) Could not resolve host: apiserver.kube-service-catalog.svc; Unknown error"], "stdout": "", "stdout_lines": []}
...ignoring
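
For reference, the failing check is just the curl command shown in the error above, so it can be reproduced by hand on the master. The DNS checks below are a hedged suggestion for narrowing down why the service name does not resolve:

# Re-run the health probe from the failing task by hand (same command the task
# runs; -k skips TLS verification).
curl -k https://apiserver.kube-service-catalog.svc/healthz

# The curl failure above is a DNS error, so check whether the node resolver can
# resolve the service name at all; getent uses the same search path as curl,
# while dig is given the fully qualified cluster-internal name.
getent hosts apiserver.kube-service-catalog.svc
dig +short apiserver.kube-service-catalog.svc.cluster.local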

TASK [openshift_service_catalog : Check status in the kube-service-catalog namespace] ******************************************************************************************************************************************************************************************
Tuesday 30 October 2018  23:47:16 +0100 (0:10:33.843)       0:25:11.163 ******* 
changed: [inst1.mydomain.com]

TASK [openshift_service_catalog : debug] ***************************************************************************************************************************************************************************************************************************************
Tuesday 30 October 2018  23:47:17 +0100 (0:00:01.062)       0:25:12.225 ******* 
ok: [inst1.mydomain.com] => {
    "msg": [
        "In project kube-service-catalog on server https://inst1.mydomain.com:8443", 
        "", 
        "https://apiserver-kube-service-catalog.router.default.svc.cluster.local (passthrough) to pod port secure (svc/apiserver)", 
        "  daemonset/apiserver manages docker.io/openshift/origin-service-catalog:v3.10", 
        "    generation #5 running for 6 hours - 0/1 pods growing to 1", 
        "  pod/apiserver-qxw2d runs docker.io/openshift/origin-service-catalog:v3.10", 
        "", 
        "svc/controller-manager - 172.30.102.9:443 -> 6443", 
        "  daemonset/controller-manager manages docker.io/openshift/origin-service-catalog:v3.10", 
        "    generation #1 running for 6 hours - 1 pod", 
        "  pod/controller-manager-dw7mg runs docker.io/openshift/origin-service-catalog:v3.10", 
        "", 
        "", 
        "2 infos identified, use 'oc status -v' to see details."
    ]
}

TASK [openshift_service_catalog : Get pods in the kube-service-catalog namespace] **********************************************************************************************************************************************************************************************
Tuesday 30 October 2018  23:47:17 +0100 (0:00:00.286)       0:25:12.512 ******* 
changed: [inst1.mydomain.com]

TASK [openshift_service_catalog : debug] ***************************************************************************************************************************************************************************************************************************************
Tuesday 30 October 2018  23:47:18 +0100 (0:00:01.086)       0:25:13.599 ******* 
ok: [inst1.mydomain.com] => {
    "msg": [
        "NAME                       READY     STATUS              RESTARTS   AGE       IP            NODE", 
        "apiserver-qxw2d            0/1       ContainerCreating   0          10m       <none>        inst1", 
        "controller-manager-dw7mg   1/1       Running             3          6h        10.128.0.33   inst1"
    ]
}

TASK [openshift_service_catalog : Get events in the kube-service-catalog namespace] ********************************************************************************************************************************************************************************************
Tuesday 30 October 2018  23:47:19 +0100 (0:00:00.286)       0:25:13.886 ******* 
changed: [inst1.mydomain.com]

TASK [openshift_service_catalog : debug] ***************************************************************************************************************************************************************************************************************************************
Tuesday 30 October 2018  23:47:20 +0100 (0:00:01.010)       0:25:14.896 ******* 
ok: [inst1.mydomain.com] => {
    "msg": [
        "LAST SEEN   FIRST SEEN   COUNT     NAME                               KIND      SUBOBJECT                    TYPE      REASON           SOURCE                 MESSAGE", 
        "50m         50m          1         apiserver-cgr8g.156281b37d153c9d   Pod                                    Warning   NetworkFailed    openshift-sdn, inst1   The pod's network interface has been lost and the pod will be stopped.", 
        "50m         50m          1         apiserver-cgr8g.156281b3c1cb7ef3   Pod                                    Normal    SandboxChanged   kubelet, inst1         Pod sandbox changed, it will be killed and re-created.", 
        "50m         50m          1         apiserver-cgr8g.156281b3cc4700cc   Pod       spec.containers{apiserver}   Normal    Killing          kubelet, inst1         Killing container with id docker://apiserver:Need to kill Pod", 
        "50m         50m          1         apiserver-cgr8g.156281b4279ea7e8   Pod       spec.containers{apiserver}   Normal    Pulled           kubelet, inst1         Container image \"docker.io/openshift/origin-service-catalog:v3.10\" already present on machine", 
        "50m         50m          1         apiserver-cgr8g.156281b42b2b4c3c   Pod       spec.containers{apiserver}   Normal    Created          kubelet, inst1         Created container", 
        "50m         50m          1         apiserver-cgr8g.156281b44513ec01   Pod       spec.containers{apiserver}   Normal    Started          kubelet, inst1         Started container", 
        "33m         33m          1         apiserver-cgr8g.1562829de08d5bbe   Pod       spec.containers{apiserver}   Normal    Killing          kubelet, inst1         Killing container with id docker://apiserver:Need to kill Pod", 
        "33m         33m          1         apiserver-ckgrq.156282a166304ee1   Pod       spec.containers{apiserver}   Normal    Pulled           kubelet, inst1         Container image \"docker.io/openshift/origin-service-catalog:v3.10\" already present on machine", 
        "33m         33m          1         apiserver-ckgrq.156282a169210ad3   Pod       spec.containers{apiserver}   Normal    Created          kubelet, inst1         Created container", 
        "33m         33m          1         apiserver-ckgrq.156282a174869589   Pod       spec.containers{apiserver}   Normal    Started          kubelet, inst1         Started container", 
        "10m         10m          1         apiserver-ckgrq.156283d8f57682d9   Pod       spec.containers{apiserver}   Normal    Killing          kubelet, inst1         Killing container with id docker://apiserver:Need to kill Pod", 
        "10m         10m          3         apiserver-ckgrq.156283d904a6ba22   Pod                                    Warning   FailedKillPod    kubelet, inst1         error killing pod: failed to \"KillPodSandbox\" for \"1fa3ff30-dc91-11e8-915b-aa0000000001\" with KillPodSandboxError: \"rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \\\"apiserver-ckgrq_kube-service-catalog\\\" network: failed to send CNI request: Post http://dummy/: dial unix /var/run/openshift-sdn/cni-server.sock: connect: connection refused\"", 
        "", 
        "10m       10m       1         apiserver-qxw2d.156283da2c693435                      Pod                                               Warning   FailedCreatePodSandBox   kubelet, inst1                       Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container \"a8bf450caab80a5f7807dff38c58574a5c501b7da5e49921230f2aa27ed9cc49\" network for pod \"apiserver-qxw2d\": NetworkPlugin cni failed to set up pod \"apiserver-qxw2d_kube-service-catalog\" network: failed to send CNI request: Post http://dummy/: dial unix /var/run/openshift-sdn/cni-server.sock: connect: connection refused, failed to clean up sandbox container \"a8bf450caab80a5f7807dff38c58574a5c501b7da5e49921230f2aa27ed9cc49\" network for pod \"apiserver-qxw2d\": NetworkPlugin cni failed to teardown pod \"apiserver-qxw2d_kube-service-catalog\" network: failed to send CNI request: Post http://dummy/: dial unix /var/run/openshift-sdn/cni-server.sock: connect: connection refused]", 
        "43s       10m       46        apiserver-qxw2d.156283da4e6dacd8                      Pod                                               Normal    SandboxChanged           kubelet, inst1                       Pod sandbox changed, it will be killed and re-created.", 
        "33m       33m       1         apiserver.1562829dd52d21e1                            DaemonSet                                         Normal    SuccessfulDelete         daemonset-controller                 Deleted pod: apiserver-cgr8g", 
        "33m       33m       1         apiserver.156282a0be0459d4                            DaemonSet                                         Normal    SuccessfulCreate         daemonset-controller                 Created pod: apiserver-ckgrq", 
        "10m       10m       1         apiserver.156283d8e957480f                            DaemonSet                                         Normal    SuccessfulDelete         daemonset-controller                 Deleted pod: apiserver-ckgrq", 
        "10m       10m       1         apiserver.156283d9e96976dc                            DaemonSet                                         Normal    SuccessfulCreate         daemonset-controller                 Created pod: apiserver-qxw2d", 
        "1h        1h        1         controller-manager-dw7mg.15627f66b7cd05a8             Pod         spec.containers{controller-manager}   Normal    Pulled                   kubelet, inst1                       Container image \"docker.io/openshift/origin-service-catalog:v3.10\" already present on machine", 
        "1h        1h        1         controller-manager-dw7mg.15627f66ba821e51             Pod         spec.containers{controller-manager}   Normal    Created                  kubelet, inst1                       Created container", 
        "1h        1h        1         controller-manager-dw7mg.15627f66cae24cc0             Pod         spec.containers{controller-manager}   Normal    Started                  kubelet, inst1                       Started container", 
        "50m       50m       1         controller-manager-dw7mg.156281b389cb2a78             Pod                                               Warning   NetworkFailed            openshift-sdn, inst1                 The pod's network interface has been lost and the pod will be stopped.", 
        "50m       50m       1         controller-manager-dw7mg.156281b3c637173b             Pod                                               Normal    SandboxChanged           kubelet, inst1                       Pod sandbox changed, it will be killed and re-created.", 
        "50m       50m       1         controller-manager-dw7mg.156281b3ce6a0db6             Pod         spec.containers{controller-manager}   Normal    Killing                  kubelet, inst1                       Killing container with id docker://controller-manager:Need to kill Pod", 
        "50m       50m       1         controller-manager-dw7mg.156281b45432cb45             Pod         spec.containers{controller-manager}   Normal    Pulled                   kubelet, inst1                       Container image \"docker.io/openshift/origin-service-catalog:v3.10\" already present on machine", 
        "50m       50m       1         controller-manager-dw7mg.156281b456dd1dc3             Pod         spec.containers{controller-manager}   Normal    Created                  kubelet, inst1                       Created container", 
        "50m       50m       1         controller-manager-dw7mg.156281b46b91ea63             Pod         spec.containers{controller-manager}   Normal    Started                  kubelet, inst1                       Started container", 
        "50m       50m       1         service-catalog-controller-manager.156281b488b01722   ConfigMap                                         Normal    LeaderElection           service-catalog-controller-manager   controller-manager-dw7mg-external-service-catalog-controller became leader"
    ]
}
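
The NetworkFailed and FailedCreatePodSandBox events above all point at the node-local SDN pod being unavailable: the CNI plugin cannot reach /var/run/openshift-sdn/cni-server.sock, so new pod sandboxes never come up. A quick way to confirm that, sketched with the same kubeconfig the playbook uses:

# Check the SDN daemonset pods on the affected node (namespace openshift-sdn;
# kubeconfig path taken from the Ansible output above).
oc get pods -n openshift-sdn -o wide --config=/etc/origin/master/admin.kubeconfig

# The CNI plugin talks to the node-local SDN pod through this socket; if the SDN
# pod is down or restarting, the socket is missing and sandbox creation fails.
ls -l /var/run/openshift-sdn/cni-server.sock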

TASK [openshift_service_catalog : Get pod logs] ********************************************************************************************************************************************************************************************************************************
Tuesday 30 October 2018  23:47:20 +0100 (0:00:00.296)       0:25:15.192 ******* 
fatal: [inst1.mydomain.com]: FAILED! => {"changed": true, "cmd": ["oc", "logs", "daemonset/apiserver", "--tail=200", "--config=/etc/origin/master/admin.kubeconfig", "-n", "kube-service-catalog"], "delta": "0:00:00.420825", "end": "2018-10-30 23:47:21.452590", "msg": "non-zero return code", "rc": 1, "start": "2018-10-30 23:47:21.031765", "stderr": "Error from server (BadRequest): container \"apiserver\" in pod \"apiserver-qxw2d\" is waiting to start: ContainerCreating", "stderr_lines": ["Error from server (BadRequest): container \"apiserver\" in pod \"apiserver-qxw2d\" is waiting to start: ContainerCreating"], "stdout": "", "stdout_lines": []}
...ignoring

TASK [openshift_service_catalog : debug] ***************************************************************************************************************************************************************************************************************************************
Tuesday 30 October 2018  23:47:21 +0100 (0:00:01.091)       0:25:16.284 ******* 
ok: [inst1.mydomain.com] => {
    "msg": []
}

TASK [openshift_service_catalog : Report errors] *******************************************************************************************************************************************************************************************************************************
Tuesday 30 October 2018  23:47:21 +0100 (0:00:00.282)       0:25:16.566 ******* 
fatal: [inst1.mydomain.com]: FAILED! => {"changed": false, "msg": "Catalog install failed."}

PLAY RECAP *********************************************************************************************************************************************************************************************************************************************************************
inst1.mydomain.com    : ok=679  changed=170  unreachable=0    failed=1   
inst2.mydomain.com    : ok=19   changed=0    unreachable=0    failed=0   
inst3.mydomain.com    : ok=19   changed=0    unreachable=0    failed=0   
localhost                  : ok=22   changed=0    unreachable=0    failed=0   

Tuesday 30 October 2018  23:47:22 +0100 (0:00:00.439)       0:25:17.006 ******* 
=============================================================================== 
openshift_service_catalog : Verify that the catalog api server is running --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 633.84s
openshift_web_console : Pause for the web console deployment to start -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 30.28s
openshift_control_plane : verify API server ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 18.32s
openshift_service_catalog : oc_process --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 16.07s
openshift_node : Wait for master API to come back online --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 10.99s
Ensure openshift-ansible installer package deps are installed ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 9.80s
openshift_excluder : Get available excluder version --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 9.19s
openshift_control_plane : Wait for APIs to become available ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 8.86s
openshift_control_plane : Wait for APIs to become available ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 8.75s
openshift_node : update package meta data to speed install later. ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 8.26s
openshift_manageiq : Configure role/user permissions -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 8.21s
openshift_sdn : Copy templates to temp directory ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 6.76s
Upgrade all storage ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 5.95s
Migrate storage post policy reconciliation ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 5.91s
openshift_node : Update journald setup ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 5.90s
openshift_node_group : remove templated files --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 5.73s
openshift_node_group : Copy templates to temp directory ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 5.18s
Run health checks (upgrade) --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 5.06s
Run variable sanity checks ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 4.92s
openshift_node : Approve the node --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 4.51s


Failure summary:


  1. Hosts:    inst1.mydomain.com
     Play:     Upgrade Service Catalog
     Task:     Report errors
     Message:  Catalog install failed.
Additional Information

Provide any additional information which may help us diagnose the issue.

  • OS: CentOS Linux release 7.5.1804 (Core)
  • Inventory file:
[OSEv3:children]
masters
nodes
etcd
new_nodes

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
# SSH user, this user should allow ssh based auth without requiring a password
ansible_ssh_user=root

# If ansible_ssh_user is not root, ansible_become must be set to true
#ansible_become=true

openshift_deployment_type=origin

# uncomment the following to enable htpasswd authentication; defaults to AllowAllPasswordIdentityProvider
#openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]
# Defining htpasswd file
#openshift_master_htpasswd_file=/etc/origin/master/htpasswd

# Disable memory availability check
openshift_disable_check=memory_availability

# host group for masters
[masters]
inst1.mydomain.com

# host group for etcd
[etcd]
inst1.mydomain.com

# host group for nodes, includes region info
[nodes]
inst1.mydomain.com openshift_node_group_name='node-config-master-infra'
inst2.mydomain.com openshift_node_group_name='node-config-compute'
inst3.mydomain.com openshift_node_group_name='node-config-compute'

[new_nodes]

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 16 (2 by maintainers)

Most upvoted comments

I dug around a bit but couldn’t quite figure out why my node-config.yaml wasn’t being regenerated during the upgrade.

I was able to work around this by stopping and starting the relevant services during the "Verify that the catalog api server is running" task:

systemctl stop origin-node
systemctl stop docker
systemctl start docker
systemctl start origin-node
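
Once the services are back up, the stuck catalog apiserver pod should get a fresh sandbox and move past ContainerCreating; something along these lines can be used to watch it (namespace and kubeconfig path as in the log output above):

# Watch the catalog apiserver pod get recreated after the node services restart
# (-w streams updates until interrupted).
oc get pods -n kube-service-catalog -w --config=/etc/origin/master/admin.kubeconfig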

@vrutkovs thanks for confirming. @nagonzalez I tried again to run the upgrade.yml playbook for v3_10; here is the listing of my /etc/origin/node directory before the upgrade:

[root@inst1 openshift-ansible]# ls -la /etc/origin/node/
total 36
drwx------. 5 root root  249  1. Nov 10:46 .
drwx------. 8 root root  125 31. Okt 16:56 ..
-rw-------. 1 root root 5764 31. Okt 16:44 bootstrap.kubeconfig
-rw-------. 1 root root 1467 31. Okt 16:42 bootstrap-node-config.yaml
drwxr-xr-x. 2 root root  212 31. Okt 16:51 certificates
-rw-r--r--. 1 root root 1070 31. Okt 16:46 client-ca.crt
-rw-------. 1 root root 1549  1. Nov 10:46 node-config.yaml
-rw-------. 1 root root 1549 31. Okt 17:08 node-config.yaml.14174.2018-11-01@10:37:34~
-rw-------. 1 root root 5764 31. Okt 16:44 node.kubeconfig
drwxr-xr-x. 2 root root   68  1. Nov 10:42 pods
-rw-------. 1 root root   49  1. Nov 10:37 resolv.conf
drwxr-xr-x. 2 root root   30  1. Nov 10:49 tmp

and here is the same listing after the failed upgrade:

[root@inst1 openshift-ansible]# ls -la /etc/origin/node/
total 28
drwx------. 5 root root  225  5. Nov 15:53 .
drwx------. 8 root root  125 31. Okt 16:56 ..
-rw-------. 1 root root 5764 31. Okt 16:44 bootstrap.kubeconfig
-rw-r--r--. 1 root root 1488  5. Nov 15:53 bootstrap-node-config.yaml
drwxr-xr-x. 2 root root  166  5. Nov 15:53 certificates
-rw-r--r--. 1 root root 1070 31. Okt 16:46 client-ca.crt
-rw-------. 1 root root 1549 31. Okt 17:08 node-config.yaml.14174.2018-11-01@10:37:34~
-rw-------. 1 root root 1942  5. Nov 15:53 node.kubeconfig
drwx------. 2 root root   68  1. Nov 10:42 pods
-rw-------. 1 root root   49  1. Nov 10:37 resolv.conf
drwxr-xr-x. 2 root root   30  5. Nov 15:55 tmp

As you mentioned, the node-config.yaml file is missing; it gets deleted during the upgrade. How could this happen, and how did you fix it? I suspect this is a bug in the upgrade.yml playbook…
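
For anyone else landing in the same state, one way to put node-config.yaml back by hand is to pull it out of the node-group configmap the installer creates. This is only a sketch: the configmap name node-config-master-infra and the openshift-node namespace are assumptions based on the inventory above, so verify them with the first command before extracting anything.

# List the node-group configmaps (the inventory above uses the stock
# node-config-master-infra group for inst1).
oc get configmaps -n openshift-node --config=/etc/origin/master/admin.kubeconfig

# oc extract writes each configmap key out as a file, which should recreate
# /etc/origin/node/node-config.yaml; --confirm allows overwriting existing files.
oc extract configmap/node-config-master-infra -n openshift-node \
    --config=/etc/origin/master/admin.kubeconfig \
    --to=/etc/origin/node --confirm

# Restart the node service so it picks up the restored config.
systemctl restart origin-node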