openshift-ansible: GlusterFS config.yml deploy fails at "Wait for heketi pod" with "Failed without returning a message"
Hi everyone, I've been trying to get past this issue in my deployment for about a week. I have run the playbooks prerequisites.yml, deploy_cluster.yml, and uninstall.yml again and again, and I still hit the same failure.
* ansible 2.9.2
config file = /etc/ansible/ansible.cfg
configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/site-packages/ansible
executable location = /usr/bin/ansible
python version = 2.7.5 (default, Aug 7 2019, 00:51:29) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]
* The output of `git describe`
`git version 1.8.3.1`
* If you're running from playbooks installed via RPM, the output of `rpm -q openshift-ansible`:
package openshift-ansible is not installed (but I'm sure it is!)
[root@os-master openshift-ansible]# ls
ansible.cfg CONTRIBUTING.md examples images meta playbooks README_CONTAINERIZED_INSTALLATION.md roles test
BUILD.md DEPLOYMENT_TYPES.md hack inventory openshift-ansible.spec pytest.ini README.md setup.cfg test-requirements.txt
conftest.py docs HOOKS.md LICENSE OWNERS README_CONTAINER_IMAGE.md requirements.txt setup.py tox.ini
Steps To Reproduce
- Run ansible-playbook /root/openshift-ansible/playbooks/prerequisites.yml
- Deploy a new single-master cluster using ansible-playbook /root/openshift-ansible/playbooks/openshift-glusterfs/config.yml
- The same thing happens with ansible-playbook /root/openshift-ansible/playbooks/deploy_cluster.yml. It is 100% reproducible, even with a minimal inventory file and fresh VMs.
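Since the same hosts are reused between attempts, leftover Heketi/LVM state on the brick devices can make a fresh GlusterFS deploy hang even after uninstall.yml. A rough cleanup sketch to run on each GlusterFS node between attempts (it assumes /dev/sdb from the inventory below and is destructive, so only use it on bricks that hold no data):

```
# Run on each GlusterFS node (os-infra, os-node, os-storage) after uninstall.yml
# and before the next deploy attempt. This destroys any existing GlusterFS state.

# Remove leftover Heketi and GlusterFS state/configuration directories
rm -rf /var/lib/heketi /var/lib/glusterd /etc/glusterfs /var/log/glusterfs

# Remove any LVM volume group heketi created on the brick device, then its PV
vg=$(pvs --noheadings -o vg_name /dev/sdb 2>/dev/null | tr -d ' ')
[ -n "$vg" ] && vgremove -ff "$vg"
pvremove -ff /dev/sdb 2>/dev/null || true

# Wipe remaining filesystem/LVM signatures so heketi sees a bare device again
wipefs -a /dev/sdb
```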
Expected Results
GlusterFS is deployed and ready for PV and PVC
Observed Results
PLAY RECAP ****************************************************************************************************************************************************************************************
localhost               : ok=12   changed=0    unreachable=0    failed=0    skipped=4    rescued=0    ignored=0
os-infra.mydomain.com   : ok=144  changed=36   unreachable=0    failed=0    skipped=163  rescued=0    ignored=0
os-master.mydomain.com  : ok=471  changed=193  unreachable=0    failed=1    skipped=589  rescued=0    ignored=0
os-node.mydomain.com    : ok=129  changed=36   unreachable=0    failed=0    skipped=159  rescued=0    ignored=0
os-storage.mydomain.com : ok=129  changed=36   unreachable=0    failed=0    skipped=159  rescued=0    ignored=0
INSTALLER STATUS **********************************************************************************************************************************************************************************
Initialization               : Complete (0:00:26)
Health Check                 : Complete (0:00:07)
Node Bootstrap Preparation   : Complete (0:03:14)
etcd Install                 : Complete (0:00:41)
Master Install               : Complete (0:04:26)
Master Additional Install    : Complete (0:00:40)
Node Join                    : Complete (0:00:43)
GlusterFS Install            : In Progress (0:13:46)
        This phase can be restarted by running: playbooks/openshift-glusterfs/new_install.yml
Failure summary:
Hosts:   os-master.mydomain.com
Play:    Configure GlusterFS
Task:    Wait for heketi pod
Message: Failed without returning a message.
Every time GlusterFS is deployed, the "Wait for heketi pod" task retries until "FAILED - RETRYING: Wait for heketi pod (1 retries left)." and then fails with "Failed without returning a message":
fatal: [os-master.example.comthedawn.com]: FAILED! => {"attempts": 30, "changed": false, "module_results": {"cmd": "/usr/bin/oc get pod --selector=glusterfs=heketi-storage-pod -o json -n glusterfs", "results": [{"apiVersion": "v1", "items": [{"apiVersion": "v1", "kind": "Pod", "metadata": {"annotations": {"openshift.io/deployment-config.latest-version": "1", "openshift.io/deployment-config.name": "heketi-storage", "openshift.io/deployment.name": "heketi-storage-1", "openshift.io/scc": "privileged"}, "creationTimestamp": "2020-01-24T11:26:08Z", "generateName": "heketi-storage-1-", "labels": {"deployment": "heketi-storage-1", "deploymentconfig": "heketi-storage", "glusterfs": "heketi-storage-pod", "heketi": "storage-pod"}, "name": "heketi-storage-1-bvn52", "namespace": "glusterfs", "ownerReferences": [{"apiVersion": "v1", "blockOwnerDeletion": true, "controller": true, "kind": "ReplicationController", "name": "heketi-storage-1", "uid": "4fa4ac45-3e9c-11ea-a06a-000c29fad897"}], "resourceVersion": "8520", "selfLink": "/api/v1/namespaces/glusterfs/pods/heketi-storage-1-bvn52", "uid": "51645ba4-3e9c-11ea-a06a-000c29fad897"}, "spec": {"containers": [{"env": [{"name": "HEKETI_USER_KEY", "value": "kumZhoQMqSxUcCGAeIXiiZfYSnxOIQrQkGp2T1ev6AM="}, {"name": "HEKETI_ADMIN_KEY", "value": "Wcct2uvr6AI8bXUtFIJ9IdgJxyqdW+P1qKWSntk9MCg="}, {"name": "HEKETI_CLI_USER", "value": "admin"}, {"name": "HEKETI_CLI_KEY", "value": "Wcct2uvr6AI8bXUtFIJ9IdgJxyqdW+P1qKWSntk9MCg="}, {"name": "HEKETI_EXECUTOR", "value": "kubernetes"}, {"name": "HEKETI_FSTAB", "value": "/var/lib/heketi/fstab"}, {"name": "HEKETI_SNAPSHOT_LIMIT", "value": "14"}, {"name": "HEKETI_KUBE_GLUSTER_DAEMONSET", "value": "1"}, {"name": "HEKETI_IGNORE_STALE_OPERATIONS", "value": "true"}, {"name": "HEKETI_DEBUG_UMOUNT_FAILURES", "value": "true"}], "image": "docker.io/heketi/heketi:latest", "imagePullPolicy": "IfNotPresent", "livenessProbe": {"failureThreshold": 3, "httpGet": {"path": "/hello", "port": 8080, "scheme": "HTTP"}, "initialDelaySeconds": 30, "periodSeconds": 10, "successThreshold": 1, "timeoutSeconds": 3}, "name": "heketi", "ports": [{"containerPort": 8080, "protocol": "TCP"}], "readinessProbe": {"failureThreshold": 3, "httpGet": {"path": "/hello", "port": 8080, "scheme": "HTTP"}, "initialDelaySeconds": 3, "periodSeconds": 10, "successThreshold": 1, "timeoutSeconds": 3}, "resources": {}, "terminationMessagePath": "/dev/termination-log", "terminationMessagePolicy": "File", "volumeMounts": [{"mountPath": "/var/lib/heketi", "name": "db"}, {"mountPath": "/etc/heketi", "name": "config"}, {"mountPath": "/var/run/secrets/kubernetes.io/serviceaccount", "name": "heketi-storage-service-account-token-kmvc8", "readOnly": true}]}], "dnsPolicy": "ClusterFirst", "imagePullSecrets": [{"name": "heketi-storage-service-account-dockercfg-xf57n"}], "nodeName": "os-master.example.comthedawn.com", "priority": 0, "restartPolicy": "Always", "schedulerName": "default-scheduler", "securityContext": {}, "serviceAccount": "heketi-storage-service-account", "serviceAccountName": "heketi-storage-service-account", "terminationGracePeriodSeconds": 30, "volumes": [{"glusterfs": {"endpoints": "heketi-db-storage-endpoints", "path": "heketidbstorage"}, "name": "db"}, {"name": "config", "secret": {"defaultMode": 420, "secretName": "heketi-storage-config-secret"}}, {"name": "heketi-storage-service-account-token-kmvc8", "secret": {"defaultMode": 420, "secretName": "heketi-storage-service-account-token-kmvc8"}}]}, "status": {"conditions": [{"lastProbeTime": null, 
"lastTransitionTime": "2020-01-24T11:26:08Z", "status": "True", "type": "Initialized"}, {"lastProbeTime": null, "lastTransitionTime": "2020-01-24T11:26:08Z", "message": "containers with unready status: [heketi]", "reason": "ContainersNotReady", "status": "False", "type": "Ready"}, {"lastProbeTime": null, "lastTransitionTime": null, "message": "containers with unready status: [heketi]", "reason": "ContainersNotReady", "status": "False", "type": "ContainersReady"}, {"lastProbeTime": null, "lastTransitionTime": "2020-01-24T11:26:08Z", "status": "True", "type": "PodScheduled"}], "containerStatuses": [{"image": "docker.io/heketi/heketi:latest", "imageID": "", "lastState": {}, "name": "heketi", "ready": false, "restartCount": 0, "state": {"waiting": {"reason": "ContainerCreating"}}}], "hostIP": "192.168.1.212", "phase": "Pending", "qosClass": "BestEffort", "startTime": "2020-01-24T11:26:08Z"}}], "kind": "List", "metadata": {"resourceVersion": "", "selfLink": ""}}], "returncode": 0}, "state": "list"}
Additional Information
I believe it's related to this bug, but maybe I'm missing the workaround?
I'm using standalone VMware ESXi as the hypervisor, and an RPM install of Origin.
ansible 2.9.2
Origin 3.11
CentOS 7 as the OS for the nodes
```
[root@os-master ~]# docker version
Client:
Version: 1.13.1
API version: 1.26
Package version: docker-1.13.1-103.git7f2769b.el7.centos.x86_64
Go version: go1.10.3
Git commit: 7f2769b/1.13.1
Built: Sun Sep 15 14:06:47 2019
OS/Arch: linux/amd64
Server:
Version: 1.13.1
API version: 1.26 (minimum version 1.12)
Package version: docker-1.13.1-103.git7f2769b.el7.centos.x86_64
Go version: go1.10.3
Git commit: 7f2769b/1.13.1
Built: Sun Sep 15 14:06:47 2019
OS/Arch: linux/amd64
Experimental: false
```
Here is my inventory:
```
[OSEv3:children]
masters
etcd
nodes
glusterfs
[OSEv3:vars]
ansible_ssh_user=root
openshift_deployment_type=origin
openshift_release="3.11"
openshift_image_tag="v3.11"
openshift_master_default_subdomain=apps.mydomain.com
openshift_docker_selinux_enabled=True
openshift_check_min_host_memory_gb=16
openshift_check_min_host_disk_gb=50
openshift_disable_check=docker_image_availability
openshift_master_dynamic_provisioning_enabled=true
openshift_registry_selector="role=infra"
openshift_hosted_registry_storage_kind=glusterfs
openshift_metrics_install_metrics=true
openshift_metrics_cassandra_storage_type=pv
openshift_metrics_hawkular_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_metrics_cassandra_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_metrics_heapster_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_metrics_storage_volume_size=20Gi
openshift_metrics_cassandra_pvc_storage_class_name="glusterfs-registry-block"
openshift_logging_install_logging=true
openshift_logging_es_pvc_dynamic=true
openshift_logging_storage_kind=dynamic
openshift_logging_kibana_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_logging_curator_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_logging_es_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_logging_es_pvc_size=20Gi
openshift_logging_es_pvc_storage_class_name="glusterfs-registry-block"
openshift_storage_glusterfs_registry_namespace=infra-storage
openshift_storage_glusterfs_registry_storageclass=false
openshift_storage_glusterfs_registry_storageclass_default=false
openshift_storage_glusterfs_registry_block_deploy=true
openshift_storage_glusterfs_registry_block_host_vol_create=true
openshift_storage_glusterfs_registry_block_host_vol_size=100
openshift_storage_glusterfs_registry_block_storageclass=true
openshift_storage_glusterfs_registry_block_storageclass_default=false
[masters]
os-master.mydomain.com
[etcd]
os-master.mydomain.com
[nodes]
os-master.mydomain.com openshift_node_group_name="node-config-master"
os-infra.mydomain.com openshift_node_group_name="node-config-infra"
os-storage.mydomain.com openshift_node_group_name="node-config-compute"
os-node.mydomain.com openshift_node_group_name="node-config-compute"
[glusterfs]
os-infra.mydomain.com glusterfs_ip='192.168.1.213' glusterfs_devices='["/dev/sdb"]'
os-node.mydomain.com glusterfs_ip='192.168.1.214' glusterfs_devices='["/dev/sdb"]'
os-storage.mydomain.com glusterfs_ip='192.168.1.215' glusterfs_devices='["/dev/sdb"]'
```
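Heketi only accepts the disks listed in glusterfs_devices if they are bare block devices with no partitions, filesystems or LVM signatures. A quick read-only pre-flight check on each GlusterFS host (assuming /dev/sdb as in the inventory above):

```
# /dev/sdb should show no child partitions and an empty FSTYPE column
lsblk -f /dev/sdb

# Both should print nothing if the device carries no existing signatures
wipefs /dev/sdb
blkid /dev/sdb

# /dev/sdb should not appear as an already-used LVM physical volume
pvs
```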
Can someone please advise what I should do to be able to deploy successfully?
Many thanks in advance.
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 35
Before installing the glusterfs 6 or 7 libs, you want to uninstall the glusterfs 3 packages first:
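The exact commands are not included in the comment; a sketch of what that package swap typically looks like on CentOS 7 (the repo and package names below are assumptions; adjust to the GlusterFS version you actually want):

```
# Remove the GlusterFS 3.x client packages that ship with CentOS 7 base
yum remove -y glusterfs glusterfs-fuse glusterfs-libs glusterfs-client-xlators

# Enable the Storage SIG repo for GlusterFS 6 and install the newer client libs
yum install -y centos-release-gluster6
yum install -y glusterfs glusterfs-fuse glusterfs-libs
```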
You need GlusterFS installed manually on ALL hosts, I believe.
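For an RPM/Origin 3.11 install that usually means having the glusterfs-fuse client package on every node that may run a pod with a GlusterFS volume (the heketi pod above mounts one), because the kubelet calls mount.glusterfs on the host. A minimal sketch, assuming yum-based CentOS 7 nodes:

```
# On every OpenShift node (master, infra, compute):
yum install -y glusterfs-fuse

# kubelet needs mount.glusterfs on the host to mount gluster-backed volumes
which mount.glusterfs
```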
Retry after adding the line below to your inventory:
openshift_storage_glusterfs_is_native=false
I have the version below and it's working fine.