openshift-ansible: Metrics installation using playbook does not end up with a working installation (3.7)

Description

On a fresh install of a 3.7 cluster, the metrics playbook completes successfully.

But when I check the openshift-infra project, the Hawkular pods will not start.

Version

  • Your ansible version per ansible --version
ansible 2.4.2.0
  config file = /home/ansibleuser/openshift-ansible/ansible.cfg
  configured module search path = [u'/home/ansibleuser/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Aug  4 2017, 00:39:18) [GCC 4.8.5 20150623 (Red Hat 4.8.5-16)]

If you’re operating from a git clone:

  • The output of git describe: openshift-ansible-3.7.42-1-34-g2474d22
Steps To Reproduce
  1. Launch the 3.7 metrics playbook: ansible-playbook -i ./hosts/cluster-installation playbooks/byo/openshift-cluster/openshift-metrics.yml
Expected Results

Cluster up and running and metrics configured

Observed Results

The Hawkular pods show these logs:

2018-04-10 16:14:58,460 INFO  [sun.misc.Version] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
2018-04-10 16:14:58,460 INFO  [sun.misc.Version] (metricsservice-lifecycle-thread) Trying again in 10000 ms

I guess something went wrong at the Cassandra level, but I'm not familiar with that database. It seems to me that some initialization step did not run, and the Cassandra pod starts without the minimal configuration it needs. I suppose I could create the keyspace manually, but I don't want to mess with things I don't fully understand.

Here are the Cassandra pod boot logs:

https://gist.github.com/ahmadou/9b35d5e534d451555e0a11ac2cd93ce0
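
Before touching anything, it is worth confirming from inside the pod that the keyspace is really missing. This is only a sketch: the pod name is a placeholder, the label selector is what the origin-metrics templates used around 3.7 (adjust if yours differ), and the --ssl flag assumes Cassandra's client encryption is on (the default in these images).

```shell
# Switch to the metrics project and find the Cassandra pod.
oc project openshift-infra
oc get pods -l metrics-infra=hawkular-cassandra

# List keyspaces inside the pod; "hawkular_metrics" should appear
# once the schema job has run. <cassandra-pod-name> is a placeholder.
oc exec <cassandra-pod-name> -- cqlsh --ssl -e "DESCRIBE KEYSPACES"
```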

Additional Information

CentOS Linux release 7.4.1708

My config file

# Global cluster configuration
[OSEv3:children]
masters
etcd
nodes
glusterfs
glusterfs_registry

# GLOBAL CLUSTER VARIABLES
[OSEv3:vars]
#etcd
openshift_use_etcd_system_container=True

#ansible
ansible_ssh_user=ansibleuser
ansible_become=true
ansible_service_broker_image_prefix=openshift/
ansible_service_broker_registry_url="registry.access.redhat.com"

# disk checks
openshift_check_min_host_disk_gb=13
#firewall
os_firewall_use_firewalld=True

#deployment configuration
openshift_deployment_type=origin
#openshift_version=3.9.0
#openshift_pkg_version=3.7.1
#containerized=true

# GlusterFS configuration
openshift_storage_glusterfs_namespace=glusterfs
openshift_storage_glusterfs_name=storage

# internal registry configuration
openshift_hosted_registry_storage_kind=glusterfs
openshift_registry_selector="region=infranodes"
openshift_hosted_registry_replicas=3
openshift_hosted_registry_storage_volume_size=190Gi

# router configuration
openshift_router_selector="region=routingnodes"

# standard node configuration
osm_default_node_selector="region=standardnodes"

# master and API endpoint configuration
openshift_master_cluster_hostname=master-lb.mycompany.internal
openshift_master_cluster_public_hostname=console.mycompany.com
openshift_master_default_subdomain=mycompany.com
openshift_master_api_port=8443
openshift_master_console_port=8443
openshift_master_session_name=ssn
openshift_public_ip="xx.xx.xx.xx"

# router certificate configuration
openshift_hosted_router_certificate={"certfile": "/home/ansibleuser/openshift-ansible/customCertificates/STAR_mycompany.crt", "keyfile": "/home/ansibleuser/openshift-ansible/customCertificates/mycompany.key", "cafile": "/home/ansibleuser/openshift-ansible/customCertificates/COMODORSADomainValidationSecureServerCA.crt"}

# LDAP configuration
openshift_master_identity_providers=[{'name': 'picv4_ldap', 'challenge': 'true', 'login': 'true', 'kind': 'LDAPPasswordIdentityProvider', 'attributes': {'id': ['dn'], 'email': ['mail'], 'name': ['cn'], 'preferredUsername': ['uid']}, 'bindDN': 'uid=ldapbind,cn=users,cn=accounts,dc=ggd,dc=mycompany', 'bindPassword': 'tetetetetetge', 'ca': '', 'insecure': 'true', 'url': 'ldap://ldap.picv4.mycompany:389/cn=users,cn=accounts,dc=picv4,dc=mycompany?uid'}]

# audit policy configuration
openshift_master_audit_config={"enabled": true, "auditFilePath": "/var/log/openpaas-oscp-audit/openpaas-oscp-audit.log", "maximumFileRetentionDays": 14, "maximumFileSizeMegabytes": 500, "maximumRetainedFiles": 5}

# cluster logging configuration
openshift_logging_install_logging="true"
openshift_logging_es_pvc_dynamic="true"
openshift_logging_es_pvc_size="100G"
openshift_logging_curator_default_days="2"
openshift_logging_curator_run_hour="24"
openshift_master_logging_public_url="https://logs.mycompany.com"

openshift_logging_es_nodeselector="region=infranodes"
openshift_logging_kibana_ops_nodeselector="region=infranodes"
openshift_logging_curator_ops_nodeselector="region=infranodes"

# metrics configuration

openshift_metrics_master_url="https://master.xxxxxx:8443"
openshift_metrics_install_metrics="true"
openshift_metrics_cassandra_storage_type="dynamic"
openshift_metrics_duration=7
openshift_metrics_cassandra_pvc_size="20G"
openshift_metrics_cassandra_replicas=1
openshift_metrics_cassandra_limits_memory="2Gi"
openshift_metrics_cassandra_limits_cpu="2000m"
openshift_metrics_cassandra_nodeselector="{'region':'infra'}"
openshift_master_metrics_public_url="metrics.xxxxxx.com"
openshift_metrics_hawkular_hostname="hawkular.xxxxx.com"
openshift_metrics_hawkular_nodeselector="{'region':'infra'}"
openshift_metrics_cassandra_replicas=1
openshift_metrics_heapster_limits_cpu="2000m"
openshift_metrics_heapster_nodeselector="{'region':'infra'}"
openshift_metrics_hawkular_ca="/home/ansibleuser/openshift-ansible/customCertificates/xxxx.crt"
openshift_metrics_hawkular_cert="/home/ansibleuser/openshift-ansible/customCertificates/xxx-xxxx.crt"
openshift_metrics_hawkular_key="/home/ansibleuser/openshift-ansible/customCertificates/xxX.key"

# GLUSTERFS NODES
[glusterfs]
storage01.mycompany.internal glusterfs_devices='[ "/dev/sdc"]' glusterfs_ip=10.39.57.31
storage02.mycompany.internal glusterfs_devices='[ "/dev/sdc"]' glusterfs_ip=10.39.57.32
storage03.mycompany.internal glusterfs_devices='[ "/dev/sdc"]' glusterfs_ip=10.39.57.33
storage04.mycompany.internal glusterfs_devices='[ "/dev/sdc"]' glusterfs_ip=10.39.57.34
# GlusterFS config
[glusterfs:vars]
openshift_storage_glusterfs_nodeselector="glusterfs=standardstorage"
openshift_storage_glusterfs_wipe="true"

# GLUSTERFS NODES DEDICATED TO THE INTERNAL REGISTRY
[glusterfs_registry]
storage-registry01.mycompany.internal glusterfs_devices='[ "/dev/sdc"]' glusterfs_ip=10.39.57.41
storage-registry02.mycompany.internal glusterfs_devices='[ "/dev/sdc"]' glusterfs_ip=10.39.57.42
storage-registry03.mycompany.internal glusterfs_devices='[ "/dev/sdc"]' glusterfs_ip=10.39.57.43

# CLUSTER NODES

# Master VM group
[masters]
master0[1:2].mycompany.internal

# etcd nodes
[etcd]
etcd01.mycompany.internal
etcd02.mycompany.internal
etcd03.mycompany.internal

# OpenShift nodes
[nodes]

#Infra Nodes
infranode0[1:2].mycompany.internal openshift_node_labels="{'region' : 'infranodes'}" openshift_schedulable=true

#Pic nodes
picnode0[1:2].mycompany.internal openshift_node_labels="{'region' : 'picnodes'}" openshift_schedulable=true

#Compilation nodes
compilnode0[1:2].mycompany.internal openshift_node_labels="{'region' : 'compilnodes'}" openshift_schedulable=true

#routing nodes
routeur0[1:2].mycompany.internal openshift_node_labels="{'region' : 'routingnodes'}"

#standard nodes
node0[1:2].mycompany.internal openshift_node_labels="{'region' : 'standardnodes'}" openshift_schedulable=true

#masters
master0[1:2].mycompany.internal openshift_node_labels="{'region' : 'masters'}" openshift_schedulable=true

#glusterfs nodes
storage0[1:4].mycompany.internal openshift_node_labels="{'region' : 'standardstorage'}"

#glusterfs registry nodes
storage-registry0[1:3].mycompany.internal openshift_node_labels="{'region' : 'registrystorage'}"

# OpenShift node-specific variables
[nodes:vars]
openshift_docker_options=--log-driver json-file --log-opt max-size=1M --log-opt max-file=3 --selinux-enabled


About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 3
  • Comments: 23 (3 by maintainers)

Most upvoted comments

I’ve seen this issue a couple of times. It is usually triggered when the Cassandra pod doesn’t have a persistent volume, so the initial setup is lost when the pod restarts. To fix it, you need to re-run the job that ships with the installation, which creates the necessary data structures inside Cassandra. Use the following commands to accomplish this:

Export current job yaml

# oc project openshift-infra
# oc get --export job hawkular-metrics-schema -o yaml > job.yaml

Delete old Job

# oc delete job hawkular-metrics-schema

Scale down hawkular metrics

# oc scale rc hawkular-metrics --replicas=0

Create new job instance

# oc create -f job.yaml
# oc get job

After the job succeeds, scale Hawkular Metrics back up

# oc scale rc hawkular-metrics --replicas=1

This should fix the issue related to the missing hawkular_metrics keyspace.
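For reference, the steps above can be chained into one session (same object names as above; the polling loop is a rough, hypothetical stand-in for manually re-running oc get job until it reports success):

```shell
#!/bin/sh
# Re-run the hawkular-metrics-schema job to recreate the missing keyspace.
oc project openshift-infra
oc get --export job hawkular-metrics-schema -o yaml > job.yaml
oc delete job hawkular-metrics-schema
oc scale rc hawkular-metrics --replicas=0
oc create -f job.yaml

# Poll until the job reports at least one successful completion
# (crude; add a timeout if you script this for real).
until oc get job hawkular-metrics-schema \
    -o jsonpath='{.status.succeeded}' | grep -q 1; do
  sleep 10
done

oc scale rc hawkular-metrics --replicas=1
```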

A cleaner way to handle it is to add the following line to your hosts file: openshift_metrics_image_version=v3.7.1

I just got it working by changing the hawkular-metrics Docker image version from docker.io/openshift/origin-metrics-hawkular-metrics:latest to docker.io/openshift/origin-metrics-hawkular-metrics:v3.7.1. I saw the latest image was pushed 17 days ago. I tried this specific version and everything works fine. I suspect something is wrong with the new image. Please try it; I hope it works for you all too.

Edited: in the hawkular-metrics yaml, changed docker.io/openshift/origin-metrics-hawkular-metrics:latest to docker.io/openshift/origin-metrics-hawkular-metrics:v3.7.1
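If you prefer not to edit the yaml by hand, the same pin can be applied with oc set image. This assumes the replication controller and its container are both named hawkular-metrics, as in a default install; adjust the names if yours differ.

```shell
# Pin the Hawkular Metrics image to the v3.7.1 tag instead of :latest.
oc project openshift-infra
oc set image rc/hawkular-metrics \
  hawkular-metrics=docker.io/openshift/origin-metrics-hawkular-metrics:v3.7.1

# Updating an rc does not restart running pods, so bounce it
# to pull the newly pinned image.
oc scale rc hawkular-metrics --replicas=0
oc scale rc hawkular-metrics --replicas=1
```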