openshift-ansible: Fresh install or upgrade of logging stack to v3.6.0 === Unknown Discovery type [kubernetes]

Description

Attempting to upgrade our logging stack from v1.5.1 to v3.6.0: the Ansible run completes successfully, but the ES containers do not deploy successfully.

I thought it might be corruption, so I retried after wiping the storage for the ES containers, which made no difference. I then tried a fresh install and the same problem occurred. v1.5.1 works fine, but I would like to keep the logging stack aligned with the cluster version.

Not sure if this is the right place for this, or if someone else maintains the v3.6.0 logging images (in case the problem lies there). Any help would be appreciated.

Version
  • Your ansible version per ansible --version
ansible 2.3.2.0
  config file = /Users/hef/work/openshift-ansible/ansible.cfg
  configured module search path = Default w/o overrides
  python version = 2.7.13 (default, Jul 18 2017, 09:17:00) [GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)]
Steps To Reproduce

Upgrade (with or without existing data) from v1.5.1 to v3.6.0, or fresh install of v3.6.0:

  1. In Ansible repo, git checkout release-3.6
  2. git pull --rebase to update
  3. ansible-playbook playbooks/byo/openshift-cluster/openshift-logging.yml

(also tried on master branch with no luck)

Expected Results

Successful install and/or upgrade of the container images in the logging project to v3.6.0, plus any other changes necessary to bring things in line with a v3.6.0 cluster.

Observed Results

ES containers do not come up (they are in a crash loop) with the following output:

[2017-09-21 19:09:40,650][INFO ][container.run            ] Begin Elasticsearch startup script
[2017-09-21 19:09:40,663][INFO ][container.run            ] Comparing the specified RAM to the maximum recommended for Elasticsearch...
[2017-09-21 19:09:40,664][INFO ][container.run            ] Inspecting the maximum RAM available...
[2017-09-21 19:09:40,668][INFO ][container.run            ] ES_HEAP_SIZE: '1024m'
[2017-09-21 19:09:40,669][INFO ][container.run            ] Setting heap dump location /elasticsearch/persistent/heapdump.hprof
[2017-09-21 19:09:40,672][INFO ][container.run            ] Checking if Elasticsearch is ready on https://localhost:9200
Exception in thread "main" java.lang.IllegalArgumentException: Unknown Discovery type [kubernetes]
at org.elasticsearch.discovery.DiscoveryModule.configure(DiscoveryModule.java:100)
at <<<guice>>>
at org.elasticsearch.node.Node.<init>(Node.java:213)
at org.elasticsearch.node.Node.<init>(Node.java:140)
at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:143)
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:194)
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:286)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:45)
Refer to the log for complete error details.
Additional Information

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Comments: 21 (10 by maintainers)

Most upvoted comments

You have an updated openshift-ansible but an old ES image. If you are getting the ES image from https://hub.docker.com/r/openshift/origin-logging-elasticsearch/tags/, it looks like only latest has been updated. I recommend not changing anything in the ES config map and pulling the latest ES image.
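As a rough sketch of that workaround, pulling the refreshed image and re-importing it might look like this (the image stream name and the logging namespace are assumptions based on a default origin logging install; adjust for your cluster):

```shell
# Pull the refreshed image from Docker Hub (only the "latest" tag had been updated).
docker pull openshift/origin-logging-elasticsearch:latest

# If the cluster resolves the image through an image stream, re-import the tag
# so the ES DeploymentConfigs pick up the new image on the next rollout.
oc import-image origin-logging-elasticsearch:latest \
  --from=docker.io/openshift/origin-logging-elasticsearch:latest \
  --confirm -n logging
```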

If you want some background about what exactly is happening, or a solution other than updating the ES images, read on. In September we introduced a new type of master discovery algorithm in the ES images, discovery by pod label and port, because discovery by service did not work well with the readiness probe.

The relevant changes are in:

  1. openshift-ansible - https://github.com/openshift/openshift-ansible/pull/5209
  • turning the readiness probe back on
  • changing the discovery algorithm in the ES configmap
  2. ES image - https://github.com/openshift/origin-aggregated-logging/pull/609
  • a new library supporting the new discovery algorithm

If you don’t want to update the ES image then you need to:

  • disable the readiness probe - run oc edit dc logging-es-data-master-... for each ES DeploymentConfig and remove the section starting with readinessProbe:
  • revert the master discovery algorithm - oc edit cm logging-elasticsearch and change
cloud:
  kubernetes:
    pod_label: ${POD_LABEL}
    pod_port: 9300
    namespace: ${NAMESPACE}

to

cloud:
  kubernetes:
    service: ${SERVICE_DNS}
    namespace: ${NAMESPACE}
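
A non-interactive way to apply both manual steps above might look like this (the DC name suffix here is an example; the JSON path assumes the ES container is the first container in the default logging DC layout):

```shell
# List the ES DeploymentConfigs to find the exact names (they carry a generated suffix).
oc get dc -n logging | grep logging-es

# 1) Remove the readiness probe from the ES container of each DC
#    ("abc123" is a placeholder; repeat for every logging-es-* DC).
oc patch dc/logging-es-data-master-abc123 -n logging --type=json \
  -p '[{"op": "remove", "path": "/spec/template/spec/containers/0/readinessProbe"}]'

# 2) Revert the discovery settings in the configmap interactively, as described above.
oc edit cm logging-elasticsearch -n logging

# 3) Roll out the changes.
oc rollout latest dc/logging-es-data-master-abc123 -n logging
```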

We have introduced and released a new tag, v3.6. This tag will be updated regularly, so you will no longer have to wait for release engineers to push a new image. More info here: https://github.com/openshift/origin-aggregated-logging/pull/758

This should be working now: openshift_logging_image_version=v3.6
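
For reference, a minimal sketch of how that variable sits in an openshift-ansible inventory (the surrounding group and the install flag are shown as common companions and may differ in your setup):

```ini
# Ansible inventory: pin the logging images to the floating v3.6 tag
[OSEv3:vars]
openshift_logging_install_logging=true
openshift_logging_image_version=v3.6
```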

@mhutter yes, there will be. You will not be expected to run with the latest tag in production.

Having the same issue here. From what I know, in elasticsearch.yml

discovery.type: kubernetes

should be:

discovery.zen.hosts_provider: kubernetes

I believe this comes from a change in the kubernetes discovery plugin.
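
In context, the change in the logging-elasticsearch configmap's elasticsearch.yml would look roughly like this (a sketch based on the comment above; exact surrounding keys may differ between plugin versions):

```yaml
# Before - rejected with "Unknown Discovery type [kubernetes]":
# discovery.type: kubernetes

# After - the setting the updated kubernetes discovery plugin expects:
discovery.zen.hosts_provider: kubernetes
```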

After changing this I got further, but then hit Searchguard errors saying it was not initialized.