openshift-ansible: Fresh install or upgrade of logging stack to v3.6.0 === Unknown Discovery type [kubernetes]

Description

Attempting to upgrade our logging stack from v1.5.1 to v3.6.0: the Ansible run completes successfully, but the ES containers do not deploy successfully.

I thought it might be corruption, so I retried after wiping the storage for the ES containers, which made no difference. I then tried a fresh install and the same problem occurred. v1.5.1 works fine, but I would like to keep the logging stack aligned with the cluster version.

Not sure if this is the right place for this, or if someone else maintains the v3.6.0 logging images (in case the problem lies there). Any help would be appreciated.

Version
  • Your ansible version per ansible --version
ansible 2.3.2.0
  config file = /Users/hef/work/openshift-ansible/ansible.cfg
  configured module search path = Default w/o overrides
  python version = 2.7.13 (default, Jul 18 2017, 09:17:00) [GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)]
Steps To Reproduce

Upgrade (with or without existing data) from v1.5.1 to v3.6.0, or fresh install of v3.6.0:

  1. In Ansible repo, git checkout release-3.6
  2. git pull --rebase to update
  3. ansible-playbook playbooks/byo/openshift-cluster/openshift-logging.yml

(also tried on master branch with no luck)

Expected Results

Successful install and/or upgrade of the container images in the logging project to v3.6.0, plus any other changes necessary to bring things in line with a v3.6.0 cluster.

Observed Results

ES containers do not come up (they are in a crash loop) with the following output:

[2017-09-21 19:09:40,650][INFO ][container.run            ] Begin Elasticsearch startup script
[2017-09-21 19:09:40,663][INFO ][container.run            ] Comparing the specified RAM to the maximum recommended for Elasticsearch...
[2017-09-21 19:09:40,664][INFO ][container.run            ] Inspecting the maximum RAM available...
[2017-09-21 19:09:40,668][INFO ][container.run            ] ES_HEAP_SIZE: '1024m'
[2017-09-21 19:09:40,669][INFO ][container.run            ] Setting heap dump location /elasticsearch/persistent/heapdump.hprof
[2017-09-21 19:09:40,672][INFO ][container.run            ] Checking if Elasticsearch is ready on https://localhost:9200
Exception in thread "main" java.lang.IllegalArgumentException: Unknown Discovery type [kubernetes]
at org.elasticsearch.discovery.DiscoveryModule.configure(DiscoveryModule.java:100)
at <<<guice>>>
at org.elasticsearch.node.Node.<init>(Node.java:213)
at org.elasticsearch.node.Node.<init>(Node.java:140)
at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:143)
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:194)
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:286)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:45)
Refer to the log for complete error details.
Additional Information

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Comments: 21 (10 by maintainers)

Most upvoted comments

You have an updated openshift-ansible but an old ES image. If you are getting the ES image from https://hub.docker.com/r/openshift/origin-logging-elasticsearch/tags/, it looks like only latest has been updated. I recommend not changing anything in the ES config map and pulling the latest ES image.
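As a rough sketch of that workaround, pulling the refreshed image and re-importing it might look like this (the image stream name and the logging namespace are assumptions based on a default origin logging install; adjust for your cluster):

```shell
# Pull the refreshed image from Docker Hub (only the "latest" tag had been updated).
docker pull openshift/origin-logging-elasticsearch:latest

# If the cluster resolves the image through an image stream, re-import the tag
# so the ES DeploymentConfigs pick up the new image on the next rollout.
oc import-image origin-logging-elasticsearch:latest \
  --from=docker.io/openshift/origin-logging-elasticsearch:latest \
  --confirm -n logging
```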

If you want some background about what exactly is happening, or a solution other than updating the ES images, read on. In September we introduced a new type of master discovery algorithm in the ES images, discovery by pod label and port, because discovery by service did not work well with the readiness probe.

The relevant changes are in:

  1. openshift-ansible - https://github.com/openshift/openshift-ansible/pull/5209
  • turning the readiness probe back on
  • changing the discovery algorithm in the ES configmap
  2. ES image - https://github.com/openshift/origin-aggregated-logging/pull/609
  • a new library supporting the new discovery algorithm

If you don’t want to update the ES image then you need to:

  • disable the readiness probe - run oc edit dc logging-es-data-master-... for each ES DeploymentConfig and remove the section starting with readinessProbe:
  • revert the master discovery algorithm - oc edit cm logging-elasticsearch and change
cloud:
  kubernetes:
    pod_label: ${POD_LABEL}
    pod_port: 9300
    namespace: ${NAMESPACE}

to

cloud:
  kubernetes:
    service: ${SERVICE_DNS}
    namespace: ${NAMESPACE}
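
A non-interactive way to apply both manual steps above might look like this (the DC name suffix here is an example; the JSON path assumes the ES container is the first container in the default logging DC layout):

```shell
# List the ES DeploymentConfigs to find the exact names (they carry a generated suffix).
oc get dc -n logging | grep logging-es

# 1) Remove the readiness probe from the ES container of each DC
#    ("abc123" is a placeholder; repeat for every logging-es-* DC).
oc patch dc/logging-es-data-master-abc123 -n logging --type=json \
  -p '[{"op": "remove", "path": "/spec/template/spec/containers/0/readinessProbe"}]'

# 2) Revert the discovery settings in the configmap interactively, as described above.
oc edit cm logging-elasticsearch -n logging

# 3) Roll out the changes.
oc rollout latest dc/logging-es-data-master-abc123 -n logging
```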

We have introduced and released a new tag, v3.6. This tag will be updated regularly, so you will no longer have to wait for release engineers to push a new image. More info here: https://github.com/openshift/origin-aggregated-logging/pull/758

This should be working now: openshift_logging_image_version=v3.6
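
For reference, a minimal sketch of how that variable sits in an openshift-ansible inventory (the surrounding group and the install flag are shown as common companions and may differ in your setup):

```ini
# Ansible inventory: pin the logging images to the floating v3.6 tag
[OSEv3:vars]
openshift_logging_install_logging=true
openshift_logging_image_version=v3.6
```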

@mhutter yes, there will be. You will not be expected to run with the latest tag in production.

Having the same issue here. From what I know, in elasticsearch.yml

discovery.type: kubernetes

should be:

discovery.zen.hosts_provider: kubernetes

I believe this comes from a change in the kubernetes discovery plugin.
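
In context, the change in the logging-elasticsearch configmap's elasticsearch.yml would look roughly like this (a sketch based on the comment above; exact surrounding keys may differ between plugin versions):

```yaml
# Before - rejected with "Unknown Discovery type [kubernetes]":
# discovery.type: kubernetes

# After - the setting the updated kubernetes discovery plugin expects:
discovery.zen.hosts_provider: kubernetes
```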

After changing this I got further, but then hit Searchguard errors saying it was not initialized.