jaeger-operator: failed to create primary Elasticsearch client with health check timeout - no Elasticsearch node available

Hi,

I am trying to set up Elasticsearch storage with the Jaeger operator, but it keeps failing the initial health check.

Jaeger operator version - v1.12.1
K8s version - v1.12.8
Elasticsearch operator version - v0.8.1
Elasticsearch version - v7.2.0

The operators have been deployed and are running. When the Jaeger instance is deployed with the es backend, the collector and query pods fail with:

$ kubectl logs my-jaeger-collector-8477b87bfd-sdcq4
{"level":"info","ts":1562160343.721835,"caller":"flags/service.go:113","msg":"Mounting metrics handler on admin server","route":"/metrics"}
{"level":"info","ts":1562160343.7220967,"caller":"flags/admin.go:108","msg":"Mounting health check on admin server","route":"/"}
{"level":"info","ts":1562160343.7221494,"caller":"flags/admin.go:114","msg":"Starting admin HTTP server","http-port":14269}
{"level":"info","ts":1562160343.7221606,"caller":"flags/admin.go:100","msg":"Admin server started","http-port":14269,"health-status":"unavailable"}
{"level":"fatal","ts":1562160348.7503304,"caller":"collector/main.go:87","msg":"Failed to init storage factory","error":"failed to create primary Elasticsearch client: health check timeout: Head http://quickstart-es:9200: EOF: no Elasticsearch node available","errorVerbose":"no Elasticsearch node available\ngithub.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic%2ev5.init.ializers\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic.v5/client.go:88\nruntime.main\n\t/home/travis/.gimme/versions/go1.12.1.linux.amd64/src/runtime/proc.go:188\nruntime.goexit\n\t/home/travis/.gimme/versions/go1.12.1.linux.amd64/src/runtime/asm_amd64.s:1337\nhealth check timeout: Head http://quickstart-es:9200: EOF
$ kubectl get po
NAME                                  READY     STATUS             RESTARTS   AGE
my-jaeger-agent-daemonset-pxbnn       1/1       Running            0          5m
my-jaeger-agent-daemonset-v4x47       1/1       Running            0          5m
my-jaeger-agent-daemonset-v8fc9       1/1       Running            0          5m
my-jaeger-collector-f9d9656db-bwcb6   0/1       CrashLoopBackOff   5          5m
my-jaeger-query-74576c764-fxj7h       1/2       CrashLoopBackOff   5          5m

Jaeger operator logs:

$ kubectl logs jaeger-operator-5ddcb7c446-ksj9m -n observability
time="2019-07-03T12:01:17Z" level=info msg=Versions arch=amd64 jaeger-operator=1.12.1 operator-sdk=v0.8.1 os=linux version=go1.12.5
time="2019-07-03T12:01:18Z" level=info msg="Auto-detected the platform" platform=kubernetes
time="2019-07-03T12:01:18Z" level=info msg="Automatically adjusted the 'es-provision' flag" es-provision=false
time="2019-07-03T13:14:22Z" level=error msg="failed to apply the changes" error="timed out waiting for the condition" execution="2019-07-03 13:09:22.662415944 +0000 UTC" instance=my-jaeger namespace=default
time="2019-07-03T13:19:23Z" level=error msg="failed to apply the changes" error="timed out waiting for the condition" execution="2019-07-03 13:14:23.743258079 +0000 UTC" instance=my-jaeger namespace=default
time="2019-07-03T13:24:24Z" level=error msg="failed to apply the changes" error="timed out waiting for the condition" execution="2019-07-03 13:19:24.779557298 +0000 UTC" instance=my-jaeger namespace=default

My spec file looks like this:

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: my-jaeger
spec:
  strategy: production
  ui:
    options:
      dependencies:
        menuEnabled: false
      tracking:
        gaID: UA-000000-2
      menu:
        - label: "About Jaeger"
          items:
            - label: "Documentation"
              url: "https://www.jaegertracing.io/docs/latest"
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://quickstart-es:9200
    secretName: mysecret
  ingress:
    enabled: false
  agent:
    strategy: DaemonSet
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""

The ES svc has been created and is running:

$ kubectl get svc quickstart-es
NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
quickstart-es   ClusterIP   100.66.174.252   <none>        9200/TCP   4h

The Elasticsearch cluster is in the same k8s cluster and namespace (I think):

$ curl -u "elastic:$PASSWORD" -k "https://localhost:9200"
{
  "name" : "quickstart-es-96dchsxw4f",
  "cluster_name" : "quickstart",
  "cluster_uuid" : "someuuid",
  "version" : {
    "number" : "7.2.0",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "508c38a",
    "build_date" : "2019-06-20T15:54:18.811730Z",
    "build_snapshot" : false,
    "lucene_version" : "8.0.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

I have even tried giving the full DNS name in the Jaeger config, like http://quickstart-es.default.svc.cluster.local:9200, and even the clusterIP.

My secret file looks like this (the password was obtained with $(kubectl get secret quickstart-elastic-user -o=jsonpath='{.data.elastic}' | base64 --decode), and the username is elastic):

apiVersion: v1
kind: Secret
metadata:
  name: mysecret
type: Opaque
data:
  ES_USERNAME: ZWxhc3RpYw==
  ES_PASSWORD: somepasswordinbase64
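
For anyone reproducing this secret: the values under data: are plain base64 of the raw strings. Below is a minimal Python sketch for checking them. The helper names encode_secret_value and decode_secret_value are hypothetical (they are not part of kubectl or the Jaeger operator), and the password is deliberately not reproduced.

```python
import base64


def encode_secret_value(raw: str) -> str:
    """Base64-encode a raw string the way a Secret's data: field expects."""
    return base64.b64encode(raw.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Reverse of encode_secret_value, for inspecting an existing secret."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    # Matches the ES_USERNAME value in the secret above.
    print(encode_secret_value("elastic"))    # ZWxhc3RpYw==
    # A trailing newline (for example from echo without -n) changes the value.
    print(encode_secret_value("elastic\n"))  # ZWxhc3RpYwo=
```

If a decoded value ends in a newline, the credentials Jaeger reads from the secret will probably not match what Elasticsearch expects.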

There are some more error logs in the ES cluster pod:

{"type": "server", "timestamp": "2019-07-03T13:47:18,707+0000", "level": "WARN", "component": "o.e.h.AbstractHttpServerTransport", "cluster.name": "quickstart", "node.name": "quickstart-es-96dchsxw4f", "cluster.uuid": "someuuid", "node.id": "someid",  "message": "caught exception while handling client http traffic, closing connection Netty4HttpChannel{localAddress=0.0.0.0/0.0.0.0:9200, remoteAddress=/xx.xx.x.xx:51134}" ,
"stacktrace": ["io.netty.handler.codec.DecoderException: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 48454144202f2048545450....",
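
A note on that stack trace: the hex payload in the NotSslRecordException is readable once decoded. The sketch below uses a hypothetical helper, decode_hex_payload, on the untruncated part of the hex string copied from the log. The result suggests that Elasticsearch received a plain-text HTTP HEAD request on a port that expects TLS, which would fit the http:// server-urls in the Jaeger spec versus the https:// URL that works with curl.

```python
def decode_hex_payload(hex_bytes: str) -> str:
    """Decode a hex dump of the first bytes of a rejected connection into text."""
    return bytes.fromhex(hex_bytes).decode("ascii", errors="replace")


if __name__ == "__main__":
    # Hex prefix from the NotSslRecordException message (the log truncates the rest).
    payload = "48454144202f2048545450"
    print(decode_hex_payload(payload))  # HEAD / HTTP
```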

Can anyone please help with this? Thanks.

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 4
  • Comments: 37 (21 by maintainers)

Most upvoted comments

@vishnuhd Here are more simplified deployment manifests:

ElasticSearch:

apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  name: quickstart
  namespace: observability
spec:
  version: 6.8.0
  nodes:
  - nodeCount: 1
    config:
      node.master: true
      node.data: true
      node.ingest: true

Kibana:

apiVersion: kibana.k8s.elastic.co/v1alpha1
kind: Kibana
metadata:
  name: quickstart
  namespace: observability
spec:
  version: 6.8.0
  nodeCount: 1
  elasticsearchRef:
    name: quickstart
  http:
    tls:
      selfSignedCertificate:
        disabled: true

Jaeger:

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: local-jaeger-tracing
  namespace: observability
spec:
  strategy: allInOne
  allInOne:
    image: docker.io/jaegertracing/all-in-one:1.13
    options:
      collector:
        zipkin:
          http-port: "9411"
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: https://quickstart-es-http.observability.svc:9200
        tls.ca: /etc/ssl/certs/tls.crt
    secretName: quickstart-es
  ingress:
    enabled: false
  agent:
    strategy: Sidecar
  volumeMounts:
  - name: es-tls
    mountPath: /etc/ssl/certs
  volumes:
  - name: es-tls
    secret:
      secretName: quickstart-es-http-certs-public

Example of the quickstart-es secret manifest, generated manually:

apiVersion: v1
kind: Secret
metadata:
  name: quickstart-es
  namespace: observability
type: Opaque
data:
  ES_USERNAME: ZWxhc3RpYw==
  ES_PASSWORD: Nm5qa21rbnZ0Z3AyZnZsaHF3OWtoN3J4 # 6njkmknvtgp2fvlhqw9kh7rx

@vishnuhd I finally figured out how it works! I installed delve inside the jaeger-collector and ran a remote debugging session to work out the right configuration.

Here is my jaeger custom resource manifest:

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: local-jaeger-tracing
  namespace: observability
spec:
  strategy: allInOne
  allInOne:
    image: docker.io/jaegertracing/all-in-one:1.13
    options:
      collector:
        zipkin:
          http-port: "9411"
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: https://quickstart-es-http.observability.svc.cluster.local:9200
        tls.ca: /etc/ssl/certs/tls.crt
    secretName: quickstart-es
  ingress:
    enabled: false
  agent:
    strategy: Sidecar
  volumeMounts:
  - name: es-tls
    mountPath: /etc/ssl/certs
  volumes:
  - name: es-tls
    secret:
      secretName: quickstart-es-http-certs-public

NOTE: I have also successfully run Jaeger with the production strategy.

I am using the Elasticsearch operator from Elastic, built from the master branch (probably the future version v0.9.0). The operator generates the secret resource named quickstart-es-http-certs-public.
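
To test the same settings outside Jaeger, here is a rough Python sketch that sends a HEAD request with basic auth and a custom CA file, similar in spirit to the health check that shows up in the collector log. This is not Jaeger's client code. The function names build_head_request and check_elasticsearch are hypothetical, the URL and CA path are copied from the manifest above, and it assumes the script runs in a pod that has the certificate secret mounted and the ES_USERNAME / ES_PASSWORD values exported as environment variables.

```python
import base64
import ssl
import urllib.request
from typing import Optional


def build_head_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build a HEAD request carrying HTTP basic auth credentials."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url, method="HEAD")
    req.add_header("Authorization", f"Basic {token}")
    return req


def check_elasticsearch(url: str, username: str, password: str,
                        ca_file: Optional[str] = None, timeout: float = 5.0) -> int:
    """Send the HEAD request and return the HTTP status code.

    ca_file plays the role of the tls.ca option: it adds the CA that signed
    the Elasticsearch HTTP certificate to the trusted set.
    """
    context = ssl.create_default_context(cafile=ca_file)
    req = build_head_request(url, username, password)
    with urllib.request.urlopen(req, timeout=timeout, context=context) as resp:
        return resp.status


if __name__ == "__main__":
    import os

    status = check_elasticsearch(
        "https://quickstart-es-http.observability.svc.cluster.local:9200",
        os.environ["ES_USERNAME"],  # assumed to be exported from the secret
        os.environ["ES_PASSWORD"],
        ca_file="/etc/ssl/certs/tls.crt",
    )
    print(status)
```

If this fails with a certificate error, the CA mount is the likely culprit; if it fails with a 401, the secret values are worth rechecking.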
