jaeger: Span format is not well suited for ES

Jaeger spans, when stored in Elasticsearch, have the following structure:

{
  "_index": "jaeger-span-2018-07-02",
  "_type": "span",
  "_id": "a3RsXWQBjGb8h888uVuL",
  "_version": 1,
  "_score": null,
  "_source": {
    "traceID": "f229796cda43df60",
    "spanID": "274c0d9d4fc76848",
    "parentSpanID": "f229796cda43df60",
    "flags": 1,
    "operationName": "/",
    "references": [
      {
        "refType": "CHILD_OF",
        "traceID": "f229796cda43df60",
        "spanID": "f229796cda43df60"
      }
    ],
    "startTime": 1530575763255726,
    "duration": 1798,
    "tags": [
      {
        "key": "component",
        "type": "string",
        "value": "nginx"
      },
      {
        "key": "nginx.worker_pid",
        "type": "string",
        "value": "10767"
      },
      {
        "key": "peer.address",
        "type": "string",
        "value": "10.244.4.100:37542"
      },
      {
        "key": "http.method",
        "type": "string",
        "value": "POST"
      },
      {
        "key": "http.url",
        "type": "string",
        "value": "YYY"
      },
      {
        "key": "http.host",
        "type": "string",
        "value": "XXX"
      },
      {
        "key": "http.status_code",
        "type": "int64",
        "value": "204"
      },
      {
        "key": "http.status_line",
        "type": "string",
        "value": "204 No Content"
      }
    ],
    "logs": [],
    "processID": "",
    "process": {
      "serviceName": "ingress-controller",
      "tags": [
        {
          "key": "jaeger.version",
          "type": "string",
          "value": "C++-0.2.0"
        },
        {
          "key": "hostname",
          "type": "string",
          "value": "vega"
        },
        {
          "key": "ip",
          "type": "string",
          "value": "127.0.0.1"
        }
      ]
    },
    "warnings": null,
    "startTimeMillis": 1530575763255
  },
  "fields": {
    "startTimeMillis": [
      "2018-07-02T23:56:03.255Z"
    ]
  },
  "sort": [
    1530575763255
  ]
}

Notice the arrays here. We were actually thinking about completely replacing debug logs with debug traces, but because the tags are stored as arrays of key/value objects, we can’t index these spans in ES by tag name and thus can’t reliably search them. Jaeger is nice, but ES has much richer search capabilities, and it would be great if we could treat spans as regular structured documents that we can put in ES and index properly.
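
To illustrate the concern, here is a minimal Python sketch of what happens when an array of objects is indexed as a plain (non-nested) object field: each leaf becomes one multi-value field, so the link between a tag's key and its value is lost. This only simulates that behaviour; the helper names (`flatten_object_array`, `naive_match`) are made up for illustration, and the mapping Jaeger actually installs may differ.

```python
from collections import defaultdict

def flatten_object_array(field, objects):
    """Simulate indexing a non-nested array of objects: one value list per leaf field."""
    flat = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)

# A few tags taken from the span above.
tags = [
    {"key": "component", "type": "string", "value": "nginx"},
    {"key": "http.method", "type": "string", "value": "POST"},
    {"key": "http.status_code", "type": "int64", "value": "204"},
]

indexed = flatten_object_array("tags", tags)

def naive_match(indexed, key, value):
    # Two independent clauses only test membership in each value list,
    # not that key and value came from the same tag object.
    return key in indexed["tags.key"] and value in indexed["tags.value"]

# "component" was never "POST", yet the flattened document matches.
false_positive = naive_match(indexed, "component", "POST")
```

Under this assumption, a search for component=POST would match the span even though no such tag exists, which is why searching by tag is unreliable without a nested or flat mapping.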

Are there any plans to support this use case?

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 39 (21 by maintainers)

Most upvoted comments

I also had this issue with Kibana. Additionally, switching from nested documents to a flat schema should help with query performance. The main constraint is the ES limit on the number of fields per index: index.mapping.total_fields.limit defaults to 1000 and can be increased, but I think I’ve read somewhere that 10k is too much.
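
As a rough way to check whether a flat schema would stay under that limit, one could count the distinct tag fields seen across a sample of spans. The sketch below assumes one field per distinct tag key and type; `distinct_tag_fields`, `fits_field_limit` and the `reserved` headroom for non-tag fields are hypothetical names and values, not anything Jaeger or ES provides.

```python
# Default of index.mapping.total_fields.limit, as stated in the comment above.
DEFAULT_TOTAL_FIELDS_LIMIT = 1000

def distinct_tag_fields(spans):
    """Collect one field name per distinct (tag key, tag type) pair."""
    names = set()
    for span in spans:
        for tag in span.get("tags", []):
            names.add(f"{tag['key']}_{tag['type']}")
    return names

def fits_field_limit(spans, limit=DEFAULT_TOTAL_FIELDS_LIMIT, reserved=50):
    """Return True if the tag fields plus some headroom fit under the limit.

    `reserved` is an assumed allowance for the non-tag span fields.
    """
    return len(distinct_tag_fields(spans)) + reserved <= limit

sample_spans = [
    {"tags": [{"key": "component", "type": "string", "value": "nginx"}]},
    {"tags": [{"key": "component", "type": "string", "value": "envoy"},
              {"key": "http.status_code", "type": "int64", "value": "204"}]},
]
```

Running this over a representative sample of real spans would give an early warning before the index mapping starts rejecting new fields.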

My idea would be to create structure like this:

"tags": {
   "component_string": "nginx",
   "nginx_worker_pid_long": 10767,
   "peer_address_string": "10.244.4.100:37542",
   "http_method_string": "POST",
   ...
}
...
"other_fields": [
   "randomtag=1234",
   "otherrandomtag=ajhsdj"
]

  1. There is no need to index the field type separately; it’s sufficient to put it into the field name.
  2. “other_fields” allows storing an arbitrary number of key-value pairs and allows querying similar to what is available in Cassandra:
    • equals: use an ES terms query on other_fields with the string otherrandomtag=ajhsdj
    • prefix search: use an ES prefix query on other_fields with the string otherrandomtag=aj
  3. This allows using the long ES type for tags with long values and the boolean ES type for tags with boolean values, leading to appropriate and smaller indices.
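
The two lookups described for other_fields could be expressed as search request bodies roughly like this. It is a sketch that assumes other_fields is mapped as a keyword field so the key=value strings are matched verbatim; the helper names `equals_query` and `prefix_query` are made up for illustration.

```python
def equals_query(key, value):
    """Exact match on a 'key=value' entry, using a terms query with one term."""
    return {"query": {"terms": {"other_fields": [f"{key}={value}"]}}}

def prefix_query(key, value_prefix):
    """Match entries whose value starts with the given prefix."""
    return {"query": {"prefix": {"other_fields": f"{key}={value_prefix}"}}}

exact = equals_query("otherrandomtag", "ajhsdj")
starts_with = prefix_query("otherrandomtag", "aj")
```

Either dictionary can be serialized to JSON and sent as the body of a search request against the span index.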

The keys in tags would be supported nicely in Kibana, while the remaining ones in other_fields would still be query-able if needed.
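
A minimal sketch of the conversion from the current tag array to the proposed layout might look like the following. The set of promoted keys, the type-to-suffix table (for example int64 to long) and the function names are assumptions for illustration; this is not how Jaeger implements anything.

```python
import re

# Assumed mapping from Jaeger tag types to the field-name suffixes in the proposal.
TYPE_SUFFIX = {"string": "string", "int64": "long", "bool": "boolean", "float64": "double"}

def convert_value(value, tag_type):
    """Turn the stored string into a typed value so ES can use long/boolean fields."""
    if tag_type == "int64":
        return int(value)
    if tag_type == "float64":
        return float(value)
    if tag_type == "bool":
        return str(value).lower() == "true"
    return value

def flatten_tags(tags, promoted_keys):
    """Split tags into typed flat fields (promoted keys) and 'key=value' strings (the rest)."""
    flat, other = {}, []
    for tag in tags:
        key, tag_type, value = tag["key"], tag["type"], tag["value"]
        if key in promoted_keys:
            # Replace dots and other separators so the key is a plain field name.
            name = re.sub(r"[^0-9A-Za-z]+", "_", key) + "_" + TYPE_SUFFIX.get(tag_type, "string")
            flat[name] = convert_value(value, tag_type)
        else:
            other.append(f"{key}={value}")
    return {"tags": flat, "other_fields": other}

span_tags = [
    {"key": "component", "type": "string", "value": "nginx"},
    {"key": "http.status_code", "type": "int64", "value": "204"},
    {"key": "randomtag", "type": "string", "value": "1234"},
]
result = flatten_tags(span_tags, promoted_keys={"component", "http.status_code"})
```

Keeping the promoted set small and explicit is what bounds the number of mapped fields; everything else stays searchable through other_fields.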

@Monnoroch In my implementation it is just a simple concat, using Painless: params._source['tags'].stream().map(item->item['key']).collect(Collectors.joining())

//edit: forgot to mention you

@kacper-jackiewicz Can you please share the field definitions? This was exactly my idea too, but I must admit that I have failed to write correct scripts quickly myself. And this can probably be universally useful for “Jaeger over ES” users.