jaeger: Span format is not well suited for ES
Jaeger spans, when put in Elasticsearch, have the following structure:
{
  "_index": "jaeger-span-2018-07-02",
  "_type": "span",
  "_id": "a3RsXWQBjGb8h888uVuL",
  "_version": 1,
  "_score": null,
  "_source": {
    "traceID": "f229796cda43df60",
    "spanID": "274c0d9d4fc76848",
    "parentSpanID": "f229796cda43df60",
    "flags": 1,
    "operationName": "/",
    "references": [
      {
        "refType": "CHILD_OF",
        "traceID": "f229796cda43df60",
        "spanID": "f229796cda43df60"
      }
    ],
    "startTime": 1530575763255726,
    "duration": 1798,
    "tags": [
      {
        "key": "component",
        "type": "string",
        "value": "nginx"
      },
      {
        "key": "nginx.worker_pid",
        "type": "string",
        "value": "10767"
      },
      {
        "key": "peer.address",
        "type": "string",
        "value": "10.244.4.100:37542"
      },
      {
        "key": "http.method",
        "type": "string",
        "value": "POST"
      },
      {
        "key": "http.url",
        "type": "string",
        "value": "YYY"
      },
      {
        "key": "http.host",
        "type": "string",
        "value": "XXX"
      },
      {
        "key": "http.status_code",
        "type": "int64",
        "value": "204"
      },
      {
        "key": "http.status_line",
        "type": "string",
        "value": "204 No Content"
      }
    ],
    "logs": [],
    "processID": "",
    "process": {
      "serviceName": "ingress-controller",
      "tags": [
        {
          "key": "jaeger.version",
          "type": "string",
          "value": "C++-0.2.0"
        },
        {
          "key": "hostname",
          "type": "string",
          "value": "vega"
        },
        {
          "key": "ip",
          "type": "string",
          "value": "127.0.0.1"
        }
      ]
    },
    "warnings": null,
    "startTimeMillis": 1530575763255
  },
  "fields": {
    "startTimeMillis": [
      "2018-07-02T23:56:03.255Z"
    ]
  },
  "sort": [
    1530575763255
  ]
}
Notice the arrays here. We were actually thinking about completely replacing debug logs with debug traces, but because the tags are stored as arrays of key/type/value objects, we can't index them in ES as ordinary fields and thus can't reliably search them. Jaeger is nice, but ES has much richer search capabilities, and it would be great if we could treat spans as regular structured documents that we can put in ES and index properly.
Are there any plans to support this use case?
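To make the request concrete, here is a minimal sketch of the kind of flattening being asked for, written in Python purely for illustration. The helper name flatten_tags and the dot_replacement parameter are assumptions, not part of Jaeger. The replacement character is there because Elasticsearch interprets dots in field names as object paths, which could make a key like http.method clash with another tag named http.

```python
from typing import Any, Dict, List


def flatten_tags(tags: List[Dict[str, str]], dot_replacement: str = "@") -> Dict[str, Any]:
    """Turn [{"key": k, "type": t, "value": v}, ...] into {k: typed_value}.

    Hypothetical helper: illustrates the flat shape requested in the issue,
    not what Jaeger actually writes to Elasticsearch.
    """
    # Convert the string-encoded value according to the declared tag type;
    # unknown types are left as strings.
    converters = {
        "int64": int,
        "float64": float,
        "bool": lambda v: v.lower() == "true",
    }
    flat: Dict[str, Any] = {}
    for tag in tags:
        # Avoid dots so Elasticsearch does not expand the key into nested objects.
        key = tag["key"].replace(".", dot_replacement)
        convert = converters.get(tag.get("type", "string"), str)
        flat[key] = convert(tag["value"])
    return flat


span_tags = [
    {"key": "component", "type": "string", "value": "nginx"},
    {"key": "http.method", "type": "string", "value": "POST"},
    {"key": "http.status_code", "type": "int64", "value": "204"},
]
flat = flatten_tags(span_tags)
```

With a document shaped like this, each tag becomes its own field that can be mapped and queried directly, at the cost of one mapping entry per distinct tag key.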
About this issue
- State: closed
- Created 6 years ago
- Comments: 39 (21 by maintainers)
I also had this issue with Kibana. Additionally, switching from nested documents to a flat schema should help with query performance. Anyway, the main constraint is the limit on the number of fields in ES: index.mapping.total_fields.limit defaults to 1000 and can be increased, but I think I've read somewhere that 10k is too much.
My idea would be to create a structure like this:
other_field and string otherrandomtag=ajhsdj
The keys in tags would be supported nicely in Kibana, while the remaining ones in other_fields would still be queryable if needed.
@Monnoroch In my implementation it is just a simple concat, using Painless:
params._source['tags'].stream().map(item -> item['key']).collect(Collectors.joining())
Edit: forgot to mention you.
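The idea from this comment (well-known keys kept as real fields under tags, everything else kept as key=value strings) can be sketched roughly as follows. This is an illustrative Python sketch, not the commenter's implementation: the names split_tags, joined_keys, and allowed_keys are assumptions, and the exact layout of the leftover field is a guess based on the otherrandomtag=ajhsdj example.

```python
from typing import Any, Dict, List, Set


def split_tags(tags: List[Dict[str, str]], allowed_keys: Set[str]) -> Dict[str, Any]:
    """Keep allowlisted tags as fields; store the rest as "key=value" strings.

    Hypothetical helper: bounding the allowlist keeps the number of mapped
    fields under index.mapping.total_fields.limit.
    """
    known: Dict[str, str] = {}
    other: List[str] = []
    for tag in tags:
        if tag["key"] in allowed_keys:
            known[tag["key"]] = tag["value"]
        else:
            other.append(f'{tag["key"]}={tag["value"]}')
    return {"tags": known, "other_fields": other}


def joined_keys(tags: List[Dict[str, str]]) -> str:
    """Python counterpart of the Painless expression: concatenate all tag keys."""
    return "".join(tag["key"] for tag in tags)


example_tags = [
    {"key": "component", "type": "string", "value": "nginx"},
    {"key": "otherrandomtag", "type": "string", "value": "ajhsdj"},
]
doc = split_tags(example_tags, allowed_keys={"component"})
```

Note that the Painless snippet joins keys without a separator, so adjacent keys run together; a delimiter would probably be needed for the result to be searchable per key.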
@kacper-jackiewicz Can you please share the field definitions? This was exactly my idea too, but I must admit that I have failed to write correct scripts quickly myself. And this can probably be universally useful for “Jaeger over ES” users.