kuma: MeshAccessLog - add support for timeout and retry for TCP backend

Description

Hi, when sending Kuma logs via the DataDog service in K8s, we sporadically get error messages from kuma-sidecar about failures to send logs. It would be nice to add support for configuring a timeout and the number of retries in case of a TCP connection failure.

About this issue

  • State: open
  • Created 8 months ago
  • Reactions: 3
  • Comments: 18 (6 by maintainers)

Most upvoted comments

Hey, I did some investigation on the “body” field, and our docs are lacking here, or rather they lack examples.

Our docs say:

Body is a raw string or an OTLP any value as described at https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/data-model.md#field-body It can contain placeholders available on https://www.envoyproxy.io/docs/envoy/latest/configuration/observability/accesslog/usage#command-operators

This body field comes from OTEL and it’s quite complex. If you want key-value pairs, you can do this:

apiVersion: kuma.io/v1alpha1
kind: MeshAccessLog
metadata:
  name: default
  namespace: kuma-system
spec:
  targetRef:
    kind: Mesh
  from:
    - targetRef:
        kind: Mesh
      default:
        backends:
          - type: OpenTelemetry
            openTelemetry:
              endpoint: otel-collector.observability.svc:4317
              body:
                kvlistValue:
                  values:
                  - key: "mesh"
                    value:
                      stringValue: "%KUMA_MESH%"

KUMA_MESH is then interpolated just fine.

We should also do a better job of failing when the format is not right instead of falling back to a default. I agree that it’s confusing.

When it comes to retries, I think grpc_stream_retry_policy could help. Let me try to compose a MeshProxyPatch to configure this. If this turns out to be useful, we could incorporate it into the MeshAccessLog policy.
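For reference, this is roughly where grpc_stream_retry_policy sits in Envoy's own config for the OpenTelemetry access logger. It is only a sketch of the Envoy-level fragment, not a MeshProxyPatch: the log name, cluster name, and retry values are placeholders, and the config Kuma actually renders may differ.

```yaml
# Envoy access logger fragment (sketch). Names and values are hypothetical.
access_log:
  - name: envoy.access_loggers.open_telemetry
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.access_loggers.open_telemetry.v3.OpenTelemetryAccessLogConfig
      common_config:
        log_name: example-access-log            # placeholder
        transport_api_version: V3
        grpc_service:
          envoy_grpc:
            cluster_name: example-otel-cluster  # placeholder
        # Applied when establishing the gRPC stream fails.
        grpc_stream_retry_policy:
          num_retries: 3                        # placeholder value
          retry_back_off:
            base_interval: 0.5s                 # placeholder value
            max_interval: 5s                    # placeholder value
```

A MeshProxyPatch would then need to add the grpc_stream_retry_policy block to the common_config of the generated access logger.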

OpenTelemetry can also be used for logs.

Here we support this in MeshAccessLog https://kuma.io/docs/2.5.x/policies/meshaccesslog/#opentelemetry

And it seems that the Datadog agent now has built-in support for it: https://www.datadoghq.com/blog/agent-otlp-log-ingestion/
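On the agent side, enabling OTLP log ingestion should look something like the following in datadog.yaml. This is a sketch based on my reading of the Datadog agent settings, so please check it against the agent docs for your version; the listen address is an assumption and needs to match the endpoint used in MeshAccessLog.

```yaml
# datadog.yaml (sketch, verify against the Datadog agent docs for your version)
logs_enabled: true              # general log collection in the agent
otlp_config:
  receiver:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317  # assumed listen address for OTLP over gRPC
  logs:
    enabled: true               # OTLP log ingestion
```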

@jakubdyszkiewicz JFYI - example

      default:
        backends:
          - type: OpenTelemetry
            openTelemetry:
              endpoint: otel-collector.observability.svc:4317
              body:
                kvlistValue:
                  values:
                  - key: "mesh"
                    value:
                      stringValue: "%KUMA_MESH%"

works as expected; DataDog is happy with it and indexes the mesh field. I think it would be good to update the documentation with it, as it is very confusing at the moment.
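The same pattern should extend to several fields, which avoids packing JSON into a single string body. The sketch below adds a few standard Envoy command operators next to %KUMA_MESH%; the key names are arbitrary, and I have not verified that every operator is interpolated inside kvlistValue the way %KUMA_MESH% is.

```yaml
# Sketch only: the body from the example above with more key/value pairs.
body:
  kvlistValue:
    values:
      - key: "mesh"
        value:
          stringValue: "%KUMA_MESH%"
      - key: "start_time"        # arbitrary key name
        value:
          stringValue: "%START_TIME%"
      - key: "response_code"     # arbitrary key name
        value:
          stringValue: "%RESPONSE_CODE%"
      - key: "duration_ms"       # arbitrary key name
        value:
          stringValue: "%DURATION%"
```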

@jakubdyszkiewicz I was able to get OTLP logs working; however, it also has its own problems:

  1. It is not documented that OpenTelemetry logs are sent over gRPC; I was only able to find this out from the source code and the rendered Envoy config. This should be mentioned in the docs, as OTLP supports both gRPC and HTTP.
  2. The openTelemetry.attributes fields do not accept %KUMA_*% placeholders. Moreover, using them renders a bad config for Envoy, so it stops accepting any changes.
  3. The openTelemetry.body field accepts %KUMA_*% placeholders, but it escapes the value automatically, so JSON gets escaped and is not parsed correctly by DD.
  4. I also tried to put an object into openTelemetry.body instead of a string (just using test: "value"), but it seems that such config is silently ignored by Kuma and it reverts to the default text line.

Could you please help with that?

@jakubdyszkiewicz I see that if OTLP logs are specified, Kuma just configures Envoy to send the logs in OpenTelemetry format. But I did not find any retry logic, settings, or documentation in Envoy about it. Do you know any details?