kuma: Slow DP startup time in OpenShift

What happened?

We are having trouble with the startup of our data plane proxies. There is always a 5-second gap between the bootstrap request and the control plane's response with the corresponding bootstrap configuration. We run on Red Hat OpenShift, so we use CNI mode rather than the init container. Here is a sample of our sidecar logs:

2022-04-28T07:27:19.104Z INFO dataplane trying to fetch bootstrap configuration from the Control Plane
2022-04-28T07:27:24.135Z INFO kuma-dp.run received bootstrap configuration {"adminPort": 9901}

We managed to collect some tracing information about the HTTP connection. A quick look at the logs makes it clear that DNS resolution accounts for almost all of that time.

2022-04-28T08:53:08.017Z INFO kuma-dp.run generating bootstrap configuration
2022-04-28T08:53:08.017Z INFO dataplane trying to fetch bootstrap configuration from the Control Plane
2022-04-28T08:53:08.018Z DEBUG "dns":{"start":"","end":"","host":"","address":null,"error":null},"dial":{"start":"","end":""},"connection":{"time":""},"wrote_all_request_header":{"time":""},"wrote_all_request":{"time":""},"first_received_response_byte":{"time":""}}
2022-04-28T08:53:08.018Z DEBUG dataplane Trace build error:
2022-04-28T08:53:08.018Z DEBUG dataplane === START post request
2022-04-28T08:53:08.018Z DEBUG :: dns start
2022-04-28T08:53:08.018Z INFO dataplane 2022-04-28 08:53:08.018480715 +0000 UTC
2022-04-28T08:53:13.022Z DEBUG :: dns end
2022-04-28T08:53:13.022Z INFO dataplane 2022-04-28 08:53:13.022189486 +0000 UTC
2022-04-28T08:53:13.022Z DEBUG :: dial start
2022-04-28T08:53:13.022Z INFO dataplane 2022-04-28 08:53:13.022275425 +0000 UTC
2022-04-28T08:53:13.025Z DEBUG :: dial end
2022-04-28T08:53:13.025Z INFO dataplane 2022-04-28 08:53:13.025074862 +0000 UTC
2022-04-28T08:53:13.032Z DEBUG :: conn time
2022-04-28T08:53:13.032Z INFO dataplane 2022-04-28 08:53:13.032413881 +0000 UTC
2022-04-28T08:53:13.032Z DEBUG :: wrote all request headers
2022-04-28T08:53:13.032Z INFO dataplane 2022-04-28 08:53:13.032563455 +0000 UTC
2022-04-28T08:53:13.032Z DEBUG :: wrote all request
2022-04-28T08:53:13.032Z INFO dataplane 2022-04-28 08:53:13.032646403 +0000 UTC
2022-04-28T08:53:13.037Z DEBUG :: first received response byte
2022-04-28T08:53:13.037Z INFO dataplane 2022-04-28 08:53:13.037235309 +0000 UTC
2022-04-28T08:53:13.037Z DEBUG dataplane === END post request 
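
To make the split between phases explicit, the timestamps from the trace above can be subtracted from each other. The following is only a small sketch for reading the numbers: the helper names `to_nanos` and `elapsed_ms` are made up for illustration and are not part of kuma-dp, and the timestamps are copied by hand from the log lines.

```python
from datetime import datetime, timezone

# Timestamps copied from the trace output above (UTC, nanosecond precision).
TRACE = [
    ("dns start", "2022-04-28 08:53:08.018480715"),
    ("dns end", "2022-04-28 08:53:13.022189486"),
    ("dial start", "2022-04-28 08:53:13.022275425"),
    ("dial end", "2022-04-28 08:53:13.025074862"),
    ("first received response byte", "2022-04-28 08:53:13.037235309"),
]


def to_nanos(stamp: str) -> int:
    """Convert 'YYYY-MM-DD HH:MM:SS.fffffffff' (UTC) to nanoseconds since the epoch."""
    whole, frac = stamp.split(".")
    parsed = datetime.strptime(whole, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    # datetime only keeps microseconds, so the fractional part is handled as an integer.
    return int(parsed.timestamp()) * 1_000_000_000 + int(frac.ljust(9, "0"))


def elapsed_ms(events, start: str, end: str) -> float:
    """Milliseconds between two named trace events."""
    nanos = {name: to_nanos(stamp) for name, stamp in events}
    return (nanos[end] - nanos[start]) / 1e6


if __name__ == "__main__":
    dns = elapsed_ms(TRACE, "dns start", "dns end")
    dial = elapsed_ms(TRACE, "dial start", "dial end")
    total = elapsed_ms(TRACE, "dns start", "first received response byte")
    print(f"dns:   {dns:.3f} ms")
    print(f"dial:  {dial:.3f} ms")
    print(f"total: {total:.3f} ms")
```

With these values, the DNS phase takes roughly 5004 ms of a total of roughly 5019 ms, while the dial phase takes under 3 ms.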

We found that this time can be shortened by adding a DNS config to the deployment. The delay is therefore probably caused by DNS timeouts, which default to 5 seconds. We added the following to the deployment:

      dnsConfig:
        options:
          - name: single-request-reopen
          - name: timeout
            value: '1'
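
For readers unfamiliar with `dnsConfig`: Kubernetes writes these entries onto the `options` line of the pod's `/etc/resolv.conf`, as either `name` or `name:value`. The sketch below only illustrates that mapping for the two options above. The function `render_options` is a hypothetical helper, not a Kubernetes or Kuma API, and the real file may contain additional options contributed by the pod's DNS policy.

```python
# Hypothetical helper: mirrors how pod dnsConfig options appear as
# "name" or "name:value" entries on the "options" line of resolv.conf.
def render_options(options):
    parts = []
    for opt in options:
        name = opt["name"]
        value = opt.get("value")
        parts.append(name if value is None else f"{name}:{value}")
    return "options " + " ".join(parts)


# The two options from the dnsConfig shown above.
DNS_OPTIONS = [
    {"name": "single-request-reopen"},
    {"name": "timeout", "value": "1"},
]

if __name__ == "__main__":
    print(render_options(DNS_OPTIONS))  # prints: options single-request-reopen timeout:1
```

With `timeout:1`, a lost DNS query is retried after 1 second instead of the resolver default of 5 seconds, which is consistent with the shorter startup we observed.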

We have already discussed this issue in Slack (with Charly Molter), but now we have decided to open an issue after all. Thanks to you all!

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 2
  • Comments: 33 (25 by maintainers)

Most upvoted comments

Yes exactly, I started the setup with the following parameters:

--set "cni.enabled=true" \
--set "experimental.cni=true"

But not with the eBPF support enabled.

Could you maybe delay the start of the dp by a few seconds with: https://kuma.io/docs/dev/explore/dpp-on-kubernetes/#custom-container-configuration ?