splunk-connect-for-kubernetes: Compatibility with EKS 1.21 and service account token expiry

What happened: After our EKS cluster was upgraded to 1.21, we started seeing annotations like the following in the API server audit logs in AWS, for the service accounts that the Splunk Connect pods are using:

subject: system:serviceaccount:<namespace here>:<sa name here>, seconds after warning threshold: 3989

This is due to the changes to service account token expiry in Kubernetes 1.21, as described here: https://docs.aws.amazon.com/eks/latest/userguide/service-accounts.html#identify-pods-using-stale-tokens

It would appear that there is a 90-day grace period, after which tokens will be rejected. It looks like the Splunk Connect agents need to use a later Kubernetes client SDK version, or is there a workaround?
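For reference, here is a rough way to check the lifetime of the token a pod is actually using, by decoding the iat/exp claims of the projected token. This is a minimal Python sketch using only the standard library and the default projected token path; it is not part of SCK:

    import base64
    import json
    from datetime import datetime, timezone

    # Default projected service account token path inside the pod.
    TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"

    def jwt_claims(token: str) -> dict:
        # A JWT is header.payload.signature; only the payload claims are needed here.
        payload = token.split(".")[1]
        payload += "=" * (-len(payload) % 4)  # restore base64 padding
        return json.loads(base64.urlsafe_b64decode(payload))

    with open(TOKEN_PATH) as f:
        claims = jwt_claims(f.read().strip())

    issued = datetime.fromtimestamp(claims["iat"], tz=timezone.utc)
    expires = datetime.fromtimestamp(claims["exp"], tz=timezone.utc)
    print("subject: ", claims["sub"])
    print("issued:  ", issued.isoformat())
    print("expires: ", expires.isoformat())
    print("lifetime:", expires - issued)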

What you expected to happen: A more recent Kubernetes client SDK to be used in Splunk Connect 1.4.11, so that the tokens would not get flagged. Once AWS switches to the default one-hour token lifetime in some future Kubernetes version, the pods will start getting errors from the API server after an hour (unless they are restarted earlier, which I think would also refresh the token).
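Until the plugins reload tokens properly, one stopgap that follows from the above is to restart the pods periodically so they pick up a freshly issued token. A rough sketch with the Python kubernetes client, equivalent to kubectl rollout restart; the namespace and DaemonSet name below are placeholders for whatever your SCK release uses:

    from datetime import datetime, timezone
    from kubernetes import client, config

    NAMESPACE = "splunk-connect"             # placeholder
    DAEMONSET = "splunk-kubernetes-logging"  # placeholder

    # Use load_kube_config() instead when running outside the cluster.
    config.load_incluster_config()
    apps = client.AppsV1Api()

    # Patching this annotation triggers a rolling restart, the same thing
    # `kubectl rollout restart daemonset/<name>` does under the hood.
    patch = {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        "kubectl.kubernetes.io/restartedAt":
                            datetime.now(timezone.utc).isoformat()
                    }
                }
            }
        }
    }
    apps.patch_namespaced_daemon_set(DAEMONSET, NAMESPACE, patch)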

How to reproduce it (as minimally and precisely as possible): Install or upgrade EKS to 1.21 and check the EKS cluster API server audit logs with this CloudWatch Logs Insights query:

fields @timestamp
| filter @logStream like /kube-apiserver-audit/
| filter @message like /seconds after warning threshold/
| parse @message "subject: *, seconds after warning threshold:*\"" as subject, elapsedtime

based on: https://docs.aws.amazon.com/eks/latest/userguide/service-accounts.html#identify-pods-using-stale-tokens
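The same query can also be run programmatically. A rough boto3 sketch, assuming the usual EKS control-plane log group; the /aws/eks/... value below is a placeholder, adjust it to your cluster:

    import time
    import boto3

    LOG_GROUP = "/aws/eks/my-cluster/cluster"  # placeholder, adjust to your cluster

    QUERY = """
    fields @timestamp
    | filter @logStream like /kube-apiserver-audit/
    | filter @message like /seconds after warning threshold/
    | parse @message "subject: *, seconds after warning threshold:*\\"" as subject, elapsedtime
    """

    logs = boto3.client("logs")
    now = int(time.time())
    query_id = logs.start_query(
        logGroupName=LOG_GROUP,
        startTime=now - 24 * 3600,  # last 24 hours
        endTime=now,
        queryString=QUERY,
    )["queryId"]

    # Logs Insights queries are asynchronous; poll until the query completes.
    results = logs.get_query_results(queryId=query_id)
    while results["status"] in ("Running", "Scheduled"):
        time.sleep(2)
        results = logs.get_query_results(queryId=query_id)

    for row in results["results"]:
        print({field["field"]: field["value"] for field in row})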

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.21
  • Ruby version (use ruby --version):
  • OS (e.g: cat /etc/os-release):
  • Splunk version:
  • Splunk Connect for Kubernetes helm chart version: 1.4.11
  • Others:

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 8
  • Comments: 20 (10 by maintainers)

Most upvoted comments

The token is already rotated by Kubernetes periodically; only the user of the token needs to reload it from disk. IMHO “vanilla” fluentd doesn’t use the Kubernetes API, it’s the plugins that use it, like kubernetes_metadata_filter.
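For illustration only, since the actual plugins are Ruby/fluentd, this is just the pattern rather than the fix itself: re-read the projected token file before each API call instead of caching it once at startup, e.g. in Python with requests:

    import requests

    TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"
    CA_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
    API_SERVER = "https://kubernetes.default.svc"

    def get_pod(namespace: str, name: str) -> dict:
        # Re-reading the file on every call (instead of once at startup) is the
        # point: kubelet keeps the projected token file current, the client
        # just has to pick it up again.
        with open(TOKEN_PATH) as f:
            token = f.read().strip()
        resp = requests.get(
            f"{API_SERVER}/api/v1/namespaces/{namespace}/pods/{name}",
            headers={"Authorization": f"Bearer {token}"},
            verify=CA_PATH,
        )
        resp.raise_for_status()
        return resp.json()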

There is already an issue open about that: https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/issues/323

Maybe SCK uses more of the Kubernetes API in the objects/metrics components.

@harshit-splunk I can probably test the hvaghani/fluentd-hec image tomorrow. We don’t use the other components.

This is a major issue for us and will force us to look at alternative technologies. We’ve raised it with our Splunk account manager, so hopefully it can be fixed quickly.