splunk-connect-for-kubernetes: Compatibility with EKS 1.21 and service account token expiry
What happened: After our EKS cluster was upgraded to 1.21, we saw annotations like the following appear in the API server audit logs in AWS for the service accounts that the Splunk Connect pods use:
subject: system:serviceaccount:<namespace here>:<sa name here>, seconds after warning threshold: 3989
This is due to changes in token expiry in K8s 1.21 as described here: https://docs.aws.amazon.com/eks/latest/userguide/service-accounts.html#identify-pods-using-stale-tokens
It would appear that there is a 90-day grace period, after which tokens will be rejected. It looks like the Splunk Connect agents need to use a later client SDK version, or is there a workaround?
What you expected to happen: That a more recent k8s client SDK was used in Splunk Connect 1.4.11 so that the tokens wouldn't get flagged. Once AWS switches to the default 1h token lifetime in some future Kubernetes version, the pods will start getting errors from the API server after an hour (unless they are restarted earlier, which I think would also refresh the token).
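For context, the behaviour a newer client SDK provides is simply re-reading the projected service account token from disk instead of caching the value read at pod start-up. Below is a minimal Python sketch of that pattern, assuming the standard in-cluster token and CA paths; the requests-based call is purely illustrative and is not SCK's actual code (SCK is fluentd/Ruby based).

from pathlib import Path

import requests  # illustration only, not part of SCK

TOKEN_PATH = Path("/var/run/secrets/kubernetes.io/serviceaccount/token")
CA_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
API_SERVER = "https://kubernetes.default.svc"

def list_pods(namespace: str) -> dict:
    # Re-reading the token on every call picks up the file that kubelet
    # rotates periodically, so the credential never goes stale even if the
    # pod outlives the token's lifetime.
    token = TOKEN_PATH.read_text().strip()
    resp = requests.get(
        f"{API_SERVER}/api/v1/namespaces/{namespace}/pods",
        headers={"Authorization": f"Bearer {token}"},
        verify=CA_PATH,
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()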
How to reproduce it (as minimally and precisely as possible): Install or upgrade EKS to 1.21 and check the EKS cluster API server audit logs with this CloudWatch Logs Insights query (a boto3 sketch for running it programmatically follows the query):
fields @timestamp
| filter @logStream like /kube-apiserver-audit/
| filter @message like /seconds after warning threshold/
| parse @message "subject: *, seconds after warning threshold:*\"" as subject, elapsedtime
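If you prefer to run the query from a script, here is a hedged boto3 sketch. It assumes audit logging is enabled and that the control-plane logs land in the default EKS log group /aws/eks/<cluster-name>/cluster; adjust the log group name if yours differs.

import time

import boto3

QUERY = r"""
fields @timestamp
| filter @logStream like /kube-apiserver-audit/
| filter @message like /seconds after warning threshold/
| parse @message "subject: *, seconds after warning threshold:*\"" as subject, elapsedtime
"""

def find_stale_token_subjects(cluster_name: str, hours: int = 24) -> list:
    """Return Logs Insights rows for stale-token warnings in the last N hours."""
    logs = boto3.client("logs")
    end = int(time.time())
    start = end - hours * 3600
    query_id = logs.start_query(
        logGroupName=f"/aws/eks/{cluster_name}/cluster",
        startTime=start,
        endTime=end,
        queryString=QUERY,
    )["queryId"]
    # Poll until the query finishes, then return whatever rows matched.
    while True:
        result = logs.get_query_results(queryId=query_id)
        if result["status"] in ("Complete", "Failed", "Cancelled"):
            return result.get("results", [])
        time.sleep(2)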
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version): 1.21
- Ruby version (use ruby --version):
- OS (e.g.: cat /etc/os-release):
- Splunk version:
- Splunk Connect for Kubernetes helm chart version: 1.4.11
- Others:
About this issue
- State: closed
- Created 2 years ago
- Reactions: 8
- Comments: 20 (10 by maintainers)
The token is already rotated by k8s periodically; only the user of the token needs to reload it from disk. IMHO "vanilla" fluentd doesn't use the k8s API, it's the plugins that use it, like kubernetes_metadata_filter.
There is already an issue open about that: https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/issues/323
Maybe SCK uses more of the k8s API for the objects/metrics components.
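The fix the linked plugin issue asks for amounts to a reload-on-change pattern: keep using the cached token, but re-read it from disk whenever kubelet rotates the file. A small Python sketch of that idea follows; the class name and paths are illustrative, not the plugin's real API.

import os

TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"

class RefreshingToken:
    """Caches the token but re-reads it when the file's mtime changes."""

    def __init__(self, path: str = TOKEN_PATH):
        self._path = path
        self._mtime = 0.0
        self._token = ""

    def get(self) -> str:
        # kubelet swaps the projected token file when it refreshes the token,
        # so a changed mtime means a new token is available on disk.
        mtime = os.stat(self._path).st_mtime
        if mtime != self._mtime:
            with open(self._path) as f:
                self._token = f.read().strip()
            self._mtime = mtime
        return self._token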
@harshit-splunk I can probably test the hvaghani/fluentd-hec image tomorrow. We don't use the other components.
This is a major issue for us and will result in us having to look at alternative technologies. We've raised it with our Splunk account manager, so hopefully it can be fixed quickly.