azure-workload-identity: Error reading service account token - open : no such file or directory
Describe the bug
While running KEDA 2.8.1 on AKS 1.24.6 in a few regions, we see that after some time (anywhere from hours to days) KEDA loses the Managed Identity and logs many authentication errors.
Restarting the KEDA pod fixes the issue. We wonder whether this is a bug in Workload Identity, a side effect of an Azure VM restart, or a KEDA issue. We would also like to know whether the KEDA pod health check can take these errors into account, so that the pod restarts itself when they occur.
Steps To Reproduce
- Run a workload that is alive 24/7 and uses the token (e.g. reads a message from Service Bus) every 60 seconds.
- We raised this issue with the KEDA team. They say that KEDA only reads the token from the file system, and that if the token is not there we should consult the Workload Identity team.
Expected behavior
Logs
╰─ k logs -f -l app=keda-operator -n keda --since 5m
    /workspace/pkg/scaling/cache/scalers_cache.go:94
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
    /workspace/pkg/scaling/scale_handler.go:278
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
    /workspace/pkg/scaling/scale_handler.go:149
2022-12-09T19:07:54Z ERROR scalehandler Error getting scale decision {"scaledobject.Name": "tm", "scaledObject.Namespace": "vi-be-map", "scaleTarget.Name": "vi-tm-api", "error": "error reading service account token - open : no such file or directory"}
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
    /workspace/pkg/scaling/scale_handler.go:278
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
    /workspace/pkg/scaling/scale_handler.go:149
Environment
- Kubernetes version (use kubectl version): 1.24.6
- Cloud provider or hardware configuration: Azure / AKS
- OS (e.g: cat /etc/os-release): Mariner V2
- Kernel (e.g. uname -a):
- Install tools: KEDA / Workload-Identity / OIDC-Issuer
- Network plugin and version (if this is a network-related bug):
- Others: https://github.com/kedacore/keda/issues/3977
Additional context
- We have set the token expiration to 3600, but the problem appears only after around 10-24 hours.
- Currently we just restart our workload every hour to prevent this token loss, but it would be great if the issue were mitigated.
About this issue
- State: closed
- Created 2 years ago
- Comments: 27 (24 by maintainers)
Closing this issue: @tshaiman mentioned on an internal thread that this is resolved with the AKS release. Please feel free to reopen if you have any questions.