keda: KEDA identity is not authenticating to Service Bus after a few hours
Report
While running KEDA 2.8.1 on AKS 1.24.6 in a few regions, we see that after some time (could be days or hours) KEDA loses the Managed Identity and logs many authentication errors.
Restarting the KEDA pod fixes the issue. We wonder whether this is a bug in Workload Identity, a side effect of an Azure VM restart, or a KEDA issue. We would also like to know whether the KEDA pod health check can be integrated with those logs so that the pod restarts itself when those errors occur.
Expected Behavior
- Let KEDA restart itself when those errors start to fire
Actual Behavior
- KEDA keeps emitting those logs without any option to self-heal
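One way to approach the self-healing asked for above is a liveness endpoint that fails when the federated token cannot be read, so that the kubelet restarts the container. The following is a minimal Go sketch, not KEDA's actual health check. The names `checkToken` and `healthz` are hypothetical. `AZURE_FEDERATED_TOKEN_FILE` is the variable injected by the Azure Workload Identity webhook; that the failing code path reads this same variable is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"os"
)

// tokenFileEnv is the variable injected by the Azure Workload Identity
// webhook. That the failing code path reads it is an assumption of this sketch.
const tokenFileEnv = "AZURE_FEDERATED_TOKEN_FILE"

// checkToken (hypothetical helper) returns an error when the projected
// service account token cannot be read from the given path.
func checkToken(path string) error {
	if path == "" {
		return errors.New("token file path is empty")
	}
	data, err := os.ReadFile(path)
	if err != nil {
		return fmt.Errorf("reading token file: %w", err)
	}
	if len(data) == 0 {
		return errors.New("token file is empty")
	}
	return nil
}

// healthz (hypothetical endpoint) answers 503 when the token is unreadable.
// A liveness probe pointed at it would restart the container after the
// probe's failure threshold is reached.
func healthz(w http.ResponseWriter, _ *http.Request) {
	if err := checkToken(os.Getenv(tokenFileEnv)); err != nil {
		http.Error(w, err.Error(), http.StatusServiceUnavailable)
		return
	}
	fmt.Fprintln(w, "ok")
}

func main() {
	// Exercise the handler once without opening a port. In a real process it
	// would be registered with http.HandleFunc and served by ListenAndServe.
	rec := httptest.NewRecorder()
	healthz(rec, httptest.NewRequest(http.MethodGet, "/healthz", nil))
	fmt.Println(rec.Code, rec.Body.String())
}
```

A check like this only catches the case where the path or file is unreadable from inside the process. It would not detect a token that is present but rejected by Azure AD.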
Steps to Reproduce the Problem
- AKS with 1.24.6
- 17 Service Bus queues
- Enable Workload Identity Federation on the cluster and on the TriggerAuthentication
- KEDA has a Service Account, and the federated credentials are working
- KEDA runs OK for a few hours and then all of a sudden loses the token
Logs from KEDA operator
2022-12-07T14:36:38Z ERROR scalehandler Error getting scale decision {"scaledobject.Name": "cb", "scaledObject.Namespace": "vi-be-map", "scaleTarget.Name": "vi-cb-api", "error": "error reading service account token - open : no such file or directory"}
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
/workspace/pkg/scaling/scale_handler.go:278
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
/workspace/pkg/scaling/scale_handler.go:149
KEDA Version
2.8.1
Kubernetes Version
1.24
Platform
Microsoft Azure
Scaler Details
Azure Service Bus
Anything else?
- The logs say it cannot find the service account token, but the service account token is there.
- Workload Identity Federation is active
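One detail in the log line may explain why the token appears to be present: the message reads `open : no such file or directory`, with nothing between `open` and the colon. Go formats a failed open as `open <path>: <reason>`, so an empty slot suggests the path itself was empty rather than the file being missing. The sketch below reproduces both cases on Linux. `readToken` is a hypothetical stand-in, not KEDA's code, and the idea that the path comes from an unset environment variable such as `AZURE_FEDERATED_TOKEN_FILE` is an assumption.

```go
package main

import (
	"fmt"
	"os"
)

// readToken (hypothetical) mimics a token read whose path is supplied by the
// caller. The wrapping message is copied from the log line in this report.
func readToken(path string) error {
	if _, err := os.ReadFile(path); err != nil {
		return fmt.Errorf("error reading service account token - %w", err)
	}
	return nil
}

func main() {
	// Empty path, for example the value of an unset environment variable.
	fmt.Println(readToken(""))
	// Non-empty path to a file that does not exist: the path shows in the error.
	fmt.Println(readToken("/nonexistent/token"))
}
```

On Linux the first call prints the same text as the `error` field in the operator log, while the second includes the path. If that reading is right, checking the environment of the running KEDA process would be more informative than checking whether the token file exists on disk.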
About this issue
- State: closed
- Created 2 years ago
- Comments: 77 (41 by maintainers)
To be clear, this is for the pod identity problem reported by @jmos5156 & @Dragonsong3k, which is tracked in https://github.com/kedacore/keda/issues/4026, not the issue @tshaiman is having.
Update: I just came back from a call with one of the Workload Identity engineers. He reviewed our configuration and also thinks the problem is now Mariner-based. I will keep updating you on this thread.
BTW, awesome research @tshaiman and @yehiyam . Thanks a lot for your effort 🙇
@tomkerkhove thank you for the clarification.
@raorugan the Workload Identity team and I could not recreate it, so I'm closing it for now, but I will keep monitoring it once KEDA 2.9.x is out (I found another blocking bug in 2.9.1).
Absolutely, they have a bug, as does Workload Identity.
I'm trying everything now 😔
@JorTurFer: I want to become a contributor, but my Go has gotten a bit rough over the last 2 years (I moved to other languages). Do you think the ramp-up is reasonable?