keda: KEDA identity stops authenticating to Service Bus after a few hours

Report

While running KEDA 2.8.1 on AKS 1.24.6 in a few regions, we see that after some time (could be days or hours) KEDA loses the Managed Identity and logs many authentication errors.

Restarting the KEDA pod fixes the issue. We wonder whether this is a bug in Workload Identity, a side effect of an Azure VM restart, or a KEDA issue. We would also like to know whether the KEDA pod health check can be tied to those logs so that the pod restarts itself when those errors occur.

Expected Behavior

  • Let KEDA restart itself when those errors start to fire

Actual Behavior

  • KEDA keeps emitting those logs without any option to self-heal

Steps to Reproduce the Problem

  1. AKS with 1.24.6
  2. 17 Service Bus queues
  3. Enable Workload Identity Federation on the cluster and on the TriggerAuthentication
  4. KEDA has a Service Account, and the federated credentials are working
  5. KEDA runs OK for a few hours and then all of a sudden loses the token

Logs from KEDA operator

2022-12-07T14:36:38Z    ERROR   scalehandler    Error getting scale decision    {"scaledobject.Name": "cb", "scaledObject.Namespace": "vi-be-map", "scaleTarget.Name": "vi-cb-api", "error": "error reading service account token - open : no such file or directory"}
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
        /workspace/pkg/scaling/scale_handler.go:278
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
        /workspace/pkg/scaling/scale_handler.go:149

KEDA Version

2.8.1

Kubernetes Version

1.24

Platform

Microsoft Azure

Scaler Details

Azure Service Bus

Anything else?

  • The logs say it cannot find the service account token, but the service account token is there.
  • Workload Identity Federation is active
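One detail in the log line may be worth noting: the message reads `open : no such file or directory`, with nothing between `open` and the colon. In Go, that is the shape of the error produced when the path passed to the file-open call is an empty string, which would be consistent with the token file existing on disk while the process no longer knows its path. The sketch below reproduces the message; `readToken` is a hypothetical helper, the error prefix is copied from the log above, and the idea that the path comes from an unset environment variable is an assumption, not something the logs confirm.

```go
package main

import (
	"fmt"
	"os"
)

// readToken is a hypothetical stand-in for code that loads the projected
// service account token from a path supplied by the caller.
func readToken(path string) error {
	if _, err := os.ReadFile(path); err != nil {
		// Prefix copied from the log line in this report.
		return fmt.Errorf("error reading service account token - %w", err)
	}
	return nil
}

func main() {
	// An empty path stands in for an unset or empty environment variable.
	fmt.Println(readToken(""))
}
```

On Linux this prints an error of the same form as the one in the operator log, so checking the environment of the running KEDA process for the token path could be a useful next diagnostic step.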

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 77 (41 by maintainers)

Most upvoted comments

Are there any workarounds by any chance?

No, there isn’t any workaround because it’s related to a bug. This PR fixes it

To be clear, this is for the pod identity problem reported by @jmos5156 & @Dragonsong3k which is tracked in https://github.com/kedacore/keda/issues/4026. Not the issue @tshaiman is having

Update: I just came back from a call with one of the Workload Identity engineers. He reviewed our configuration and also thinks the problem is now Mariner-based. I will keep updating you on this thread.

BTW, awesome research @tshaiman and @yehiyam . Thanks a lot for your effort 🙇

@tomkerkhove thank you for the clarification.

@raorugan the Workload Identity engineers and I could not reproduce it, so I’m closing it for now, but I will keep monitoring it once KEDA 2.9.x is out (I found another blocking bug in 2.9.1).

Absolutely, they have a bug, as does Workload Identity.

I’m trying everything now 😔

@JorTurFer: I want to become a contributor, but my Golang has been a bit rough for the last 2 years (I moved to other languages). Do you think the ramp-up is reasonable?