keda: Constant crashes in KEDA operator after deploying service controlled by ScaledObject

Report

The KEDA controller crashes constantly after I deploy a new version of the service targeted by the scaled object.

It works for a while, but after deploying the service, no metrics can be queried. The KEDA controller logs show a variety of errors, all of them related to the GetMetrics function.

Expected Behavior

No crashes

Actual Behavior

Constant crashes in the keda controller

Steps to Reproduce the Problem

  • Deploy the scaled object. At this point I can query the metrics.
  • Redeploy the service that is controlled by the scaled object.
  • KEDA starts crashing. There are multiple bugs according to the logs, but all of them revolve around the GetMetrics function. Examples are “assignment to nil map”, “out of index”, “concurrent write to map”. All of them are bugs in KEDA code so I don’t think it’s an issue with the scaled object config.

Logs from KEDA operator

2023-03-17T21:28:05Z    ERROR   scalehandler    Failed to patch ScaledObjects Status    {"error": "resourc>
github.com/kedacore/keda/v2/pkg/fallback.updateStatus
        /workspace/pkg/fallback/fallback.go:126
github.com/kedacore/keda/v2/pkg/fallback.GetMetricsWithFallback
        /workspace/pkg/fallback/fallback.go:58
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).GetScaledObjectMetrics
        /workspace/pkg/scaling/scale_handler.go:446
github.com/kedacore/keda/v2/pkg/metricsservice.(*GrpcServer).GetMetrics
        /workspace/pkg/metricsservice/server.go:45
github.com/kedacore/keda/v2/pkg/metricsservice/api._MetricsService_GetMetrics_Handler
        /workspace/pkg/metricsservice/api/metrics_grpc.pb.go:79
google.golang.org/grpc.(*Server).processUnaryRPC
        /workspace/vendor/google.golang.org/grpc/server.go:1340
google.golang.org/grpc.(*Server).handleStream
        /workspace/vendor/google.golang.org/grpc/server.go:1713
google.golang.org/grpc.(*Server).serveStreams.func1.2
        /workspace/vendor/google.golang.org/grpc/server.go:965
panic: runtime error: index out of range [29] with length 29

KEDA Version

2.9.2

Kubernetes Version

1.23

Platform

Amazon Web Services

Scaler Details

Datadog

Anything else?

The only thing weird about this scaler is that it has around 40 triggers. We are using this service to have a single interface to query the metrics provided by KEDA. I set the min/max replicas to 2. I even disabled autoscaling with 2 replicas, but that didn’t help. But I don’t think the scaledobject config is the issue because we can query the metrics for a little while.

Destroying keda and redeploying seemed to work for a while but it always breaks down around the time the service is deployed.

About this issue

  • State: closed
  • Created a year ago
  • Comments: 34 (13 by maintainers)

Most upvoted comments

@timown with the fix?

No, sorry, that was with the latest official release, so I didn’t really have a chance to test the fix without a way to reproduce the issue.

@saurabhvagrawal @timown @djsly @reynoldsme @martinmr et al.: could you please confirm that the failing ScaledObject uses an external trigger?

@zroubalik None of the ScaledObjects where we see this issue were using triggers of type external as in https://keda.sh/docs/2.10/concepts/external-scalers/

We were only seeing this on ScaledObjects with triggers of type datadog. @martinmr It was just that, correct?

@saurabhvagrawal @timown @djsly @reynoldsme @martinmr et al.: could you please confirm that the failing ScaledObject uses an external trigger?

It does. In our case it uses cpu, memory and external triggers.

@saurabhvagrawal @timown @djsly @reynoldsme @martinmr et al.: could you please confirm that the failing ScaledObject uses an external trigger?

This just happened to us too. Deleting the scaled object and recreating it solved the issue.

I am working on the same project as @timown. This happened in production today. We tried with 2.10.1 and 2.9.x and still got the same issue. Here are the logs from the operator for your reference:

"namespace": "<redacted>", "name": "<redacted>", "reconcileID": "973016f5-c838-44e9-82c6-f3d49afa52a7", "trigger.type": "external"}
2023-06-08T05:36:29Z INFO Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"<redacted>","namespace":"<redacted>"}, "namespace": "<redacted>", "name": "<redacted>", "reconcileID": "973016f5-c838-44e9-82c6-f3d49afa52a7"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2f62594]

goroutine 398 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119 +0x1fa
panic({0x34c03a0, 0x63ec950})
        /usr/local/go/src/runtime/panic.go:884 +0x212
github.com/kedacore/keda/v2/pkg/scaling/resolver.ResolveScaleTargetPodSpec({0x434cf50, 0xc003093410}, {0x4361a70, 0xc000f93260}, {0x3a95560?, 0xc000a84200})
        /workspace/pkg/scaling/resolver/scale_resolvers.go:73 +0xd4
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).performGetScalersCache(0xc0001ae5b0, {0x434cf50, 0xc003093410}, {0xc004bc5ce0, 0x26}, {0x3a95560, 0xc000a84200}, 0xc0021aef00, {0x0, 0x0}, …)
        /workspace/pkg/scaling/scale_handler.go:347 +0x6e5
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).GetScalersCache(0xc00488fa00?, {0x434cf50, 0xc003093410}, {0x3a95560, 0xc000a84200})
        /workspace/pkg/scaling/scale_handler.go:273 +0xf6
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).getScaledObjectMetricSpecs(0xc000f708a0, {0x434cf50, 0xc003093410}, {{0x43550e0?, 0xc003093440?}, 0xc00236a4f0?}, 0xc00488fa00)
        /workspace/controllers/keda/hpa.go:200 +0xda
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).newHPAForScaledObject(0xc000f708a0, {0x434cf50?, 0xc003093410?}, {{0x43550e0?, 0xc003093440?}, 0x3a231c0?}, 0xc00488fa00, 0xc0021af5f0)
        /workspace/controllers/keda/hpa.go:74 +0x66
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).updateHPAIfNeeded(0xc000f708a0, {0x434cf50, 0xc003093410}, {{0x43550e0?, 0xc003093440?}, 0xc003093410?}, 0xc00488fa00, 0xc0005fe700, 0xc004468828?)
        /workspace/controllers/keda/hpa.go:152 +0x7b
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).ensureHPAForScaledObjectExists(0xc000f708a0, {0x434cf50, 0xc003093410}, {{0x43550e0?, 0xc003093440?}, 0x43550e0?}, 0xc00488fa00, 0x0?)
        /workspace/controllers/keda/scaledobject_controller.go:431 +0x238
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).reconcileScaledObject(0xc000f708a0?, {0x434cf50, 0xc003093410}, {{0x43550e0?, 0xc003093440?}, 0xc0023f35f0?}, 0xc00488fa00)
        /workspace/controllers/keda/scaledobject_controller.go:229 +0x1d8
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).Reconcile(0xc000f708a0, {0x434cf50, 0xc003093410}, {{{0xc0023f3550?, 0x10?}, {0xc0023f35f0?, 0x40da87?}}})
        /workspace/controllers/keda/scaledobject_controller.go:175 +0x526
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x434cf50?, {0x434cf50?, 0xc003093410?}, {{{0xc0023f3550?, 0x32af080?}, {0xc0023f35f0?, 0x0?}}})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:122 +0xc8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0010921e0, {0x434cea8, 0xc001296300}, {0x361a020?, 0xc0008bec00?})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:323 +0x38f
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0010921e0, {0x434cea8, 0xc001296300})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274 +0x1d9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:231 +0x333

@zroubalik I don’t have anything more to share except that we had the same issue again yesterday. This time it was a different ScaledObject. After we deleted it, everything went back to normal.

@reynoldsme I opened an issue with KEDA