karmada: karmada-controller-manager: panic: runtime error: index out of range

What happened:

karmada-controller-manager crashed with panic: runtime error: index out of range

What you expected to happen:

The index should be bounds-checked before use, so that the controller neither panics nor crashes.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

The index may be -1. https://github.com/karmada-io/karmada/blob/2526de856ebe8b4e57cdf5ad9e544604aa9f0e43/pkg/modeling/modeling.go#L162-L168

https://github.com/karmada-io/karmada/blob/2526de856ebe8b4e57cdf5ad9e544604aa9f0e43/pkg/modeling/modeling.go#L105-L114
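As a rough illustration of the kind of check being asked for, here is a minimal Go sketch. It is not the Karmada code: searchLastLessElement, gradeFor, and the boundaries slice are hypothetical stand-ins, assuming the lookup is a binary search that returns -1 when no entry matches.

```go
package main

import (
	"fmt"
	"sort"
)

// searchLastLessElement is a hypothetical stand-in for the lookup in question.
// It returns the index of the last boundary that is <= target, or -1 when
// every boundary is greater than target (or the slice is empty).
func searchLastLessElement(boundaries []int64, target int64) int {
	// sort.Search returns the first index whose boundary is > target.
	i := sort.Search(len(boundaries), func(i int) bool { return boundaries[i] > target })
	return i - 1
}

// gradeFor returns the matching index, or an error instead of letting the
// caller index a slice with -1.
func gradeFor(boundaries []int64, target int64) (int, error) {
	idx := searchLastLessElement(boundaries, target)
	if idx < 0 || idx >= len(boundaries) {
		return 0, fmt.Errorf("no matching resource model for value %d", target)
	}
	return idx, nil
}

func main() {
	boundaries := []int64{1, 2, 4, 8}
	for _, v := range []int64{0, 3, 100} {
		if g, err := gradeFor(boundaries, v); err != nil {
			fmt.Println("skipped:", err)
		} else {
			fmt.Println("grade", g)
		}
	}
}
```

With a check like this, a value that matches no model is reported as an error and can be skipped, rather than being used as a slice index.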

Environment:

  • Karmada version: v1.6.1
  • kubectl-karmada or karmadactl version (the result of kubectl-karmada version or karmadactl version):
  • Others:

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 36 (33 by maintainers)

Most upvoted comments

I can fix this issue. The index is -1, which means that no matching resource model was found. Let me write some test code to verify it.

@jwcesign Thanks for the advice, let me test it this weekend. I will find out the final solution.

@chaosi-zju OK, makes sense. Let me revise this part of the code. Thanks.

@jwcesign Hi, I’m back, and sorry for the late reply. I had Covid for the last two weeks and felt awful, but I have recovered. I have 2 questions about your comment:

  1. How did you come up with the number 5? Through a lot of concurrent testing?
  2. modelSortings cannot be a local variable, because its state needs to be kept up to date in real time: the binary search below has to search the latest state. I prefer the approach above of adding a lock(). After I added the lock locally, the panic no longer occurred.

Regarding the concurrency investigation of AddToResourceSummary: what you mean is that getAllocatableModelings() is called concurrently in the controller, right? AddToResourceSummary is called from getAllocatableModelings(). I just tested it locally and can indeed see a lot of calls to AddToResourceSummary, and I think this function caused the panic. I plan to add lock() every time I update the resource. Do you agree with this solution?

I haven’t submitted the PR yet because I’m afraid that adding lock() this broadly will affect performance. In all update operations, resources are only added, never deleted.

@tedli I have received your case and will fix it tonight. I’m so sorry, I’ve been busy with work these past two days.

@tedli Hi, could you provide some parameters to initialize resource modeling? I want to reproduce this issue on my local machine.