cluster-api-provider-aws: AWSMachineTemplate with spot instance generates a false error log

/kind bug

What steps did you take and what happened: When I declare spot instances (via spotMarketOptions) on a new AWSMachineTemplate, like this:

--- a/clusters/infrastructure.cluster.x-k8s.io-v1beta1.AWSMachineTemplate-xxx-v1.yaml
+++ b/clusters/infrastructure.cluster.x-k8s.io-v1beta1.AWSMachineTemplate-xxx-v2.yaml
@@ -1,11 +1,13 @@
 apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
 kind: AWSMachineTemplate
 metadata:
-  name: xxx-v1
+  name: xxx-v2
   namespace: flux-system
 spec:
   template:
     spec:
       iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
       instanceType: xxx.medium
+      spotMarketOptions:
+        maxPrice: ""
       sshKeyName: xxx
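
For reference, the complete v2 template after this change looks roughly like the following (a sketch assembled from the diff above; the name, instance type, and SSH key are placeholders):

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSMachineTemplate
metadata:
  name: xxx-v2
  namespace: flux-system
spec:
  template:
    spec:
      iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
      instanceType: xxx.medium
      spotMarketOptions:
        maxPrice: ""   # empty string: request spot capacity without setting an explicit price cap here
      sshKeyName: xxx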

Everything works fine. The MachineDeployment using this template provisions the new nodes, etc. They are seen by both the management cluster and the workload cluster.

But it generates a false error log on the capi-controller-manager:

E0411 12:54:02.362276       1 controller.go:317] controller/machine "msg"="Reconciler error" "error"="machines.cluster.x-k8s.io \"XXXX\" not found" "name"="XXXX" "namespace"="flux-system" "reconciler group"="cluster.x-k8s.io" "reconciler kind"="Machine"

Even though the nodes and the machines are present, consistent, and up and running.
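
For reference, this is the kind of check behind that statement (illustrative kubectl commands, assuming the flux-system namespace used above; they are not part of the original report):

# Machine objects on the management cluster (the resource named in the error)
kubectl get machines.cluster.x-k8s.io -n flux-system

# Corresponding AWSMachine objects created by the AWS provider
kubectl get awsmachines.infrastructure.cluster.x-k8s.io -n flux-system

# Nodes as seen from the workload cluster
kubectl --kubeconfig <workload-kubeconfig> get nodes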

What did you expect to happen:

Not to get a false error log message when I use spot instances.

Anything else you would like to add:

When I remove spotMarketOptions, I don’t get this false error log.

Environment:

  • Cluster-api version: 1.1.3
  • Cluster-api-provider-aws version: 1.4.0
  • Kubernetes version: (use kubectl version): v1.22.6-eks-7d68063
  • OS (e.g. from /etc/os-release): EKS

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 20 (9 by maintainers)

Most upvoted comments

We’ve not found any way to fix it, but with

  1. Cluster API: v1.4.1
  2. AWS Provider: v2.0.2

the issue has vanished.
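
For anyone wanting to try the same upgrade, a minimal clusterctl sketch (illustrative only; provider namespaces assume a default installation, and exact flags may differ with your clusterctl version):

# Preview which providers can be upgraded
clusterctl upgrade plan

# Upgrade the core provider and the AWS infrastructure provider to the versions above
clusterctl upgrade apply \
  --core capi-system/cluster-api:v1.4.1 \
  --infrastructure capa-system/aws:v2.0.2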

We have spot instance tests running in our CI, which don’t flood with this error; you could check out the logs here

Here is the cluster template used. Could you please check whether you are configuring things differently?

Hey @Ankitasw 👋,

This error message is not limited to the period before the machine is created.

It never stops filling the logs, even after the new nodes/machines are created, pods are moved onto them, and everything is stable.

Without spot instances, this error message ceases when the new node/machine is up and running.