cluster-api-provider-aws: AWSMachineTemplate with spot instance generate false error log
/kind bug
What steps did you take and what happened: When I declare spot instances (spotMarketOptions) on a new AWSMachineTemplate, like this:
```diff
--- a/clusters/infrastructure.cluster.x-k8s.io-v1beta1.AWSMachineTemplate-xxx-v1.yaml
+++ b/clusters/infrastructure.cluster.x-k8s.io-v1beta1.AWSMachineTemplate-xxx-v2.yaml
@@ -1,11 +1,13 @@
 apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
 kind: AWSMachineTemplate
 metadata:
-  name: xxx-v1
+  name: xxx-v2
   namespace: flux-system
 spec:
   template:
     spec:
       iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
       instanceType: xxx.medium
+      spotMarketOptions:
+        maxPrice: ""
       sshKeyName: xxx
```
Everything works fine. The MachineDeployment using this template provisions the new nodes, etc. They are seen by both the management cluster and the workload cluster.
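For reference, this is the resolved v2 template after the diff is applied (the `xxx` placeholders are the redactions from the report, not real names):

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSMachineTemplate
metadata:
  name: xxx-v2
  namespace: flux-system
spec:
  template:
    spec:
      iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
      instanceType: xxx.medium
      spotMarketOptions:
        maxPrice: ""   # an empty maxPrice falls back to the on-demand price as the cap
      sshKeyName: xxx
```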
But it generates a false error log on the capi-controller-manager:

```
E0411 12:54:02.362276 1 controller.go:317] controller/machine "msg"="Reconciler error" "error"="machines.cluster.x-k8s.io \"XXXX\" not found" "name"="XXXX" "namespace"="flux-system" "reconciler group"="cluster.x-k8s.io" "reconciler kind"="Machine"
```
Even though the nodes and the machines are here, consistent, up and running.
What did you expect to happen:
No false error log message when I use spot instances.
Anything else you would like to add:
When I remove spotMarketOptions, I don’t get this false error log.
Environment:
- Cluster-api version: 1.1.3
- Cluster-api-provider-aws version: 1.4.0
- Kubernetes version (use kubectl version): v1.22.6-eks-7d68063
- OS (e.g. from /etc/os-release): EKS
About this issue
- State: closed
- Created 2 years ago
- Comments: 20 (9 by maintainers)
We’ve not found any way to fix it but with v1.4.1.
With v2.0.2, this issue has vanished.
We have spot instance tests running in our CI, which don’t flood the logs with this error; you could check out the logs here.
Here is the cluster template used. Could you please check if you are doing configurations differently?
Hey @Ankitasw 👋,
This error message is not only present until the machine is created.
It never stops filling the logs, even after the new nodes/machines are created, pods are moved onto them, and everything is stable.
Without spot instances, this error message ceases when the new node/machine is up and running.