krustlet: Panic on EKS while watching nodes (401 unauthorized)
I don’t have repro steps other than standing up an EKS cluster and waiting for the node to show up as NotReady. Prior to the panic, the node was Ready and was successfully running WebAssembly applications.
I pulled this from the service log on one of the krustlet nodes:
Apr 15 00:38:56 ip-192-168-71-217.us-west-2.compute.internal krustlet[2260]: [2020-04-15T00:38:56Z WARN kube::runtime::informer] Unexpected watch error: Api(ErrorResponse { status: "Failure", message: "Unauthorized", reason: "Unauthorized", code: 401 })
Apr 15 00:38:56 ip-192-168-71-217.us-west-2.compute.internal krustlet[2260]: thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: Api(ErrorResponse { status: "Failure", message: "Unauthorized", reason: "Unauthorized", code: 401 })', /home/ec2-user/.cargo/git/checkouts/krustlet-dd9f49a0b51f9977/31f4940/crates/kubelet/src/kubelet.rs:78:41
which points at this unwrap call (kubelet.rs:78) in the loop that polls for pods.
I don’t yet know why the EKS-managed API server occasionally returns a 401, but I thought I’d file an issue here in case anyone else runs into this.
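Whatever the cause of the 401, the immediate crash comes from calling unwrap() on the watch result. A minimal sketch of the alternative is to classify the error and keep the loop alive. `ErrorResponse` here is a hypothetical stand-in that only mirrors the fields printed in the log (it is not kube-rs's actual type), and `WatchAction`, `classify_watch_error`, and `run_watch` are illustrative names, not krustlet code.

```rust
/// Hypothetical stand-in mirroring the fields in the log above
/// (status, message, reason, code); NOT kube-rs's real error type.
#[derive(Debug, Clone, PartialEq)]
pub struct ErrorResponse {
    pub status: String,
    pub message: String,
    pub reason: String,
    pub code: u16,
}

/// What the polling loop should do next instead of panicking.
#[derive(Debug, PartialEq)]
pub enum WatchAction {
    /// Credentials were rejected: refresh them, then retry the watch.
    RefreshCredentialsAndRetry,
    /// 410 Gone: the resourceVersion is too old, so start a fresh watch.
    RestartWatch,
    /// Throttling or server-side failure: back off, then retry.
    BackoffAndRetry,
    /// Everything else is surfaced to the caller as an error.
    Fatal(String),
}

pub fn classify_watch_error(err: &ErrorResponse) -> WatchAction {
    match err.code {
        401 => WatchAction::RefreshCredentialsAndRetry,
        410 => WatchAction::RestartWatch,
        429 | 500..=599 => WatchAction::BackoffAndRetry,
        _ => WatchAction::Fatal(format!("{} ({}): {}", err.reason, err.code, err.message)),
    }
}

/// Runs `watch` until it succeeds or hits a fatal error, calling `refresh`
/// whenever the server answers 401. `max_attempts` bounds the loop so the
/// sketch cannot spin forever.
pub fn run_watch<W, R>(mut watch: W, mut refresh: R, max_attempts: u32) -> Result<(), String>
where
    W: FnMut() -> Result<(), ErrorResponse>,
    R: FnMut(),
{
    for _ in 0..max_attempts {
        match watch() {
            Ok(()) => return Ok(()),
            Err(err) => match classify_watch_error(&err) {
                WatchAction::RefreshCredentialsAndRetry => refresh(),
                // Real code would sleep with backoff here; the sketch just retries.
                WatchAction::RestartWatch | WatchAction::BackoffAndRetry => {}
                WatchAction::Fatal(msg) => return Err(msg),
            },
        }
    }
    Err("gave up after max_attempts".to_string())
}
```

With this shape, a 401 like the one in the log triggers a credential refresh and another attempt, while a genuinely unexpected status is returned as an Err the caller can log, rather than taking down the tokio worker.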
Unfortunately, after the panic, the krustlet service goes into a failure loop when attempting to reclaim the existing node registration (possibly related to this issue?):
Apr 15 00:51:32 ip-192-168-71-217.us-west-2.compute.internal krustlet[4144]: thread 'main' panicked at 'Unable to recreate node...aborting: Api(ErrorResponse { status: "Failure", message: "nodes \"ip-192-168-71-217.us-west-2.compute.internal\" is forbidden: User \"system:node:ip-192-168-71-217.us-west-2.compute.internal\" cannot delete resource \"nodes\" in API group \"\" at the cluster scope", reason: "Forbidden", code: 403 })', /home/ec2-user/.cargo/git/checkouts/krustlet-dd9f49a0b51f9977/31f4940/crates/kubelet/src/node.rs:47:21
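The 403 shows that the node's own identity (system:node:…) is not allowed to delete Node objects, so a delete-then-recreate strategy can never succeed here and the service panics on every restart. A minimal sketch of a non-panicking decision follows, assuming the node identity can still update its own Node object in place (as a kubelet normally does for status updates). `ReclaimPlan` and `plan_reclaim` are hypothetical names, not krustlet's actual node.rs logic.

```rust
/// Hypothetical plan for reclaiming an existing Node registration,
/// keyed on the HTTP status returned when deleting the stale Node.
#[derive(Debug, PartialEq)]
pub enum ReclaimPlan {
    /// Delete succeeded (200/202) or the Node was already gone (404):
    /// register a fresh Node object.
    CreateNew,
    /// Delete is forbidden (403) for this identity: keep the existing
    /// Node object and update it in place instead of aborting.
    ReuseExisting,
    /// Any other status is reported to the caller rather than panicking.
    Report(u16),
}

pub fn plan_reclaim(delete_status: u16) -> ReclaimPlan {
    match delete_status {
        200 | 202 | 404 => ReclaimPlan::CreateNew,
        403 => ReclaimPlan::ReuseExisting,
        other => ReclaimPlan::Report(other),
    }
}
```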
About this issue
- State: closed
- Created 4 years ago
- Comments: 17 (16 by maintainers)
Commits related to this issue
- chore(*): Updates kube to 0.33. This release contains the "refresh token" feature. Closes #187 (committed to thomastaylor312/krustlet by thomastaylor312 4 years ago)
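The fix pulls in kube 0.33's "refresh token" feature. Presumably the intermittent 401s came from reusing a short-lived bearer token for the whole life of the process. The general idea can be sketched as a cache that re-fetches the token shortly before it expires. This is an illustration under that assumption, not kube-rs's implementation: `TokenSource`, `CachedToken`, the `fetch` closure, and the `skew` margin are all hypothetical.

```rust
use std::time::{Duration, Instant};

/// Hypothetical cached bearer token with its expiry time.
struct CachedToken {
    value: String,
    expires_at: Instant,
}

/// Hypothetical token source: `fetch` returns a new token and its
/// time-to-live; `skew` refreshes slightly early so that a request
/// never goes out with a token that is about to expire.
pub struct TokenSource<F> {
    fetch: F,
    skew: Duration,
    cached: Option<CachedToken>,
}

impl<F: FnMut() -> (String, Duration)> TokenSource<F> {
    pub fn new(fetch: F, skew: Duration) -> Self {
        TokenSource { fetch, skew, cached: None }
    }

    /// Returns the cached token if it is still valid for at least `skew`,
    /// otherwise fetches and caches a new one.
    pub fn token(&mut self) -> String {
        let now = Instant::now();
        if let Some(t) = &self.cached {
            if now + self.skew < t.expires_at {
                return t.value.clone();
            }
        }
        let (value, ttl) = (self.fetch)();
        self.cached = Some(CachedToken { value: value.clone(), expires_at: now + ttl });
        value
    }
}
```

Calling token() before each request (or on a 401) keeps the client from holding on to an expired credential without re-reading it for every call.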
The nodes have been up for 2d and appear to be healthy. I think we can declare the fix works 🎉 !
So I ended up giving it a shot by fixing it in the upstream library. I’ll open a PR once I get a chance to test it.