mysql-operator: Node unable to rejoin after failure

For some context: I am using the MySQL operator by Presslabs on Kubernetes, which uses this application. My Kubernetes nodes are preemptible, which means they can occasionally die (usually about once a day).

I’m observing an interesting behavior with a cluster of three orchestrator nodes. They all work well until one of them dies; when a replacement comes up, the other two appear to ignore it.

Here are some orchestrator logs:

2018-08-27 18:18:27.000 CDT Successfully pulled image "quay.io/presslabs/orchestrator:v3.0.11-r21"
2018-08-27 18:18:27.000 CDT Created container 
2018-08-27 18:18:27.000 CDT Started container
2018-08-27 18:18:37.000 CDT Readiness probe failed: HTTP probe failed with statuscode: 500

The failing health check goes on perpetually.
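The probe presumably targets orchestrator’s raft health endpoint, visible as the GET /api/raft-health requests in the logs below. As a hypothetical sketch (JSON form of a Kubernetes container spec; the path, port, and timings are my assumptions, not copied from the operator’s manifests), the probe would look something like:

{
  "readinessProbe": {
    "httpGet": {
      "path": "/api/raft-health",
      "port": 3000
    },
    "initialDelaySeconds": 10,
    "periodSeconds": 10
  }
}

That endpoint appears to return 500 whenever the node is not part of a healthy raft quorum, which would explain why the pod never becomes ready.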

This is emitted from the node that restarted:

I  [martini] Completed 500 Internal Server Error in 7.805308ms
I  [martini] Started GET /api/raft-health for 10.8.33.1:48672
E  2018/08/27 23:18:36 [INFO] raft: Node at 10.8.33.10:10008 [Candidate] entering Candidate state
E  2018/08/27 23:18:36 [WARN] raft: Election timeout reached, restarting election
E  2018/08/27 23:18:35 [DEBUG] raft: Vote granted from 10.8.33.10:10008. Tally: 1
E  2018/08/27 23:18:35 [DEBUG] raft: Votes needed: 2
E  2018/08/27 23:18:35 [WARN] raft: Remote peer 10.8.31.3:10008 does not have local node 10.8.33.10:10008 as a peer
E  2018/08/27 23:18:35 [WARN] raft: Remote peer 10.8.32.3:10008 does not have local node 10.8.33.10:10008 as a peer
E  2018/08/27 23:18:34 [INFO] raft: Node at 10.8.33.10:10008 [Candidate] entering Candidate state
E  2018/08/27 23:18:34 [WARN] raft: Election timeout reached, restarting election
E  2018/08/27 23:18:32 [DEBUG] raft: Vote granted from 10.8.33.10:10008. Tally: 1
E  2018/08/27 23:18:32 [DEBUG] raft: Votes needed: 2
E  2018/08/27 23:18:32 [WARN] raft: Remote peer 10.8.32.3:10008 does not have local node 10.8.33.10:10008 as a peer
E  2018/08/27 23:18:32 [WARN] raft: Remote peer 10.8.31.3:10008 does not have local node 10.8.33.10:10008 as a peer
E  2018/08/27 23:18:30 [INFO] raft: Node at 10.8.33.10:10008 [Candidate] entering Candidate state
E  2018/08/27 23:18:30 [WARN] raft: Heartbeat timeout from "" reached, starting election
E  2018/08/27 23:18:29 [INFO] raft: Node at 10.8.33.10:10008 [Follower] entering Follower state (Leader: "")
E  2018/08/27 23:18:29 [INFO] raft: Restored from snapshot 15915-17741-1535409376687
E  2018-08-27 23:18:27 FATAL 2018-08-27 23:18:27 ERROR failed to open raft store: lookup mysql-operator-orchestrator-1.mysql-operator-orchestrator-headless on 10.11.240.10:53: no such host
E  2018-08-27 23:18:27 ERROR failed to open raft store: lookup mysql-operator-orchestrator-1.mysql-operator-orchestrator-headless on 10.11.240.10:53: no such host
E  2018-08-27 23:18:27 ERROR lookup mysql-operator-orchestrator-1.mysql-operator-orchestrator-headless on 10.11.240.10:53: no such host
E  2018-08-27 23:18:27 ERROR lookup mysql-operator-orchestrator-1.mysql-operator-orchestrator-headless on 10.11.240.10:53: no such host
E  2018-08-27 23:18:27 ERROR lookup mysql-operator-orchestrator-1.mysql-operator-orchestrator-headless on 10.11.240.10:53: no such host

This is emitted from the other nodes:

E  2018/08/28 00:48:48 [DEBUG] raft: Votes needed: 2
E  2018/08/28 00:48:48 [WARN] raft: Remote peer 10.8.32.3:10008 does not have local node 10.8.33.10:10008 as a peer
E  2018/08/28 00:48:48 [INFO] raft: Node at 10.8.33.10:10008 [Candidate] entering Candidate state
E  2018/08/28 00:48:48 [WARN] raft: Election timeout reached, restarting election
E  2018/08/28 00:48:48 [WARN] raft: Rejecting vote request from 10.8.33.10:10008 since we have a leader: 10.8.32.3:10008
E  2018/08/28 00:48:48 [DEBUG] raft: Failed to contact 10.8.30.6:10008 in 1h34m3.397022732s
E  2018/08/28 00:48:48 [DEBUG] raft: Failed to contact 10.8.30.6:10008 in 1h34m2.919864839s
I  [martini] Started GET /api/lb-check for 10.8.31.1:60068
E  2018/08/28 00:48:48 [WARN] raft: Rejecting vote request from 10.8.33.10:10008 since we have a leader: 10.8.32.3:10008
E  2018/08/28 00:48:47 [WARN] raft: Rejecting vote request from 10.8.33.10:10008 since we have a leader: 10.8.32.3:10008
E  2018/08/28 00:48:47 [DEBUG] raft: Votes needed: 2
E  2018/08/28 00:48:47 [WARN] raft: Remote peer 10.8.31.3:10008 does not have local node 10.8.33.10:10008 as a peer
E  2018/08/28 00:48:47 [INFO] raft: Node at 10.8.33.10:10008 [Candidate] entering Candidate state

It seems like a node should be able to rejoin after a failure, even if it comes back with a different IP address. From the logs, the restarted node can reach its peers, but the surviving nodes still have its old IP in their raft peer lists, so its vote requests are rejected and it can never collect the two votes it needs.
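For reference, a minimal sketch of the raft settings in play (the data directory is an assumption; the hostnames follow the operator’s headless-service naming from the errors above, and the port matches the logs):

{
  "RaftEnabled": true,
  "RaftDataDir": "/var/lib/orchestrator",
  "RaftBind": "mysql-operator-orchestrator-0.mysql-operator-orchestrator-headless",
  "DefaultRaftPort": 10008,
  "RaftNodes": [
    "mysql-operator-orchestrator-0.mysql-operator-orchestrator-headless",
    "mysql-operator-orchestrator-1.mysql-operator-orchestrator-headless",
    "mysql-operator-orchestrator-2.mysql-operator-orchestrator-headless"
  ]
}

These hostnames are resolved to pod IPs, and raft then tracks peers by raw IP; a replacement pod resolves to a new IP that is absent from the surviving peers’ lists, which matches the “does not have local node … as a peer” warnings above.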

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 22 (10 by maintainers)

Most upvoted comments

@imriss, v0.3.2 is already published and contains the fix.

I have submitted a fix for this issue 😊

@shlomi-noach the new pod gets a new IP. It’s not clear what to put in RaftNodes and RaftAdvertise in this case.

It’s worth noting that there is another option that does not require a rolling restart upon node replacement: RaftAdvertise, whereby a node is reached via an “advertised” IP address (e.g. via a load balancer). In that case you may remove an orchestrator node and provision a new one in its place (with a different IP), and as long as the new node answers on the advertised address, you should be good to go.
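For illustration, such a setup might look like the following (a hypothetical sketch; the example.com names stand in for whatever stable, advertised addresses are placed in front of each node):

{
  "RaftBind": "10.8.33.10",
  "RaftAdvertise": "orchestrator-0.example.com",
  "RaftNodes": [
    "orchestrator-0.example.com",
    "orchestrator-1.example.com",
    "orchestrator-2.example.com"
  ]
}

Peers address each other only through the advertised names, so replacing the machine behind one of them (and its IP) leaves the peer list unchanged.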

See https://github.com/github/orchestrator/blob/master/docs/configuration-raft.md#nat-firewalls-routing

Also related is a discussion on https://github.com/vitessio/vitess/pull/3665