scylla-operator: Updating the `spec.externalSeeds` option of a `ScyllaCluster` triggers a redundant pod rollout, breaking `QUORUM` when only 2 pods remain.

What happened?

Scenario:

  • Provision 2 EKS Kubernetes clusters in 2 different regions (1: eu-north-1, 2: eu-west-1)
  • Deploy 3 Scylla pods in the first region/K8S cluster
  • Deploy 3 more pods in the second region, specifying the spec.externalSeeds field
  • Wait for the readiness of the DB cluster
  • Run 2 stress commands with LOCAL_QUORUM - 1 per region
  • Perform operations on the Scylla pods in the first region that cause pod recreation, so the pods change their IP addresses
  • Decommission the last pod in the second region (after this step we have 2 pods, and QUORUM requires 2 responses)
  • Update the spec.externalSeeds field of the second ScyllaCluster object with the new IPs from the first region, in preparation for adding a new node
  • Update spec.datacenter.racks.0.members to 3, triggering an add-node operation that should use the new/actual seed IP addresses
  • >>> FAILURE: scylla-operator starts a rollout of the existing pods, breaking the QUORUM
  • Wait for the DB cluster extension
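The two spec updates from the scenario above correspond to the `Replace` operations visible in the test-runner logs below. A minimal sketch of the equivalent JSON Patch (in YAML form, values taken from the logs; applied with something like `kubectl patch scyllacluster sct-cluster --type=json`):

```yaml
# JSON Patch (YAML form) for the second region's ScyllaCluster:
# replace the external seeds with the new first-region IPs,
# then scale the rack to trigger the add-node operation.
- op: replace
  path: /spec/externalSeeds
  value: ["10.0.11.107", "10.0.10.255", "10.0.11.205"]
- op: replace
  path: /spec/datacenter/racks/0/members
  value: 3
```

Only the first patch should be a no-op for running pods; in practice it is what starts the rollout.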

Logs from the test runner:

2023-11-15 17:38:58,912 f:__init__.py     l:2553 c:sdcm.cluster_k8s     p:DEBUG > eu-west-1: Replace `/spec/externalSeeds' with `['10.0.11.107', '10.0.10.255', '10.0.11.205']' in sct-cluster's spec
2023-11-15 17:38:59,068 f:__init__.py     l:2553 c:sdcm.cluster_k8s     p:DEBUG > eu-west-1: Replace `/spec/datacenter/racks/0/members' with `3' in sct-cluster's spec
...
2023-11-15 17:39:09,465 f:__init__.py     l:936  c:sdcm.utils.k8s       p:INFO  > eu-west-1: 'scylla/sct-cluster-eu-west-1-rack-1-1' node has changed it's pod IP address from '10.4.8.73' to '10.4.8.104'. All old IPs: 10.4.9.133, 10.4.8.98, 10.4.8.147, 10.4.8.98, 10.4.8.104, 10.4.8.147, 10.4.8.104, 10.4.8.98, 10.4.8.73
...
2023-11-15 17:39:51,143 f:__init__.py     l:936  c:sdcm.utils.k8s       p:INFO  > eu-west-1: 'scylla/sct-cluster-eu-west-1-rack-1-2' node has changed it's pod IP address from '10.4.10.176' to '10.4.11.50'. All old IPs: 10.4.8.142, 10.4.11.219, 10.4.10.176
...
2023-11-15 17:44:21,635 f:__init__.py     l:936  c:sdcm.utils.k8s       p:INFO  > eu-west-1: 'scylla/sct-cluster-eu-west-1-rack-1-0' node has changed it's pod IP address from '10.4.9.30' to '10.4.10.226'. All old IPs: 10.4.9.183, 10.4.9.246, 10.4.9.30

Logs from the node that was rolled out first:

INFO  2023-11-15 17:38:55,782 [shard  0] gossip - 60000 ms elapsed, 10.4.11.219 gossip quarantine over
2023-11-15 17:38:59,527 INFO waiting for scylla to stop
INFO  2023-11-15 17:38:59,527 [shard  0] compaction_manager - Asked to stop
INFO  2023-11-15 17:38:59,527 [shard  0] compaction_manager - Stopping 1 tasks for 1 ongoing compactions due to shutdown
INFO  2023-11-15 17:38:59,527 [shard  0] init - Signal received; shutting down
INFO  2023-11-15 17:38:59,527 [shard  0] init - Shutting down view builder ops
INFO  2023-11-15 17:38:59,527 [shard  0] view - Draining view builder

Loader failure due to the QUORUM breakage:

loader-west-1
WARN  17:38:51,966 Error creating netty channel to 10-4-9-133.sct-cluster-eu-west-1-rack-1-1.scylla.svc.cluster.local/10.4.9.133:9042
com.datastax.shaded.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: No route to host: 10-4-9-133.sct-cluster-eu-west-1-rack-1-1.scylla.svc.cluster.local/10.4.9.133:9042
Caused by: java.net.NoRouteToHostException: No route to host
...
com.datastax.driver.core.exceptions.ReadFailureException: Cassandra failure during read query at consistency LOCAL_QUORUM (2 responses were required but only 1 replica responded, 1 failed)
com.datastax.driver.core.exceptions.WriteFailureException: Cassandra failure during write query at consistency LOCAL_QUORUM (2 responses were required but only 1 replica responded, 1 failed)

So, updating the spec.externalSeeds field must not trigger rollouts of the existing pods. A rollout makes no sense here because the seed values are used only during bootstrap; whenever a pod is eventually rolled out for some other reason, it will pick up the new value anyway.

What did you expect to happen?

I expected that updating the spec.externalSeeds field never triggers rollouts.

How can we reproduce it (as minimally and precisely as possible)?

Update the spec.externalSeeds field of a cluster with 1-2 pods to get the QUORUM breakage, or with any number of pods to observe the redundant rollout.
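For illustration, a minimal 2-member ScyllaCluster along the lines below (names, versions, and the seed IP are hypothetical placeholders) should be enough: change any entry in `externalSeeds` and re-apply, and the operator starts the rollout.

```yaml
# Minimal ScyllaCluster sketch (hypothetical names/versions).
# Editing only spec.externalSeeds and re-applying reproduces
# the redundant rollout of the existing pods.
apiVersion: scylla.scylladb.com/v1
kind: ScyllaCluster
metadata:
  name: sct-cluster
  namespace: scylla
spec:
  version: 5.2.0
  agentVersion: 3.2.0
  externalSeeds:
    - 10.0.11.107      # change this value and `kubectl apply` again
  datacenter:
    name: eu-west-1
    racks:
      - name: rack-1
        members: 2     # with only 2 members, the rollout also breaks QUORUM
        storage:
          capacity: 10Gi
```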

Scylla Operator version

v1.11.0

Kubernetes platform name and version

Kubernetes platform info:
Client Version: version.Info{
    Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", 
    GitTreeState:"clean", BuildDate:"2023-06-14T09:53:42Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{
    Major:"1", Minor:"27+", GitVersion:"v1.27.7-eks-4f4795d", GitCommit:"3719c8491f81867f591e895a43b4f5aab4145794", 
    GitTreeState:"clean", BuildDate:"2023-10-20T23:21:04Z", GoVersion:"go1.20.10", Compiler:"gc", Platform:"linux/amd64"}

Please attach the must-gather archive.

Jenkins job URL Argus

Anything else we need to know?

No response

About this issue

  • Original URL
  • State: closed
  • Created 7 months ago
  • Comments: 24 (20 by maintainers)

Most upvoted comments

Any configuration change should trigger a rollout; seeds are part of the config, so changing them triggers a rollout.

Real users/admins won’t update the spec.externalSeeds manually each time Scylla pods from the first region get recreated.

That’s why we recommend using DNS names that resolve to the correct PodIP even after the IP changes. https://operator.docs.scylladb.com/stable/multidc/multidc.html#retrieve-podips-of-scylladb-nodes-for-use-as-external-seeds
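With that recommendation, the seed entries become stable DNS names instead of raw PodIPs, so pod recreation in the first region no longer requires touching the spec. A sketch of what the field would look like (the DNS names below are hypothetical, standing in for records set up as in the linked docs):

```yaml
# Sketch: externalSeeds as stable DNS names (hypothetical records)
# that always resolve to the current PodIPs of the first-region nodes.
spec:
  externalSeeds:
    - sct-cluster-eu-north-1-rack-1-0.seeds.example.com
    - sct-cluster-eu-north-1-rack-1-1.seeds.example.com
    - sct-cluster-eu-north-1-rack-1-2.seeds.example.com
```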