scylla-operator: Updating the `spec.externalSeeds` option of a `ScyllaCluster` triggers a redundant pod rollout, breaking `QUORUM` when only 2 pods remain.

What happened?

Scenario:

  • Provision 2 EKS Kubernetes clusters in 2 different regions (1: eu-north-1, 2: eu-west-1)
  • Deploy 3 Scylla pods in the first region/K8S cluster
  • Deploy 3 more pods in the second region, specifying the spec.externalSeeds field
  • Wait for the readiness of the DB cluster
  • Run 2 stress commands with LOCAL_QUORUM - 1 per region
  • Perform operations on the Scylla pods in the first region that cause pod recreation, so the pods change their IP addresses
  • Decommission the last pod in the second region (after this step we have 2 pods, and QUORUM requires 2 responses)
  • Update the spec.externalSeeds field of the second ScyllaCluster object with the new IPs from the first region, in preparation for adding a new node
  • Update spec.datacenter.racks.0.members to 3, triggering an add-node operation that should use the new/actual seed IP addresses
  • >>> FAILURE: scylla-operator starts a rollout of the existing pods, breaking the QUORUM
  • Wait for the DB cluster extension
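The two spec updates from the scenario above correspond to the `Replace` operations visible in the test-runner logs below. A minimal sketch of the equivalent JSON Patch (in YAML form, values taken from the logs; applied with something like `kubectl patch scyllacluster sct-cluster --type=json`):

```yaml
# JSON Patch (YAML form) for the second region's ScyllaCluster:
# replace the external seeds with the new first-region IPs,
# then scale the rack to trigger the add-node operation.
- op: replace
  path: /spec/externalSeeds
  value: ["10.0.11.107", "10.0.10.255", "10.0.11.205"]
- op: replace
  path: /spec/datacenter/racks/0/members
  value: 3
```

Only the first patch should be a no-op for running pods; in practice it is what starts the rollout.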

Logs from the test runner:

2023-11-15 17:38:58,912 f:__init__.py     l:2553 c:sdcm.cluster_k8s     p:DEBUG > eu-west-1: Replace `/spec/externalSeeds' with `['10.0.11.107', '10.0.10.255', '10.0.11.205']' in sct-cluster's spec
2023-11-15 17:38:59,068 f:__init__.py     l:2553 c:sdcm.cluster_k8s     p:DEBUG > eu-west-1: Replace `/spec/datacenter/racks/0/members' with `3' in sct-cluster's spec
...
2023-11-15 17:39:09,465 f:__init__.py     l:936  c:sdcm.utils.k8s       p:INFO  > eu-west-1: 'scylla/sct-cluster-eu-west-1-rack-1-1' node has changed it's pod IP address from '10.4.8.73' to '10.4.8.104'. All old IPs: 10.4.9.133, 10.4.8.98, 10.4.8.147, 10.4.8.98, 10.4.8.104, 10.4.8.147, 10.4.8.104, 10.4.8.98, 10.4.8.73
...
2023-11-15 17:39:51,143 f:__init__.py     l:936  c:sdcm.utils.k8s       p:INFO  > eu-west-1: 'scylla/sct-cluster-eu-west-1-rack-1-2' node has changed it's pod IP address from '10.4.10.176' to '10.4.11.50'. All old IPs: 10.4.8.142, 10.4.11.219, 10.4.10.176
...
2023-11-15 17:44:21,635 f:__init__.py     l:936  c:sdcm.utils.k8s       p:INFO  > eu-west-1: 'scylla/sct-cluster-eu-west-1-rack-1-0' node has changed it's pod IP address from '10.4.9.30' to '10.4.10.226'. All old IPs: 10.4.9.183, 10.4.9.246, 10.4.9.30

Logs from the node that was rolled out first:

INFO  2023-11-15 17:38:55,782 [shard  0] gossip - 60000 ms elapsed, 10.4.11.219 gossip quarantine over
2023-11-15 17:38:59,527 INFO waiting for scylla to stop
INFO  2023-11-15 17:38:59,527 [shard  0] compaction_manager - Asked to stop
INFO  2023-11-15 17:38:59,527 [shard  0] compaction_manager - Stopping 1 tasks for 1 ongoing compactions due to shutdown
INFO  2023-11-15 17:38:59,527 [shard  0] init - Signal received; shutting down
INFO  2023-11-15 17:38:59,527 [shard  0] init - Shutting down view builder ops
INFO  2023-11-15 17:38:59,527 [shard  0] view - Draining view builder

Loader failure due to the QUORUM breakage:

loader-west-1
WARN  17:38:51,966 Error creating netty channel to 10-4-9-133.sct-cluster-eu-west-1-rack-1-1.scylla.svc.cluster.local/10.4.9.133:9042
com.datastax.shaded.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: No route to host: 10-4-9-133.sct-cluster-eu-west-1-rack-1-1.scylla.svc.cluster.local/10.4.9.133:9042
Caused by: java.net.NoRouteToHostException: No route to host
...
com.datastax.driver.core.exceptions.ReadFailureException: Cassandra failure during read query at consistency LOCAL_QUORUM (2 responses were required but only 1 replica responded, 1 failed)
com.datastax.driver.core.exceptions.WriteFailureException: Cassandra failure during write query at consistency LOCAL_QUORUM (2 responses were required but only 1 replica responded, 1 failed)

So, updating the spec.externalSeeds field must not trigger rollouts of the existing pods. A rollout makes no sense here because the seed values are used only during bootstrap; whenever a pod is eventually rolled out for some other reason, it will pick up the new value anyway.

What did you expect to happen?

I expected that updating the spec.externalSeeds field never triggers rollouts.

How can we reproduce it (as minimally and precisely as possible)?

Update the spec.externalSeeds field of a cluster with 1-2 pods to get the QUORUM breakage, or with any number of pods to observe the redundant rollout.
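For illustration, a minimal 2-member ScyllaCluster along the lines below (names, versions, and the seed IP are hypothetical placeholders) should be enough: change any entry in `externalSeeds` and re-apply, and the operator starts the rollout.

```yaml
# Minimal ScyllaCluster sketch (hypothetical names/versions).
# Editing only spec.externalSeeds and re-applying reproduces
# the redundant rollout of the existing pods.
apiVersion: scylla.scylladb.com/v1
kind: ScyllaCluster
metadata:
  name: sct-cluster
  namespace: scylla
spec:
  version: 5.2.0
  agentVersion: 3.2.0
  externalSeeds:
    - 10.0.11.107      # change this value and `kubectl apply` again
  datacenter:
    name: eu-west-1
    racks:
      - name: rack-1
        members: 2     # with only 2 members, the rollout also breaks QUORUM
        storage:
          capacity: 10Gi
```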

Scylla Operator version

v1.11.0

Kubernetes platform name and version

Kubernetes platform info:
Client Version: version.Info{
    Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", 
    GitTreeState:"clean", BuildDate:"2023-06-14T09:53:42Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{
    Major:"1", Minor:"27+", GitVersion:"v1.27.7-eks-4f4795d", GitCommit:"3719c8491f81867f591e895a43b4f5aab4145794", 
    GitTreeState:"clean", BuildDate:"2023-10-20T23:21:04Z", GoVersion:"go1.20.10", Compiler:"gc", Platform:"linux/amd64"}

Please attach the must-gather archive.

Jenkins job URL Argus

Anything else we need to know?

No response

About this issue

  • Original URL
  • State: closed
  • Created 7 months ago
  • Comments: 24 (20 by maintainers)

Most upvoted comments

Any configuration change should trigger a rollout; seeds are part of the config, so changing them triggers a rollout.

Real users/admins won’t update the spec.externalSeeds manually each time Scylla pods from the first region get recreated.

That’s why we recommend using DNS names that resolve to the correct PodIP even after the IP changes. https://operator.docs.scylladb.com/stable/multidc/multidc.html#retrieve-podips-of-scylladb-nodes-for-use-as-external-seeds
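With that recommendation, the seed entries become stable DNS names instead of raw PodIPs, so pod recreation in the first region no longer requires touching the spec. A sketch of what the field would look like (the DNS names below are hypothetical, standing in for records set up as in the linked docs):

```yaml
# Sketch: externalSeeds as stable DNS names (hypothetical records)
# that always resolve to the current PodIPs of the first-region nodes.
spec:
  externalSeeds:
    - sct-cluster-eu-north-1-rack-1-0.seeds.example.com
    - sct-cluster-eu-north-1-rack-1-1.seeds.example.com
    - sct-cluster-eu-north-1-rack-1-2.seeds.example.com
```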