redis-operator: Different RedisFailover's sentinels join together
Environment
How are the pieces configured?
- Redis Operator version v1.2.2
- Kubernetes version v1.23.13
- Kubernetes configuration used (e.g., is RBAC active?)
affinity: {}
annotations: {}
container:
  port: 9710
fullnameOverride: ""
image:
  pullPolicy: IfNotPresent
  repository: quay.io/spotahome/redis-operator
  tag: v1.2.2
imageCredentials:
  create: false
  email: someone@example.com
  existsSecrets:
    - registrysecret
  password: somepassword
  registry: url.private.registry
  username: someone
monitoring:
  enabled: false
  prometheus:
    name: unknown
  serviceAnnotations: {}
  serviceMonitor: false
nameOverride: ""
nodeSelector: {}
replicas: 3
resources:
  limits:
    cpu: 100m
    memory: 128Mi
  requests:
    cpu: 100m
    memory: 128Mi
securityContext:
  readOnlyRootFilesystem: true
  runAsNonRoot: true
  runAsUser: 1000
service:
  port: 9710
  type: ClusterIP
serviceAccount:
  annotations: {}
  create: true
  name: ""
tolerations: []
updateStrategy:
  type: RollingUpdate
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 48 (31 by maintainers)
Hi! As I was thinking about using this operator I came across this issue, and I was immediately reminded of the problems I had getting this “fixed” on Google Cloud, where we run Redis Sentinel clusters on VMs. We have around 20 of these running, and from time to time you of course want to upgrade or install OS updates. Since we do immutable infrastructure, we don’t update VMs in place (via Ansible, Chef, Puppet, …) but completely replace them.
All our Redis clusters have three instances. So when we update such a cluster, we shut down one Redis node (each node always runs one Redis and one Sentinel process). We build our OS image with HashiCorp Packer, so our automation picks up the latest OS image, starts a new VM, and rejoins it to the cluster. The same then happens with the other two remaining nodes: we first replace the two replicas and finally the primary. Before shutting down the primary we trigger a failover.
As long as you do this with only one Redis Sentinel cluster, it normally works fine. But since the update process runs in parallel, all 20 Redis Sentinel clusters are recreated at the same time. That in itself works just fine, but in the beginning we discovered that some nodes suddenly tried to join other clusters during this process. All clusters have a different redisMasterName configured, so the join of course failed. To move those nodes back to the cluster they belong to, we had to manually clean their configuration and rejoin them. We tried a lot of workarounds, but nothing worked reliably, because Redis/Sentinel always “remembered” the IP addresses of the old (now gone) nodes. And that is actually the problem: while the Redis Sentinel clusters are being recreated, it is likely that a node gets an internal IP address that was previously used by a node belonging to a different Redis Sentinel cluster.
So our “solution” was to give every VM node a fixed internal IP address: the address is allocated once and is then always assigned to the VM that replaces the previous one. That “fixed” this issue once and for all 😃
But AFAIK you can’t do this in Kubernetes. So from what I’ve read so far in this thread, using a NetworkPolicy or different ports for every Redis Sentinel deployment might be possible workarounds. Since Redis 6.2 you can also use hostnames instead of IP addresses: https://redis.io/docs/management/sentinel/#ip-addresses-and-dns-names. Since Kubernetes has DNS autodiscovery, this might be a more general solution to the problem.

Thanks for sharing the details @EladDolev, will take some time to test and get back with code changes, will update here…
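As an illustration of the NetworkPolicy workaround mentioned above, here is a minimal sketch (not taken from the thread) that only lets pods of the same RedisFailover reach each other’s Redis and Sentinel ports, so sentinels cannot gossip across deployments. The label app.kubernetes.io/name: rfr-myapp, the policy name, and the namespace are assumptions for illustration; check the labels the operator actually sets on your pods, remember that the operator itself and your application clients also need matching "from" entries, and note that your CNI must enforce NetworkPolicy at all.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: isolate-redisfailover-myapp   # hypothetical name
  namespace: myapp                    # hypothetical namespace
spec:
  # Select all Redis and Sentinel pods of this particular RedisFailover.
  podSelector:
    matchLabels:
      app.kubernetes.io/name: rfr-myapp   # assumed label; verify with kubectl get pods --show-labels
  policyTypes:
    - Ingress
  ingress:
    # Only pods of the same RedisFailover may reach the Redis and Sentinel ports.
    # Add further "from" entries for the operator pod and for application clients.
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: rfr-myapp
      ports:
        - protocol: TCP
          port: 6379    # Redis
        - protocol: TCP
          port: 26379   # Sentinel

The hostname-based approach from the linked Redis docs (the resolve-hostnames and announce-hostnames sentinel options available since Redis 6.2) would address the root cause more directly, since sentinels would then announce stable DNS names instead of reusable pod IPs.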
@tparsa I can confirm that it helps. Thanks for the tip.
@zekena2 Deleting all sentinel pods fixes the problem as well.
What I saw in the operator metrics indicates that the operator does realize there is a problem with the number of sentinels. But the only fix the operator applies is sending a SENTINEL RESET *, which doesn’t fix anything.