zookeeper-operator: Second ZK pod doesn't start due to 'java.lang.RuntimeException: My id 2 not in the peer list' exception

Description

We are using zookeeper v0.2.9. Sometimes (not in all environments) the zookeeper-1 pod is unable to start due to a RuntimeException. The pod's log:

kubectl logs deployment-zookeeper-1
+ source /conf/env.sh
++ DOMAIN=deployment-zookeeper-headless.objectstore-large.svc.cluster.local
++ QUORUM_PORT=2888
++ LEADER_PORT=3888
++ CLIENT_HOST=deployment-zookeeper-client
++ CLIENT_PORT=9277
++ CLUSTER_NAME=deployment-zookeeper
++ CLUSTER_SIZE=3
+ source /usr/local/bin/zookeeperFunctions.sh
++ set -ex
++ hostname -s
+ HOST=deployment-zookeeper-1
+ DATA_DIR=/data
+ MYID_FILE=/data/myid
+ LOG4J_CONF=/conf/log4j-quiet.properties
+ DYNCONFIG=/data/zoo.cfg.dynamic
+ STATIC_CONFIG=/data/conf/zoo.cfg
+ [[ deployment-zookeeper-1 =~ (.*)-([0-9]+)$ ]]
+ NAME=deployment-zookeeper
+ ORD=1
+ MYID=2
+ WRITE_CONFIGURATION=true
+ REGISTER_NODE=true
+ ONDISK_MYID_CONFIG=false
+ ONDISK_DYN_CONFIG=false
+ '[' -f /data/myid ']'
++ cat /data/myid
+ EXISTING_ID=2
+ [[ 2 == \2 ]]
+ [[ -f /data/conf/zoo.cfg ]]
+ ONDISK_MYID_CONFIG=true
+ '[' -f /data/zoo.cfg.dynamic ']'
+ ONDISK_DYN_CONFIG=true
+ set +e
+ [[ -n '' ]]
+ set -e
+ set +e
+ nslookup deployment-zookeeper-headless.objectstore-large.svc.cluster.local
Server:		10.96.0.10
Address:	10.96.0.10#53

Name:	deployment-zookeeper-headless.objectstore-large.svc.cluster.local
Address: 192.168.55.175

+ [[ 0 -eq 0 ]]
+ ACTIVE_ENSEMBLE=true
+ [[ true == true ]]
+ [[ true == true ]]
+ WRITE_CONFIGURATION=false
+ [[ true == false ]]
+ [[ true == false ]]
+ [[ true == false ]]
+ REGISTER_NODE=false
+ [[ false == true ]]
+ [[ false == true ]]
+ ZOOCFGDIR=/data/conf
+ export ZOOCFGDIR
+ echo Copying /conf contents to writable directory, to support Zookeeper dynamic reconfiguration
Copying /conf contents to writable directory, to support Zookeeper dynamic reconfiguration
+ [[ ! -d /data/conf ]]
+ echo Copying the /conf/zoo.cfg contents except the dynamic config file during restart
Copying the /conf/zoo.cfg contents except the dynamic config file during restart
++ head -n -1 /conf/zoo.cfg
++ tail -n 1 /data/conf/zoo.cfg
+ echo -e '4lw.commands.whitelist=cons, envi, conf, crst, srvr, stat, mntr, ruok
dataDir=/data
standaloneEnabled=false
reconfigEnabled=true
skipACL=yes
metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
metricsProvider.httpPort=7000
metricsProvider.exportJvmInfo=true
initLimit=10
syncLimit=2
tickTime=2000
quorumListenOnAllIPs=false\ndynamicConfigFile=/data/zoo.cfg.dynamic'
+ cp -f /conf/log4j.properties /data/conf
+ cp -f /conf/log4j-quiet.properties /data/conf
+ cp -f /conf/env.sh /data/conf
+ '[' -f /data/zoo.cfg.dynamic ']'
Starting zookeeper service
+ echo Starting zookeeper service
+ zkServer.sh --config /data/conf start-foreground
ZooKeeper JMX enabled by default
Using config: /data/conf/zoo.cfg
2021-03-29 13:33:55,679 [myid:] - INFO  [main:QuorumPeerConfig@173] - Reading configuration from: /data/conf/zoo.cfg
2021-03-29 13:33:55,686 [myid:] - INFO  [main:QuorumPeerConfig@450] - clientPort is not set
2021-03-29 13:33:55,686 [myid:] - INFO  [main:QuorumPeerConfig@463] - secureClientPort is not set
2021-03-29 13:33:55,686 [myid:] - INFO  [main:QuorumPeerConfig@479] - observerMasterPort is not set
2021-03-29 13:33:55,744 [myid:] - INFO  [main:QuorumPeerConfig@496] - metricsProvider.className is org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
2021-03-29 13:33:55,753 [myid:] - WARN  [main:QuorumPeerConfig@727] - No server failure will be tolerated. You need at least 3 servers.
2021-03-29 13:33:55,758 [myid:2] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2021-03-29 13:33:55,758 [myid:2] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2021-03-29 13:33:55,758 [myid:2] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2021-03-29 13:33:55,762 [myid:2] - INFO  [main:ManagedUtil@44] - Log4j 1.2 jmx support found and enabled.
2021-03-29 13:33:55,771 [myid:2] - INFO  [main:QuorumPeerMain@151] - Starting quorum peer
2021-03-29 13:33:55,850 [myid:2] - INFO  [main:PrometheusMetricsProvider@74] - Initializing metrics, configuration: {exportJvmInfo=true, httpPort=7000}
2021-03-29 13:33:55,850 [myid:2] - INFO  [main:PrometheusMetricsProvider@82] - Starting /metrics HTTP endpoint at port 7000 exportJvmInfo: true
2021-03-29 13:33:55,895 [myid:2] - INFO  [main:Log@169] - Logging initialized @1300ms to org.eclipse.jetty.util.log.Slf4jLog
2021-03-29 13:33:56,180 [myid:2] - INFO  [main:Server@359] - jetty-9.4.24.v20191120; built: 2019-11-20T21:37:49.771Z; git: 363d5f2df3a8a28de40604320230664b9c793c16; jvm 11.0.8+10
2021-03-29 13:33:56,280 [myid:2] - INFO  [main:ContextHandler@825] - Started o.e.j.s.ServletContextHandler@69c81773{/,null,AVAILABLE}
2021-03-29 13:33:56,356 [myid:2] - INFO  [main:AbstractConnector@330] - Started ServerConnector@771a660{HTTP/1.1,[http/1.1]}{0.0.0.0:7000}
2021-03-29 13:33:56,356 [myid:2] - INFO  [main:Server@399] - Started @1767ms
2021-03-29 13:33:56,368 [myid:2] - INFO  [main:ServerMetrics@62] - ServerMetrics initialized with provider org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider@7d7758be
2021-03-29 13:33:56,449 [myid:2] - INFO  [main:QuorumPeer@752] - zookeeper.quorumCnxnTimeoutMs=-1
2021-03-29 13:33:56,463 [myid:2] - WARN  [main:ContextHandler@1520] - o.e.j.s.ServletContextHandler@a82c5f1{/,null,UNAVAILABLE} contextPath ends with /*
2021-03-29 13:33:56,463 [myid:2] - WARN  [main:ContextHandler@1531] - Empty contextPath
2021-03-29 13:33:56,467 [myid:2] - INFO  [main:X509Util@77] - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
2021-03-29 13:33:56,470 [myid:2] - INFO  [main:FileTxnSnapLog@124] - zookeeper.snapshot.trust.empty : false
2021-03-29 13:33:56,473 [myid:2] - INFO  [main:QuorumPeer@1680] - Local sessions disabled
2021-03-29 13:33:56,473 [myid:2] - INFO  [main:QuorumPeer@1691] - Local session upgrading disabled
2021-03-29 13:33:56,474 [myid:2] - INFO  [main:QuorumPeer@1658] - tickTime set to 2000
2021-03-29 13:33:56,474 [myid:2] - INFO  [main:QuorumPeer@1702] - minSessionTimeout set to 4000
2021-03-29 13:33:56,474 [myid:2] - INFO  [main:QuorumPeer@1713] - maxSessionTimeout set to 40000
2021-03-29 13:33:56,497 [myid:2] - INFO  [main:QuorumPeer@1738] - initLimit set to 10
2021-03-29 13:33:56,497 [myid:2] - INFO  [main:QuorumPeer@1920] - syncLimit set to 2
2021-03-29 13:33:56,497 [myid:2] - INFO  [main:QuorumPeer@1935] - connectToLearnerMasterLimit set to 0
2021-03-29 13:33:56,553 [myid:2] - INFO  [main:ZookeeperBanner@42] - 
2021-03-29 13:33:56,553 [myid:2] - INFO  [main:ZookeeperBanner@42] -   ______                  _                                          
2021-03-29 13:33:56,553 [myid:2] - INFO  [main:ZookeeperBanner@42] -  |___  /                 | |                                         
2021-03-29 13:33:56,553 [myid:2] - INFO  [main:ZookeeperBanner@42] -     / /    ___     ___   | | __   ___    ___   _ __     ___   _ __   
2021-03-29 13:33:56,553 [myid:2] - INFO  [main:ZookeeperBanner@42] -    / /    / _ \   / _ \  | |/ /  / _ \  / _ \ | '_ \   / _ \ | '__|
2021-03-29 13:33:56,553 [myid:2] - INFO  [main:ZookeeperBanner@42] -   / /__  | (_) | | (_) | |   <  |  __/ |  __/ | |_) | |  __/ | |    
2021-03-29 13:33:56,553 [myid:2] - INFO  [main:ZookeeperBanner@42] -  /_____|  \___/   \___/  |_|\_\  \___|  \___| | .__/   \___| |_|
2021-03-29 13:33:56,553 [myid:2] - INFO  [main:ZookeeperBanner@42] -                                               | |                     
2021-03-29 13:33:56,553 [myid:2] - INFO  [main:ZookeeperBanner@42] -                                               |_|                     
2021-03-29 13:33:56,554 [myid:2] - INFO  [main:ZookeeperBanner@42] - 
2021-03-29 13:33:56,555 [myid:2] - INFO  [main:Environment@98] - Server environment:zookeeper.version=3.6.1--104dcb3e3fb464b30c5186d229e00af9f332524b, built on 04/21/2020 15:01 GMT
2021-03-29 13:33:56,555 [myid:2] - INFO  [main:Environment@98] - Server environment:host.name=deployment-zookeeper-1.deployment-zookeeper-headless.objectstore-large.svc.cluster.local
2021-03-29 13:33:56,555 [myid:2] - INFO  [main:Environment@98] - Server environment:java.version=11.0.8
2021-03-29 13:33:56,555 [myid:2] - INFO  [main:Environment@98] - Server environment:java.vendor=N/A
2021-03-29 13:33:56,556 [myid:2] - INFO  [main:Environment@98] - Server environment:java.home=/usr/local/openjdk-11
2021-03-29 13:33:56,556 [myid:2] - INFO  [main:Environment@98] - Server environment:java.class.path=/apache-zookeeper-3.6.1-bin/bin/../zookeeper-server/target/classes:/apache-zookeeper-3.6.1-bin/bin/../build/classes:/apache-zookeeper-3.6.1-bin/bin/../zookeeper-server/target/lib/*.jar:/apache-zookeeper-3.6.1-bin/bin/../build/lib/*.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/zookeeper-prometheus-metrics-3.6.1.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/zookeeper-jute-3.6.1.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/zookeeper-3.6.1.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/snappy-java-1.1.7.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/slf4j-log4j12-1.7.25.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/slf4j-api-1.7.25.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/simpleclient_servlet-0.6.0.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/simpleclient_hotspot-0.6.0.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/simpleclient_common-0.6.0.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/simpleclient-0.6.0.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-transport-native-unix-common-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-transport-native-epoll-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-transport-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-resolver-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-handler-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-common-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-codec-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-buffer-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/metrics-core-3.2.5.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/log4j-1.2.17.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/json-simple-1.1.1.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jline-2.11.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jetty-util-9.4.24.v20191120.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jetty-servlet-9.4.24.v20191120.jar:/apache-zookeeper-3.6.1-bin/bin/..
/lib/jetty-server-9.4.24.v20191120.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jetty-security-9.4.24.v20191120.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jetty-io-9.4.24.v20191120.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jetty-http-9.4.24.v20191120.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/javax.servlet-api-3.1.0.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jackson-databind-2.10.3.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jackson-core-2.10.3.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jackson-annotations-2.10.3.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/commons-lang-2.6.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/commons-cli-1.2.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/audience-annotations-0.5.0.jar:/apache-zookeeper-3.6.1-bin/bin/../zookeeper-*.jar:/apache-zookeeper-3.6.1-bin/bin/../zookeeper-server/src/main/resources/lib/*.jar:/data/conf:
2021-03-29 13:33:56,556 [myid:2] - INFO  [main:Environment@98] - Server environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
2021-03-29 13:33:56,556 [myid:2] - INFO  [main:Environment@98] - Server environment:java.io.tmpdir=/tmp
2021-03-29 13:33:56,556 [myid:2] - INFO  [main:Environment@98] - Server environment:java.compiler=<NA>
2021-03-29 13:33:56,556 [myid:2] - INFO  [main:Environment@98] - Server environment:os.name=Linux
2021-03-29 13:33:56,556 [myid:2] - INFO  [main:Environment@98] - Server environment:os.arch=amd64
2021-03-29 13:33:56,556 [myid:2] - INFO  [main:Environment@98] - Server environment:os.version=4.15.0-76-generic
2021-03-29 13:33:56,556 [myid:2] - INFO  [main:Environment@98] - Server environment:user.name=root
2021-03-29 13:33:56,556 [myid:2] - INFO  [main:Environment@98] - Server environment:user.home=/root
2021-03-29 13:33:56,557 [myid:2] - INFO  [main:Environment@98] - Server environment:user.dir=/apache-zookeeper-3.6.1-bin
2021-03-29 13:33:56,557 [myid:2] - INFO  [main:Environment@98] - Server environment:os.memory.free=7MB
2021-03-29 13:33:56,557 [myid:2] - INFO  [main:Environment@98] - Server environment:os.memory.max=966MB
2021-03-29 13:33:56,557 [myid:2] - INFO  [main:Environment@98] - Server environment:os.memory.total=14MB
2021-03-29 13:33:56,557 [myid:2] - INFO  [main:ZooKeeperServer@128] - zookeeper.enableEagerACLCheck = false
2021-03-29 13:33:56,557 [myid:2] - INFO  [main:ZooKeeperServer@132] - zookeeper.skipACL=="yes", ACL checks will be skipped
2021-03-29 13:33:56,557 [myid:2] - INFO  [main:ZooKeeperServer@136] - zookeeper.digest.enabled = true
2021-03-29 13:33:56,557 [myid:2] - INFO  [main:ZooKeeperServer@140] - zookeeper.closeSessionTxn.enabled = true
2021-03-29 13:33:56,557 [myid:2] - INFO  [main:ZooKeeperServer@1434] - zookeeper.flushDelay=0
2021-03-29 13:33:56,558 [myid:2] - INFO  [main:ZooKeeperServer@1443] - zookeeper.maxWriteQueuePollTime=0
2021-03-29 13:33:56,558 [myid:2] - INFO  [main:ZooKeeperServer@1452] - zookeeper.maxBatchSize=1000
2021-03-29 13:33:56,558 [myid:2] - INFO  [main:ZooKeeperServer@241] - zookeeper.intBufferStartingSizeBytes = 1024
2021-03-29 13:33:56,563 [myid:2] - INFO  [main:WatchManagerFactory@42] - Using org.apache.zookeeper.server.watch.WatchManager as watch manager
2021-03-29 13:33:56,563 [myid:2] - INFO  [main:WatchManagerFactory@42] - Using org.apache.zookeeper.server.watch.WatchManager as watch manager
2021-03-29 13:33:56,565 [myid:2] - INFO  [main:ZKDatabase@132] - zookeeper.snapshotSizeFactor = 0.33
2021-03-29 13:33:56,565 [myid:2] - INFO  [main:ZKDatabase@152] - zookeeper.commitLogCount=500
2021-03-29 13:33:56,578 [myid:2] - INFO  [main:QuorumPeer@2001] - Using insecure (non-TLS) quorum communication
2021-03-29 13:33:56,578 [myid:2] - INFO  [main:QuorumPeer@2007] - Port unification disabled
2021-03-29 13:33:56,578 [myid:2] - INFO  [main:QuorumPeer@174] - multiAddress.enabled set to false
2021-03-29 13:33:56,578 [myid:2] - INFO  [main:QuorumPeer@199] - multiAddress.reachabilityCheckEnabled set to true
2021-03-29 13:33:56,578 [myid:2] - INFO  [main:QuorumPeer@186] - multiAddress.reachabilityCheckTimeoutMs set to 1000
2021-03-29 13:33:56,578 [myid:2] - INFO  [main:QuorumPeer@2461] - QuorumPeer communication is not secured! (SASL auth disabled)
2021-03-29 13:33:56,579 [myid:2] - INFO  [main:QuorumPeer@2486] - quorum.cnxn.threads.size set to 20
2021-03-29 13:33:56,585 [myid:2] - INFO  [main:AbstractConnector@380] - Stopped ServerConnector@771a660{HTTP/1.1,[http/1.1]}{0.0.0.0:7000}
2021-03-29 13:33:56,587 [myid:2] - INFO  [main:ContextHandler@1016] - Stopped o.e.j.s.ServletContextHandler@69c81773{/,null,UNAVAILABLE}
2021-03-29 13:33:56,645 [myid:2] - ERROR [main:QuorumPeerMain@113] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: My id 2 not in the peer list
	at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:1073)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:227)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:136)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:90)
2021-03-29 13:33:56,647 [myid:2] - INFO  [main:ZKAuditProvider@42] - ZooKeeper audit is disabled.
2021-03-29 13:33:56,649 [myid:2] - ERROR [main:ServiceUtils@42] - Exiting JVM with code 1
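
The trace above derives MYID=2 from the pod ordinal (ORD=1), and ZooKeeper then exits because no peer with that id is in its view. A minimal sketch of how that could be checked by hand, assuming a copy of /data/zoo.cfg.dynamic is available and uses the standard `server.<id>=...` entry format. The function names `expected_myid` and `myid_in_peer_list` are hypothetical and not part of the operator image:

```shell
# Sketch only: both function names are hypothetical helpers.

# Derive the expected ZooKeeper id from a StatefulSet pod name: ordinal + 1,
# mirroring ORD=1 -> MYID=2 in the startup trace.
expected_myid() {
  ord=${1##*-}
  case $ord in
    ''|*[!0-9]*) echo "cannot parse ordinal from '$1'" >&2; return 1 ;;
  esac
  echo $((ord + 1))
}

# Succeed only if the dynamic config file ($2) has a server.<id>= entry for id $1.
myid_in_peer_list() {
  grep -q "^server\.$1=" "$2"
}

# Example usage (assumes zoo.cfg.dynamic was copied from the pod's data volume):
#   id=$(expected_myid deployment-zookeeper-1)
#   myid_in_peer_list "$id" zoo.cfg.dynamic || echo "server.$id is missing"
```

If the check fails for the id stored in /data/myid, the on-disk dynamic config is probably stale or incomplete, which would be consistent with the exception shown in the log.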

Zookeeper cluster CRD description

kubectl describe zookeepercluster deployment-zookeeper
Name:         deployment-zookeeper
Namespace:    objectstore-large
Labels:       app=zookeeper
              component=zk
              release=deployment
Annotations:  <none>
API Version:  zookeeper.pravega.io/v1beta1
Kind:         ZookeeperCluster
Metadata:
  Creation Timestamp:  2021-03-29T13:31:16Z
  Finalizers:
    cleanUpZookeeperPVC
  Generation:  3
    Manager:      objectscale-operator
    Operation:    Update
    Time:         2021-03-29T13:31:17Z
    API Version:  zookeeper.pravega.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
      f:spec:
        f:ports:
      f:status:
        .:
        f:conditions:
        f:externalClientEndpoint:
        f:internalClientEndpoint:
        f:members:
          .:
          f:ready:
          f:unready:
        f:readyReplicas:
        f:replicas:
    Manager:    zookeeper-operator
    Operation:  Update
    Time:       2021-03-29T14:50:03Z
  Resource Version:        10195029
  Self Link:               /apis/zookeeper.pravega.io/v1beta1/namespaces/objectstore-large/zookeeperclusters/deployment-zookeeper
  UID:                     903303a2-80e1-47bf-94a4-a4cf8cae3f3f
Spec:
  Admin Server Service:
  Client Service:
  Config:
    Auto Purge Purge Interval:     1
    Auto Purge Snap Retain Count:  3
    Commit Log Count:              500
    Global Outstanding Limit:      1000
    Init Limit:                    10
    Max Client Cnxns:              60
    Max Session Timeout:           40000
    Min Session Timeout:           4000
    Pre Alloc Size:                65536
    Snap Count:                    10000
    Snap Size Limit In Kb:         4194304
    Sync Limit:                    2
    Tick Time:                     2000
  Headless Service:
  Image:
    Pull Policy:  IfNotPresent
    Repository:   pravega/zookeeper
    Tag:          0.2.9
  Labels:
    App:        zookeeper
    Component:  zk
    Release:    deployment
  Persistence:
    Reclaim Policy:  Delete
    Spec:
      Access Modes:
        ReadWriteOnce
      Resources:
        Requests:
          Storage:         20Gi
      Storage Class Name:  csi-baremetal-sc-hddlvg
  Pod:
    Affinity:
      Pod Anti Affinity:
        Preferred During Scheduling Ignored During Execution:
          Pod Affinity Term:
            Label Selector:
              Match Expressions:
                Key:       app
                Operator:  In
                Values:
                  deployment-zookeeper
            Topology Key:  kubernetes.io/hostname
          Weight:          20
    Labels:
      App:      deployment-zookeeper
      Release:  deployment-zookeeper
    Resources:
      Limits:
        Cpu:     1
        Memory:  256M
      Requests:
        Cpu:                           500m
        Memory:                        128M
    Service Account Name:              default
    Termination Grace Period Seconds:  30
  Ports:
    Container Port:  9277
    Name:            client
    Container Port:  2888
    Name:            quorum
    Container Port:  3888
    Name:            leader-election
    Container Port:  7000
    Name:            metrics
    Container Port:  8080
    Name:            admin-server
  Probes:            <nil>
  Replicas:          3
  Storage Type:      persistence
Status:
  Conditions:
    Status:                  False
    Type:                    Upgrading
    Status:                  False
    Type:                    PodsReady
    Status:                  False
    Type:                    Error
  External Client Endpoint:  N/A
  Internal Client Endpoint:  10.102.68.149:9277
  Members:
    Ready:
      deployment-zookeeper-0
    Unready:
      deployment-zookeeper-1
  Ready Replicas:  1
  Replicas:        2
Events:            <none>

Previously we used ZK v0.2.7 and did not have this issue. I also tried the fix described in issue #259, but it didn't help.

Importance

Blocker issue. We need some fixes that are in version 0.2.9 (https://github.com/pravega/zookeeper-operator/issues/257), so the upgrade is required.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 31 (11 by maintainers)

Most upvoted comments

@iampranabroy sure, here we go:

We have two policies in place related to solr/zookeeper: one (a) to allow traffic between the zookeeper members themselves (z<->z), and another (b) to allow traffic from the solr pods to the zookeeper pods (s->z). Note: we block all egress traffic from all pods by default, following the “default-deny-all-egress-traffic” principle. If you do it the other way around, e.g. blocking ingress traffic, you need to change the policies accordingly. Furthermore, the solr instances accessing your zookeeper pods need to have the custom label allow-zookeeper-access: "true" set.

a)

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-zookeeper-access-zookeeper
spec:
  egress:
  - ports:
    - port: 2181
      protocol: TCP
    - port: 2888
      protocol: TCP
    - port: 3888
      protocol: TCP
    - port: 7000
      protocol: TCP
    - port: 8080
      protocol: TCP
    to:
    - podSelector:
        matchLabels:
          kind: ZookeeperMember
          technology: zookeeper
  podSelector:
    matchLabels:
      kind: ZookeeperMember
      technology: zookeeper
  policyTypes:
  - Egress

b)

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-solr-access-zookeeper
spec:
  egress:
  - ports:
    - port: 2181
      protocol: TCP
    - port: 7000
      protocol: TCP
    - port: 8080
      protocol: TCP
    to:
    - podSelector:
        matchLabels:
          kind: ZookeeperMember
          technology: zookeeper
  podSelector:
    matchLabels:
      allow-zookeeper-access: "true"
  policyTypes:
  - Egress
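
Policy (a) hardcodes client port 2181, while the cluster in the original report uses client port 9277. A rough sketch of rendering the same policy with a configurable client port follows. The function name `render_zk_policy` is hypothetical, and the `kind: ZookeeperMember` / `technology: zookeeper` labels are copied from the policy above; they would have to be replaced with whatever labels your zookeeper pods actually carry:

```shell
# Hypothetical helper: print the zookeeper-to-zookeeper egress policy from (a)
# with the client port passed as $1 (defaults to 2181).
render_zk_policy() {
  client_port=${1:-2181}
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-zookeeper-access-zookeeper
spec:
  egress:
  - ports:
EOF
  # Client port first, then quorum, leader election, metrics and admin server.
  for port in "$client_port" 2888 3888 7000 8080; do
    printf '    - port: %s\n      protocol: TCP\n' "$port"
  done
  cat <<EOF
    to:
    - podSelector:
        matchLabels:
          kind: ZookeeperMember
          technology: zookeeper
  podSelector:
    matchLabels:
      kind: ZookeeperMember
      technology: zookeeper
  policyTypes:
  - Egress
EOF
}

# Example usage (assumes kubectl points at the right namespace):
#   render_zk_policy 9277 | kubectl apply -f -
```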

Good luck 🤞