strimzi-kafka-operator: [Question] Zookeeper failed to verify host address after upgraded to 0.18.0

We had been running a Kafka cluster on a bare-metal K8s cluster with the following details:

  • 3 zookeeper: lab-zookeeper-0/1/2
  • 3 brokers: lab-kafka-0/1/2
  • cluster operator version: 0.17.0
  • K8s namespace: kafka-lab
  • strimzi cluster name: lab

After we upgraded the cluster operator to 0.18.0, the ZooKeeper and Kafka pods were automatically rolled to pick up the new image (strimzi/kafka:0.18.0-kafka-2.4.0), and everything looked normal (at least from kubectl get po). However, the logs from one of the ZooKeeper pods show a lot of Failed to verify hostname errors, as follows:

2020-05-27 00:07:57,861 ERROR Failed to verify host address: 10.244.180.244 (org.apache.zookeeper.common.ZKTrustManager) [lab-zookeeper-1.lab-zookeeper-nodes.kafka-lab.svc/10.244.69.60:3888]
javax.net.ssl.SSLPeerUnverifiedException: Certificate for <10.244.180.244> doesn’t match any of the subject alternative names: [lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc.cluster.local, lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc, lab-zookeeper-client.kafka-lab.svc, lab-zookeeper-client.kafka-lab.svc.cluster.local, lab-zookeeper-client, lab-zookeeper-client.kafka-lab]
    at org.apache.zookeeper.common.ZKHostnameVerifier.matchIPAddress(ZKHostnameVerifier.java:194)
    at org.apache.zookeeper.common.ZKHostnameVerifier.verify(ZKHostnameVerifier.java:164)
    at org.apache.zookeeper.common.ZKTrustManager.performHostVerification(ZKTrustManager.java:135)
    at org.apache.zookeeper.common.ZKTrustManager.checkClientTrusted(ZKTrustManager.java:74)
    at sun.security.ssl.ServerHandshaker.clientCertificate(ServerHandshaker.java:2037)
    at sun.security.ssl.ServerHandshaker.processMessage(ServerHandshaker.java:233)
    at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1082)
    at sun.security.ssl.Handshaker.process_record(Handshaker.java:1010)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1079)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1388)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1416)
    at sun.security.ssl.SSLSocketImpl.getSession(SSLSocketImpl.java:2309)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.detectMode(UnifiedServerSocket.java:273)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.getSocket(UnifiedServerSocket.java:301)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.access$400(UnifiedServerSocket.java:180)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.getRealInputStream(UnifiedServerSocket.java:700)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.read(UnifiedServerSocket.java:694)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at java.io.DataInputStream.readFully(DataInputStream.java:195)
    at java.io.DataInputStream.readLong(DataInputStream.java:416)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.handleConnection(QuorumCnxManager.java:524)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:478)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:934)
2020-05-27 00:07:57,861 ERROR Failed to verify hostname: 10-244-180-244.lab-zookeeper-client.kafka-lab.svc.cluster.local (org.apache.zookeeper.common.ZKTrustManager) [lab-zookeeper-1.lab-zookeeper-nodes.kafka-lab.svc/10.244.69.60:3888]
javax.net.ssl.SSLPeerUnverifiedException: Certificate for <10-244-180-244.lab-zookeeper-client.kafka-lab.svc.cluster.local> doesn’t match any of the subject alternative names: [lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc.cluster.local, lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc, lab-zookeeper-client.kafka-lab.svc, lab-zookeeper-client.kafka-lab.svc.cluster.local, lab-zookeeper-client, lab-zookeeper-client.kafka-lab]
    at org.apache.zookeeper.common.ZKHostnameVerifier.matchDNSName(ZKHostnameVerifier.java:224)
    at org.apache.zookeeper.common.ZKHostnameVerifier.verify(ZKHostnameVerifier.java:170)
    at org.apache.zookeeper.common.ZKTrustManager.performHostVerification(ZKTrustManager.java:141)
    at org.apache.zookeeper.common.ZKTrustManager.checkClientTrusted(ZKTrustManager.java:74)
    at sun.security.ssl.ServerHandshaker.clientCertificate(ServerHandshaker.java:2037)
    at sun.security.ssl.ServerHandshaker.processMessage(ServerHandshaker.java:233)
    at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1082)
    at sun.security.ssl.Handshaker.process_record(Handshaker.java:1010)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1079)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1388)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1416)
    at sun.security.ssl.SSLSocketImpl.getSession(SSLSocketImpl.java:2309)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.detectMode(UnifiedServerSocket.java:273)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.getSocket(UnifiedServerSocket.java:301)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.access$400(UnifiedServerSocket.java:180)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.getRealInputStream(UnifiedServerSocket.java:700)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.read(UnifiedServerSocket.java:694)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at java.io.DataInputStream.readFully(DataInputStream.java:195)
    at java.io.DataInputStream.readLong(DataInputStream.java:416)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.handleConnection(QuorumCnxManager.java:524)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:478)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:934)
2020-05-27 00:07:57,861 INFO Accepted TLS connection from 10-244-180-244.lab-zookeeper-client.kafka-lab.svc.cluster.local/10.244.180.244:54086 - NONE - SSL_NULL_WITH_NULL_NULL (org.apache.zookeeper.server.quorum.UnifiedServerSocket) [lab-zookeeper-1.lab-zookeeper-nodes.kafka-lab.svc/10.244.69.60:3888]
2020-05-27 00:07:57,861 WARN Exception reading or writing challenge: {} (org.apache.zookeeper.server.quorum.QuorumCnxManager) [lab-zookeeper-1.lab-zookeeper-nodes.kafka-lab.svc/10.244.69.60:3888]
javax.net.ssl.SSLException: Connection has been shutdown: javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: Failed to verify both host address and host name
    at sun.security.ssl.SSLSocketImpl.checkEOF(SSLSocketImpl.java:1554)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:95)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.read(UnifiedServerSocket.java:694)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at java.io.DataInputStream.readFully(DataInputStream.java:195)
    at java.io.DataInputStream.readLong(DataInputStream.java:416)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.handleConnection(QuorumCnxManager.java:524)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:478)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:934)
Caused by: javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: Failed to verify both host address and host name
    at sun.security.ssl.Alerts.getSSLException(Alerts.java:198)
    at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1967)
    at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:331)
    at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:325)
    at sun.security.ssl.ServerHandshaker.clientCertificate(ServerHandshaker.java:2055)
    at sun.security.ssl.ServerHandshaker.processMessage(ServerHandshaker.java:233)
    at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1082)
    at sun.security.ssl.Handshaker.process_record(Handshaker.java:1010)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1079)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1388)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1416)
    at sun.security.ssl.SSLSocketImpl.getSession(SSLSocketImpl.java:2309)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.detectMode(UnifiedServerSocket.java:273)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.getSocket(UnifiedServerSocket.java:301)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.access$400(UnifiedServerSocket.java:180)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.getRealInputStream(UnifiedServerSocket.java:700)
    … 9 more
Caused by: java.security.cert.CertificateException: Failed to verify both host address and host name
    at org.apache.zookeeper.common.ZKTrustManager.performHostVerification(ZKTrustManager.java:145)
    at org.apache.zookeeper.common.ZKTrustManager.checkClientTrusted(ZKTrustManager.java:74)
    at sun.security.ssl.ServerHandshaker.clientCertificate(ServerHandshaker.java:2037)
    … 20 more
Caused by: javax.net.ssl.SSLPeerUnverifiedException: Certificate for <10-244-180-244.lab-zookeeper-client.kafka-lab.svc.cluster.local> doesn’t match any of the subject alternative names: [lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc.cluster.local, lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc, lab-zookeeper-client.kafka-lab.svc, lab-zookeeper-client.kafka-lab.svc.cluster.local, lab-zookeeper-client, lab-zookeeper-client.kafka-lab]
    at org.apache.zookeeper.common.ZKHostnameVerifier.matchDNSName(ZKHostnameVerifier.java:224)
    at org.apache.zookeeper.common.ZKHostnameVerifier.verify(ZKHostnameVerifier.java:170)
    at org.apache.zookeeper.common.ZKTrustManager.performHostVerification(ZKTrustManager.java:141)
    … 22 more
2020-05-27 00:07:58,366 INFO Authenticated Id ‘CN=lab-kafka,O=io.strimzi’ for Scheme ‘x509’ (org.apache.zookeeper.server.auth.X509AuthenticationProvider) [nioEventLoopGroup-7-2]
2020-05-27 00:07:58,367 WARN Closing connection to /10.244.65.25:41740 (org.apache.zookeeper.server.NettyServerCnxn) [nioEventLoopGroup-7-2]
java.io.IOException: ZK down
    at org.apache.zookeeper.server.NettyServerCnxn.receiveMessage(NettyServerCnxn.java:474)
    at org.apache.zookeeper.server.NettyServerCnxn.processMessage(NettyServerCnxn.java:360)
    at org.apache.zookeeper.server.NettyServerCnxnFactory$CnxnChannelHandler.channelRead(NettyServerCnxnFactory.java:266)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355)
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1470)
    at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1219)
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1266)
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:498)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:437)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
2020-05-27 00:07:58,853 INFO Authenticated Id ‘CN=cluster-operator,O=io.strimzi’ for Scheme ‘x509’ (org.apache.zookeeper.server.auth.X509AuthenticationProvider) [nioEventLoopGroup-7-3]
2020-05-27 00:07:58,853 WARN Closing connection to /10.244.69.241:53394 (org.apache.zookeeper.server.NettyServerCnxn) [nioEventLoopGroup-7-3]
java.io.IOException: ZK down
    at org.apache.zookeeper.server.NettyServerCnxn.receiveMessage(NettyServerCnxn.java:474)
    at org.apache.zookeeper.server.NettyServerCnxn.processMessage(NettyServerCnxn.java:360)
    at org.apache.zookeeper.server.NettyServerCnxnFactory$CnxnChannelHandler.channelRead(NettyServerCnxnFactory.java:266)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355)
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1470)
    at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1219)
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1266)
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:498)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:437)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)

What confuses me most is that the log seems to show ZooKeeper (1) verifying against the host address 10.244.180.244 (the K8s-internal IP) of a ZooKeeper peer pod, which fails because the cert doesn’t cover that IP, and (2) then trying to verify the hostname 10-244-180-244.lab-zookeeper-client.kafka-lab.svc.cluster.local, which is essentially a combination of the pod IP and the client service name.
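The two-step check described above can be sketched in a few lines of Python. This is only an illustration of the order seen in the log (IP first, then hostname), not ZooKeeper's actual ZKHostnameVerifier logic: the function names are made up, the SAN list is flattened into plain strings, and only exact matches are handled (no wildcards).

```python
import ipaddress

# SANs copied from the error message in the log above.
CERT_SANS = [
    "lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc.cluster.local",
    "lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc",
    "lab-zookeeper-client.kafka-lab.svc",
    "lab-zookeeper-client.kafka-lab.svc.cluster.local",
    "lab-zookeeper-client",
    "lab-zookeeper-client.kafka-lab",
]


def matches_ip(peer_ip, sans):
    """Return True if any SAN is an IP literal equal to peer_ip."""
    target = ipaddress.ip_address(peer_ip)
    for san in sans:
        try:
            if ipaddress.ip_address(san) == target:
                return True
        except ValueError:
            continue  # this SAN is a DNS name, not an IP literal
    return False


def matches_dns(hostname, sans):
    """Return True if hostname equals a SAN (case-insensitive, exact match only)."""
    wanted = hostname.lower().rstrip(".")
    return any(wanted == san.lower().rstrip(".") for san in sans)


def verify_peer(peer_ip, reverse_name, sans):
    """Hypothetical helper: try the IP first, then the reverse-resolved name."""
    if matches_ip(peer_ip, sans):
        return "ip"
    if matches_dns(reverse_name, sans):
        return "dns"
    return None  # corresponds to "Failed to verify both host address and host name"


peer_ip = "10.244.180.244"
reverse_name = "10-244-180-244.lab-zookeeper-client.kafka-lab.svc.cluster.local"

# Neither the pod IP nor the reverse-resolved name is in the SAN list.
print(verify_peer(peer_ip, reverse_name, CERT_SANS))  # None

# The per-pod name from the headless "nodes" service would have matched.
print(verify_peer(peer_ip, "lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc", CERT_SANS))  # dns
```

With the values from the log, both checks fail, which matches the final "Failed to verify both host address and host name" exception. The name that would pass is the per-pod one under lab-zookeeper-nodes, not the one the reverse lookup returned.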

I guess that’s related to the migration from the TLS sidecar for ZooKeeper to the built-in TLS support. I would really appreciate any help.

And here is the zookeeper section of the Kafka manifest:

  zookeeper:
    replicas: 3
    resources:
      requests:
        memory: 6Gi
        cpu: "2"
      limits:
        memory: 6Gi
        cpu: "2"
    jvmOptions:
      -Xms: 3072m
      -Xmx: 3072m
    storage:
      type: persistent-claim
      size: 20Gi
      class: ssd
      deleteClaim: false

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 32 (14 by maintainers)

Most upvoted comments

@Escaflow I’m very confused now. Are you working with @oulydna and talking about the same issue? Or do you have your own separate issue which you think is related? The issue in the previous logs is Received fatal alert: certificate_unknown, so that is IMHO something that happens before the hostname verification.

If you have your own cluster with the hostname verification issue, can you share the logs? You can use this script to collect them into a ZIP archive: https://github.com/strimzi/strimzi-kafka-operator/blob/master/tools/report.sh
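To tell the two failure modes apart in a collected ZooKeeper log, a rough triage sketch like the following may help. It only counts lines containing the message fragments quoted in this thread; the function name and the abbreviated sample lines are made up for illustration.

```python
from collections import Counter

# Message fragments quoted in this thread.
PATTERNS = {
    "host_address": "Failed to verify host address",
    "hostname": "Failed to verify hostname",
    "certificate_unknown": "certificate_unknown",
}


def classify(log_text):
    """Count how many log lines contain each known message fragment."""
    counts = Counter()
    for line in log_text.splitlines():
        for label, needle in PATTERNS.items():
            if needle in line:
                counts[label] += 1
    return counts


# Abbreviated sample lines, not a real log file.
sample = "\n".join([
    "2020-05-27 00:07:57,861 ERROR Failed to verify host address: 10.244.180.244",
    "2020-05-27 00:07:57,861 ERROR Failed to verify hostname: 10-244-180-244.lab-zookeeper-client.kafka-lab.svc.cluster.local",
    "Received fatal alert: certificate_unknown",
])

for label, count in sorted(classify(sample).items()):
    print(label, count)
```

A log dominated by certificate_unknown lines points at the case described in the comment above, while host address and hostname failures point at the situation from the original report.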