pulsar: Broker cannot respond to client requests

Describe the bug

One of the brokers in our Pulsar cluster suddenly hung. The broker process was still alive, but it apparently could not respond to requests from clients.

At that time, the following errors occurred on the client side:

02:35:08.736 [pulsar-client-io-5-8] WARN  o.a.p.c.i.BinaryProtoLookupService   - [persistent://xxx/global/xxx/xxx-partition-7] failed to send lookup request : org.apache.pulsar.client.api.PulsarClientException$TimeoutException: 9139700 lookup request timedout after ms 30000
java.util.concurrent.CompletionException: org.apache.pulsar.client.api.PulsarClientException$TimeoutException: 9139700 lookup request timedout after ms 30000
02:35:17.739 [pulsar-client-io-5-8] WARN  o.a.pulsar.common.api.PulsarHandler  - [[id: 0x678cbfd2, L:/xxx.xxx.xxx.xxx:36338 - R:f1c2-broker103.pulsar.xxx.yahoo.co.jp/xxx.xxx.xxx.xxx:6651]] Forcing connection to close after keep-alive timeout
02:35:17.748 [pulsar-client-io-5-8] WARN  o.a.pulsar.client.impl.ConsumerImpl  - [persistent://xxx/global/xxx/xxx][sub1] Failed to subscribe to topic on f1c2-broker103.pulsar.xxx.yahoo.co.jp/xxx.xxx.xxx.xxx:6651
02:35:33.951 [pulsar-client-io-5-8] WARN  o.a.pulsar.client.impl.ClientCnx     - Error during handshake
javax.net.ssl.SSLException: handshake timed out
        at io.netty.handler.ssl.SslHandler.handshake(...)(Unknown Source) ~[netty-all-4.1.22.Final.jar:4.1.22.Final]
02:35:33.952 [pulsar-client-io-5-8] WARN  o.a.p.client.impl.ConnectionPool     - [[id: 0x208e5fb2, L:/xxx.xxx.xxx.xxx:60874 ! R:f1c2-broker103.pulsar.xxx.yahoo.co.jp/xxx.xxx.xxx.xxx:6651]] Connection handshake failed: org.apache.pulsar.client.api.PulsarClientException: Connection already closed
02:35:33.952 [pulsar-client-io-5-8] WARN  o.a.p.client.impl.ConnectionHandler  - [persistent://xxx/global/xxx/xxx] [xxx-497-2225404] Error connecting to broker: org.apache.pulsar.client.api.PulsarClientException: Connection already closed
02:35:33.952 [pulsar-client-io-5-8] WARN  o.a.p.client.impl.ConnectionHandler  - [persistent://xxx/global/xxx/xxx] [xxx-497-2225404] Could not get connection to broker: org.apache.pulsar.client.api.PulsarClientException: Connection already closed -- Will try again in 0.188 s

The broker returned to normal after a restart, and the errors no longer occurred on the client side. The load on that broker was not high, so I suspect there is a bug in the broker code.
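
For anyone trying to confirm the same symptom from outside the broker: the client log above suggests the TCP connection to port 6651 was established but the TLS handshake never completed. Below is a minimal sketch of a probe that separates "port unreachable" from "connected, but handshake failed or timed out". The function name `probe_tls`, the return strings, and the choice to skip certificate verification are assumptions made for illustration only. This is not part of Pulsar or its tooling.

```python
import socket
import ssl


def probe_tls(host: str, port: int, timeout_s: float = 5.0) -> str:
    """Return 'unreachable', 'handshake_failed', or 'ok'.

    Hypothetical helper for illustration; not a Pulsar API.
    """
    # Step 1: plain TCP connect. A refused or timed-out connect means the
    # port is not reachable at all.
    try:
        sock = socket.create_connection((host, port), timeout=timeout_s)
    except OSError:
        return "unreachable"

    # Assumption: this is a liveness check only, so certificate and
    # hostname verification are disabled. Do not reuse this for real traffic.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    # Step 2: TLS handshake. wrap_socket() performs the handshake on an
    # already connected socket and inherits the socket timeout set above.
    try:
        with sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                pass
    except OSError:
        # Covers ssl.SSLError and handshake timeouts (both are OSError).
        return "handshake_failed"
    return "ok"
```

Calling `probe_tls("<broker-host>", 6651)` against a broker in the state described here would be expected to return `handshake_failed` rather than `unreachable`, assuming the TCP accept still works while the broker is hung. A thread dump of the broker process taken at that moment is still needed to find the actual cause.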

Additional context

  • Broker OS: CentOS Linux release 7.6.1810
  • Broker version: 2.2.1
  • Broker spec: Real server / 2.10GHz / 2CPU / 256GBMEM / SATA SSD 240GB x1 / 10G Base-T*2port

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 26 (22 by maintainers)

Most upvoted comments

@wolfstudy @rdhabalia let’s create a fix on branch-2.6 and create a 2.6.3 release.