pulsar: Deduplication causes a lot of Full GCs

Describe the bug

2019/02/14 Added

In our experiments, we found that enabling deduplication causes a lot of Full GCs in Brokers, which seems to cause ZooKeeper session expiration and, finally, Broker shutdown.

2019/02/07 Original report of unexpected Broker shutdown

We have seen unexpected Broker shutdown.

  1. There were LedgerFencedExceptions for a lot of ledgers:
01:47:09.430 [BookKeeperClientWorker-OrderedExecutor-43-0] WARN  o.a.bookkeeper.client.PendingAddOp   - Fencing exception on write: L9104171 E28233 on xxx.xx.xx.xx:3181
01:47:09.430 [BookKeeperClientWorker-OrderedExecutor-43-0] ERROR o.a.bookkeeper.client.LedgerHandle   - Closing ledger 9104171 due to LedgerFencedException: Ledger has been fenced off. Some other client must have opened it to read
01:47:09.430 [BookKeeperClientWorker-OrderedExecutor-43-0] WARN  o.a.bookkeeper.client.PendingAddOp   - Fencing exception on write: L9104171 E28234 on xxx.xx.xx.xx:3181
01:47:09.430 [BookKeeperClientWorker-OrderedExecutor-43-0] ERROR o.a.bookkeeper.client.LedgerHandle   - Closing ledger 9104171 due to LedgerFencedException: Ledger has been fenced off. Some other client must have opened it to read
01:47:09.430 [BookKeeperClientWorker-OrderedExecutor-43-0] WARN  o.a.bookkeeper.client.PendingAddOp   - Fencing exception on write: L9104171 E28235 on xxx.xx.xx.xx:3181
01:47:09.430 [BookKeeperClientWorker-OrderedExecutor-43-0] ERROR o.a.bookkeeper.client.LedgerHandle   - Closing ledger 9104171 due to LedgerFencedException: Ledger has been fenced off. Some other client must have opened it to read
...
  2. There were a lot of “Failed to create producer” errors for the geo-replicator producer:
01:47:09.907 [pulsar-io-21-31] ERROR o.a.pulsar.client.impl.ProducerImpl  - [persistent://<topicname>] [pulsar.repl.<localcluster>] Failed to create producer: Producer with name 'pulsar.repl.<localcluster>' is already connected to topic
  3. Finally, the Broker suddenly stopped with:
01:47:09.963 [pulsar-ordered-OrderedExecutor-4-0-EventThread] ERROR o.a.p.z.ZooKeeperSessionWatcher      - ZooKeeper session already expired, invoking shutdown

Additional context

Broker OS: CentOS Linux release 7.6.1810
Broker version: 2.1.1

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 19 (19 by maintainers)

Most upvoted comments

As @hrsakai pointed out, the fix was ineffective because it was applied to a code path that is not being used.

The problem is that, while the cursor is set as “inactive” in the beginning, a periodic check flips the state back to “active”:

https://github.com/apache/pulsar/blob/43380523c5269c152f61b2aa8f7b70281c770d1d/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L878-L885
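To make the interaction easier to follow, here is a toy model of it. This is only a sketch, not the Pulsar code: the class, field, and method names are hypothetical, and the rule that the check activates any cursor whose backlog is below a threshold is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical classes, not Pulsar's ManagedLedgerImpl or its cursors.
class CursorActivationSketch {

    static class Cursor {
        final String name;
        final long backlogEntries;
        boolean active;

        Cursor(String name, long backlogEntries, boolean active) {
            this.name = name;
            this.backlogEntries = backlogEntries;
            this.active = active;
        }
    }

    // Assumed threshold, chosen only for this example.
    static final long MAX_ACTIVE_BACKLOG = 1000;

    // Periodic check (assumed rule): every cursor with a small backlog is
    // marked active, regardless of the state it was created with.
    static void periodicCheck(List<Cursor> cursors) {
        for (Cursor c : cursors) {
            if (c.backlogEntries < MAX_ACTIVE_BACKLOG) {
                c.active = true;
            }
        }
    }

    public static void main(String[] args) {
        List<Cursor> cursors = new ArrayList<>();
        // The deduplication cursor starts out inactive on purpose.
        Cursor dedup = new Cursor("dedup", 0, false);
        cursors.add(dedup);
        cursors.add(new Cursor("subscription", 10, true));

        System.out.println("before check: dedup active = " + dedup.active);
        periodicCheck(cursors);
        System.out.println("after check: dedup active = " + dedup.active);
    }
}
```

In this model the initial “inactive” flag does not survive the first run of the check, because nothing in the check distinguishes a cursor that was deliberately created inactive from one that has simply caught up.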

Working on a fix.