pulsar: Deduplication causes a lot of Full GCs
Describe the bug
2019/02/14 Added
In our experiments, we found that enabling deduplication causes a lot of Full GCs in the Brokers. This appears to cause ZooKeeper session expiration and, finally, Broker shutdown.
2019/02/07 Original report of unexpected Broker shutdown
We have seen unexpected Broker shutdown.
- There were LedgerFencedExceptions for a lot of ledgers:
01:47:09.430 [BookKeeperClientWorker-OrderedExecutor-43-0] WARN o.a.bookkeeper.client.PendingAddOp - Fencing exception on write: L9104171 E28233 on xxx.xx.xx.xx:3181
01:47:09.430 [BookKeeperClientWorker-OrderedExecutor-43-0] ERROR o.a.bookkeeper.client.LedgerHandle - Closing ledger 9104171 due to LedgerFencedException: Ledger has been fenced off. Some other client must have opened it to read
01:47:09.430 [BookKeeperClientWorker-OrderedExecutor-43-0] WARN o.a.bookkeeper.client.PendingAddOp - Fencing exception on write: L9104171 E28234 on xxx.xx.xx.xx:3181
01:47:09.430 [BookKeeperClientWorker-OrderedExecutor-43-0] ERROR o.a.bookkeeper.client.LedgerHandle - Closing ledger 9104171 due to LedgerFencedException: Ledger has been fenced off. Some other client must have opened it to read
01:47:09.430 [BookKeeperClientWorker-OrderedExecutor-43-0] WARN o.a.bookkeeper.client.PendingAddOp - Fencing exception on write: L9104171 E28235 on xxx.xx.xx.xx:3181
01:47:09.430 [BookKeeperClientWorker-OrderedExecutor-43-0] ERROR o.a.bookkeeper.client.LedgerHandle - Closing ledger 9104171 due to LedgerFencedException: Ledger has been fenced off. Some other client must have opened it to read
...
- There were a lot of “Failed to create producer” errors for the geo-replication producer:
01:47:09.907 [pulsar-io-21-31] ERROR o.a.pulsar.client.impl.ProducerImpl - [persistent://<topicname>] [pulsar.repl.<localcluster>] Failed to create producer: Producer with name 'pulsar.repl.<localcluster>' is already connected to topic
- Finally, the Broker suddenly stopped with:
01:47:09.963 [pulsar-ordered-OrderedExecutor-4-0-EventThread] ERROR o.a.p.z.ZooKeeperSessionWatcher - ZooKeeper session already expired, invoking shutdown
Additional context
Broker OS: CentOS Linux release 7.6.1810
Broker version: 2.1.1
About this issue
- State: closed
- Created 5 years ago
- Comments: 19 (19 by maintainers)
As @hrsakai pointed out, the fix was ineffective because it was applied to a code path that is not being used.
The problem is that, while the cursor is set as “inactive” in the beginning, a periodic check flips the state back to “active”:
https://github.com/apache/pulsar/blob/43380523c5269c152f61b2aa8f7b70281c770d1d/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L878-L885
Working on a fix.
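To make the failure mode concrete, here is a minimal, self-contained sketch of the pattern described above. It is not Pulsar's code: `Cursor`, `periodicCheck`, `periodicCheckGuarded`, the `pinnedInactive` flag, and the backlog threshold are all hypothetical names and assumptions made for illustration. The assumed rule that the periodic check decides purely from the backlog size is also an assumption, and the guarded variant is only one possible shape for a fix, not necessarily the one that was merged.

```java
import java.util.List;

public class CursorFlipSketch {
    /** Hypothetical stand-in for a managed-ledger cursor (not Pulsar's real class). */
    public static class Cursor {
        public final String name;
        public final long backlogEntries;
        public final boolean pinnedInactive; // assumption: marks cursors that must stay inactive
        private boolean active = true;

        public Cursor(String name, long backlogEntries, boolean pinnedInactive) {
            this.name = name;
            this.backlogEntries = backlogEntries;
            this.pinnedInactive = pinnedInactive;
        }

        public boolean isActive() { return active; }
        public void setActive() { active = true; }
        public void setInactive() { active = false; }
    }

    /** Assumed behaviour of the periodic check: the state is derived from backlog alone. */
    public static void periodicCheck(List<Cursor> cursors, long maxBacklogEntries) {
        for (Cursor c : cursors) {
            if (c.backlogEntries < maxBacklogEntries) {
                c.setActive();   // overrides an earlier setInactive() call
            } else {
                c.setInactive();
            }
        }
    }

    /** One possible guard (an assumption): skip cursors that were deliberately deactivated. */
    public static void periodicCheckGuarded(List<Cursor> cursors, long maxBacklogEntries) {
        for (Cursor c : cursors) {
            if (c.pinnedInactive) {
                c.setInactive();
                continue;
            }
            if (c.backlogEntries < maxBacklogEntries) {
                c.setActive();
            } else {
                c.setInactive();
            }
        }
    }

    public static void main(String[] args) {
        Cursor dedup = new Cursor("dedup-cursor", 0, true);

        dedup.setInactive();                  // initial state: inactive
        periodicCheck(List.of(dedup), 1000);  // unguarded check runs later
        System.out.println("unguarded: active=" + dedup.isActive());

        dedup.setInactive();
        periodicCheckGuarded(List.of(dedup), 1000);
        System.out.println("guarded: active=" + dedup.isActive());
    }
}
```

In this sketch the unguarded check reactivates the cursor on its first run, so marking it inactive only at creation time has no lasting effect.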