druid: Kafka indexing service duplicate entry exception in druid_pendingSegments

After upgrading to Druid 0.16.0-incubating, I am receiving a MySQLIntegrityConstraintViolationException complaining about:

"Duplicate entry XXX for key 'PRIMARY' [statement:"INSERT INTO druid_pendingSegments (id, dataSource…"

This prevents the Kafka indexing tasks from completing and eventually causes the coordinator/overlord nodes to fail. The scenario only seems to happen after I drop some segments from Druid and then push in new data for the time period that was dropped. The only fix I have found is to force stop all of my Kafka indexing supervisors and tasks and manually delete all of the entries in the druid_pendingSegments table. After I do that, I no longer receive the SQL exception and the corresponding duplicate entry error message. Any thoughts on this would be greatly appreciated!
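For context, the manual cleanup step amounts to something like this (MySQL CLI shown; the user and database name are placeholders, and the table name assumes the default druid_ metadata prefix):

  mysql -u [user] -p [druid_database] -e "DELETE FROM druid_pendingSegments;"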

How to Reproduce:

  1. Suspend a Kafka indexing supervisor for a given data source and wait for the indexing task(s) to complete.
  2. Drop segments for a certain time period from the given data source and wait for the segments to be unloaded from the historical nodes.
  3. Resume the Kafka indexing supervisor for the given data source.
  4. Push new data through Kafka for the same time period which was previously dropped on the given data source.
  5. Check the logs for the Kafka indexing tasks: they complain about duplicate primary key errors. (A scripted sketch of steps 1–4 follows.)
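A minimal shell sketch of steps 1–4, assuming the standard overlord and coordinator HTTP APIs (hosts, ports, datasource, and interval are placeholders; the supervisor ID is assumed to match the datasource name, as it does by default for Kafka supervisors):

  # 1. Suspend the supervisor and wait for its indexing tasks to complete
  curl -X POST http://[overlord]:[port]/druid/indexer/v1/supervisor/[datasource]/suspend
  # 2. Mark the segments in the interval as unused (drops them from the cluster),
  #    e.g. interval 2019-01-01_2019-01-02
  curl -X DELETE http://[coordinator]:[port]/druid/coordinator/v1/datasources/[datasource]/intervals/[interval]
  # 3. Resume the supervisor
  curl -X POST http://[overlord]:[port]/druid/indexer/v1/supervisor/[datasource]/resume
  # 4. Push new data for the dropped interval through Kafka, then check the task logs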

Other Notables:

  1. The druid_pendingSegments table doesn't seem to get cleaned up once a Kafka indexing supervisor is suspended. Entries are still left in this table for the given data source despite all of the segments having been published to deep storage / historical nodes. I do have druid.coordinator.kill.pendingSegments.on=true enabled. Maybe this is normal? (The query I use to check is below.)
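To check whether that cleanup ever happens, I inspect the table directly. A minimal query sketch (MySQL CLI; the id and dataSource columns appear in the failing INSERT above, while created_date is an assumption based on my schema):

  mysql -u [user] -p [druid_database] -e "SELECT id, dataSource, created_date FROM druid_pendingSegments ORDER BY created_date DESC LIMIT 20;"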

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 25 (13 by maintainers)

Most upvoted comments

At least in 0.17.0 you can delete entries from that table by using the overlord API:

curl -X DELETE -H 'Accept: application/json, text/plain, */*' 'http://[yourhost]:[yourport]/druid/indexer/v1/pendingSegments/[datasource]?interval=1000/3000'
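The interval 1000/3000 is an ISO-8601 interval spanning the years 1000 to 3000, so it effectively matches every pending segment for the datasource. A fuller workaround sketch pairing the delete with supervisor suspend/resume (placeholders as above; a sketch based on the overlord API endpoints, not an official recipe):

  # Suspend the supervisor so no task inserts new pending segments
  curl -X POST http://[yourhost]:[yourport]/druid/indexer/v1/supervisor/[datasource]/suspend
  # Delete the stale pendingSegments rows for the datasource
  curl -X DELETE 'http://[yourhost]:[yourport]/druid/indexer/v1/pendingSegments/[datasource]?interval=1000/3000'
  # Resume ingestion
  curl -X POST http://[yourhost]:[yourport]/druid/indexer/v1/supervisor/[datasource]/resume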