enqueue-dev: [kafka] RdKafkaProducer has no way of error handling
When an error occurs in `RdKafkaProducer`, it is silently ignored - or, more accurately, since it occurs outside of the main thread of php/php-fpm, it is not reported back.
A simple test shows the issue: sending a message to a non-existent Kafka server produces no error at all (but see the second point below).
This results in two things:
- The process has no way of knowing that a message it was supposed to deliver will not be delivered. The producer returns immediately without waiting for the message to be acknowledged.
- Due to how this is handled in arnaud-lb/php-rdkafka, the process that was supposed to send the message is blocked and keeps retrying the operation for a long time (around 5 minutes in my testing; I wasn’t able to find a configuration option to change that). In my case, with the default configuration of the php-fpm Docker image, the fpm pool became locked after 5 requests, since that’s the default maximum number of spawned php-fpm children.
While this particular part is not really possible to fix inside enqueue, it’s important since the message might actually be delivered later on.
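One setting that may be relevant to the long retry period is librdkafka's `message.timeout.ms`, whose documented default is 300000 ms, which matches the roughly 5 minutes observed. That this setting is what governs the hang described above is an assumption and was not verified in this issue. A minimal sketch, where `buildProducerSettings` is a hypothetical helper and not part of enqueue or php-rdkafka:

```php
<?php

// Hypothetical helper (not part of enqueue or php-rdkafka): builds the
// librdkafka settings that bound how long a message may stay undelivered.
function buildProducerSettings(int $messageTimeoutMs): array
{
    return [
        // librdkafka: local limit on how long a produced message may wait
        // for successful delivery (documented default: 300000 ms).
        'message.timeout.ms' => (string) $messageTimeoutMs,
    ];
}

$settings = buildProducerSettings(10000);

// Apply the settings only when the rdkafka extension is available.
if (class_exists(\RdKafka\Conf::class)) {
    $conf = new \RdKafka\Conf();
    foreach ($settings as $name => $value) {
        $conf->set($name, $value); // Conf::set() expects string name and value
    }
}
```

On older librdkafka versions this topic-level property may have to be set on a `RdKafka\TopicConf` instead of the global `Conf`.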
I’d like to ask for opinions on how `RdKafkaProducer` should handle this situation. IMO it is worthwhile to add a configuration option to make sending messages synchronous for this particular Producer - or at least to wait a specified amount of time for the message to be acknowledged.
This can be done by adding the following code at the end of the `send` method:
```php
$topic = $this->producer->newTopic($destination->getTopicName(), $destination->getConf());
$topic->produce($partition, 0 /* must be 0 */, $payload, $key);

// Wait up to 10 seconds for the outgoing queue to drain.
$start = microtime(true);
while ($this->producer->getOutQLen() > 0) {
    // poll() serves queued events; the argument is a timeout in milliseconds.
    $this->producer->poll(1);

    if (microtime(true) - $start > 10) {
        throw new \RuntimeException('Message sending failed');
    }
}
```
This has the side effect of actually calling the `dr_msg_cb` and `error_cb` callbacks, which are otherwise ignored (or at least that’s what my testing indicated).
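For reference, the two callbacks can be registered on `RdKafka\Conf` via `setDrMsgCb()` and `setErrorCb()`. The sketch below only illustrates the idea: the `$errors` array and the message formats are assumptions made for this example, and the registration is skipped when the rdkafka extension is not loaded.

```php
<?php

// Collected failures; a plain array keeps the sketch simple (assumption).
$errors = [];

// Delivery report callback: php-rdkafka passes the producer and the message.
$onDelivery = function ($kafka, $message) use (&$errors): void {
    if ($message->err !== 0) { // 0 means RD_KAFKA_RESP_ERR_NO_ERROR
        $errors[] = sprintf('delivery failed with error code %d', $message->err);
    }
};

// Error callback: php-rdkafka passes the producer, an error code and a reason.
$onError = function ($kafka, $err, $reason) use (&$errors): void {
    $errors[] = sprintf('client error %d: %s', $err, $reason);
};

// Register the callbacks only when the rdkafka extension is available.
if (class_exists(\RdKafka\Conf::class)) {
    $conf = new \RdKafka\Conf();
    $conf->setDrMsgCb($onDelivery);
    $conf->setErrorCb($onError);
}
```

The callbacks only run while the producer is being polled, which is why the polling loop matters.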
Thoughts?
About this issue
- State: closed
- Created 5 years ago
- Reactions: 3
- Comments: 30 (24 by maintainers)
Be silent, bot! I’ll work on it. I promise! 😃
I would introduce a method to get errors and leave it up to developers to use it or not. Or set a flag that enables polling on publishing.
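A rough sketch of how those two options could fit together, assuming a hypothetical `DeliveryWaiter` class (this is not existing enqueue API). It takes the queue-length and poll operations as callables, so with php-rdkafka they would wrap `getOutQLen()` and `poll()`; the flag and the timeout are likewise illustrative.

```php
<?php

// Hypothetical helper, not part of enqueue: optionally polls after publishing
// and records errors instead of throwing, so callers decide what to do.
final class DeliveryWaiter
{
    /** @var callable returns the number of messages still queued */
    private $queueLength;
    /** @var callable polls for events, given a timeout in milliseconds */
    private $poll;
    private bool $enabled;
    private float $timeoutSeconds;
    private array $errors = [];

    public function __construct(callable $queueLength, callable $poll, bool $enabled, float $timeoutSeconds)
    {
        $this->queueLength = $queueLength;
        $this->poll = $poll;
        $this->enabled = $enabled;
        $this->timeoutSeconds = $timeoutSeconds;
    }

    public function wait(): void
    {
        if (!$this->enabled) {
            return; // flag off: keep the fire-and-forget behaviour
        }

        $start = microtime(true);
        while (($this->queueLength)() > 0) {
            ($this->poll)(1);
            if (microtime(true) - $start > $this->timeoutSeconds) {
                $this->errors[] = 'Timed out waiting for delivery';
                return;
            }
        }
    }

    public function getErrors(): array
    {
        return $this->errors;
    }
}

// Usage with a fake queue that drains after three polls.
$pending = 3;
$waiter = new DeliveryWaiter(
    function () use (&$pending) { return $pending; },
    function (int $timeoutMs) use (&$pending) { $pending--; },
    true,
    1.0
);
$waiter->wait();
```

With a real producer the callables could be `function () use ($producer) { return $producer->getOutQLen(); }` and `function (int $ms) use ($producer) { $producer->poll($ms); }`.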
@makasim Yes, will do. Just wanted to clarify what approach I should take before doing anything 😃