kafka-python: Help: Error sending GroupCoordinatorRequest_v0 to node
Upgraded to kafka-python 1.2.2 running against a 3-node Kafka 0.10 cluster. We’re receiving this error once on every subscribe action:
ERROR:kafka.coordinator:Error sending GroupCoordinatorRequest_v0 to node XXX [XXX]
The node specified appears to be random.
About this issue
- State: closed
- Created 8 years ago
- Comments: 18 (11 by maintainers)
Found the root cause of the problem: the session timeout was larger than the request timeout. Whenever we restarted a process, the previous process’ session would live too long relative to the request timeout, which led to the request timeout error.
The standard Java-based Kafka client actually throws an exception when the session timeout is larger than the request timeout, or when fetch-max-wait is larger than the request timeout. See the code here.
I’ve created a pull request, #986, that throws an error if these constraints are violated. I ran the tests manually through pytest, though, since tox wasn’t working for me. Send me any comments on it if needed.
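For readers who want to see the constraint spelled out, here is a minimal sketch of that kind of fail-fast check. It is not the code from the pull request: the function name `check_consumer_timeouts` and the exception class `TimeoutConfigError` are hypothetical, and rejecting equal values (not only larger ones) is an assumption. The parameter names mirror the consumer settings discussed in this thread.

```python
class TimeoutConfigError(ValueError):
    """Hypothetical error type; the library's own exception class may differ."""


def check_consumer_timeouts(request_timeout_ms, session_timeout_ms, fetch_max_wait_ms):
    """Raise early if request_timeout_ms does not exceed the other two timeouts.

    Assumption: equality is rejected as well, i.e. the request timeout
    must be strictly larger than both values.
    """
    if request_timeout_ms <= session_timeout_ms:
        raise TimeoutConfigError(
            "request_timeout_ms (%d) must be larger than session_timeout_ms (%d)"
            % (request_timeout_ms, session_timeout_ms)
        )
    if request_timeout_ms <= fetch_max_wait_ms:
        raise TimeoutConfigError(
            "request_timeout_ms (%d) must be larger than fetch_max_wait_ms (%d)"
            % (request_timeout_ms, fetch_max_wait_ms)
        )


# Illustrative values only: a 40 s request timeout with a 30 s session timeout passes.
check_consumer_timeouts(
    request_timeout_ms=40000, session_timeout_ms=30000, fetch_max_wait_ms=500
)

# A session timeout above the request timeout is rejected at configuration
# time instead of surfacing later as a request timeout error.
try:
    check_consumer_timeouts(
        request_timeout_ms=40000, session_timeout_ms=60000, fetch_max_wait_ms=500
    )
except TimeoutConfigError as exc:
    print("rejected:", exc)
```

Running the same check against your own consumer settings before constructing the consumer should flag the misconfiguration described above.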
Thanks Harel
Agreeing with @harelba: this happened to me too, and it was a nightmare to solve until I understood the problem. By definition, there is no sense in having session_timeout_ms > request_timeout_ms.
Merged harelba’s PR to fail fast if these values are misconfigured. Thanks for the debugging!
@dpkp, I have resolved this problem by setting request_timeout_ms to 300000 instead of the default 40000. I think the reason is that I fetch too many messages from the Kafka broker at one time, and by the time I have processed all of them and call consumer.poll() again, more than request_timeout_ms has passed. What do you think? If you disagree or are not certain, I will attach more information here to help you analyze it.