kminion: Fails to start if any seed broker is offline
This is part of the ongoing saga of figuring out what’s still missing for kminion to once again work with a Kafka cluster where one or more brokers may be offline at any time.
Starting simple, here’s an immediate crash if any of the seed brokers are offline at startup:
{"level":"info","ts":"2021-06-01T06:30:58.438Z","msg":"connecting to Kafka seed brokers, trying to fetch cluster metadata","seed_brokers":"kafka1:9092,kafka2:9092,kafka3:9092"}
{"level":"warn","ts":"2021-06-01T06:31:01.521Z","msg":"unable to open connection to broker","source":"kafka_client","addr":"kafka-s201:9092","broker":-2147483647,"err":"dial tcp xxx.xxx.xxx.xxx:9092: connect: no route to host"}
{"level":"fatal","ts":"2021-06-01T06:31:01.521Z","msg":"failed to test connectivity to Kafka cluster","error":"failed to request api versions: unable to dial: dial tcp xxx.xxx.xxx.xxx:9092: connect: no route to host"}
About this issue
- State: closed
- Created 3 years ago
- Comments: 25 (10 by maintainers)
Commits related to this issue
- allow any *os.SyscallError to be retriable. See embedded comment. Fixes cloudhut/kminion#93, as well as other issues. (committed to twmb/franz-go by twmb 3 years ago)
Let me include some fuller logs then. In this case, it’s the second broker in the list that is down (I conveniently took these logs from the staging cluster, where node 2 is currently down). Happy to test some other case if need be.