kafkajs: Connection error to kafka hosted on confluent cloud with v1.14
**Describe the bug**
Bumping kafkajs from 1.12.0 to 1.14.0 breaks connecting to a Kafka cluster hosted on Confluent Cloud.
**Details**
The application is a simple producer, but it fails when connecting to the broker.
The producer connects like this:
import { Kafka, Partitioners } from 'kafkajs';

const conf = {
  clientId: 'backend',
  brokers: KAFKA.bootstrapServers,
  ssl: true,
  sasl: {
    mechanism: 'plain' as SASLMechanism,
    username: ENV.CONFLUENT_KAFKA_API_KEY,
    password: ENV.CONFLUENT_KAFKA_API_SECRET,
  },
};

const kafka = new Kafka(conf);
const producer = kafka.producer({
  createPartitioner: Partitioners.JavaCompatiblePartitioner,
});
await producer.connect();
And fails with:
2020-10-20T09:36:22.149Z 792bad59-e2e3-43de-a089-46b2b98bf354 ERROR {"level":"ERROR","timestamp":"2020-10-20T09:36:22.149Z","logger":"kafkajs","message":"[Connection] Connection timeout","broker":"pkc-lq8v7.eu-central-1.aws.confluent.cloud:9092","clientId":"backend"}
2020-10-20T09:36:22.150Z 792bad59-e2e3-43de-a089-46b2b98bf354 ERROR {"level":"ERROR","timestamp":"2020-10-20T09:36:22.150Z","logger":"kafkajs","message":"[BrokerPool] Failed to connect to seed broker, trying another broker from the list: Connection timeout","retryCount":0,"retryTime":196}
// etc ... until max retry
Environment:
- env: lambda in AWS (amz linux)
- KafkaJS 1.14.0
- Kafka 2.6.0 (confluent v6.0.0)
- NodeJS 12.13.1
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 24 (2 by maintainers)
Hey there, have you tried to increase the timeout on the broker config?
In my tests here it worked; a bit shaky, but it works.
I’m going to close this issue, since this appears to not be an issue with KafkaJS, but feel free to keep adding updates in case there is some information from Confluent regarding this issue.
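For reference, "increasing the timeout" in the client config would look roughly like the sketch below. `connectionTimeout`, `authenticationTimeout` and `requestTimeout` are documented KafkaJS options; the values here are illustrative rather than a recommendation, and `KAFKA` / `ENV` are the poster's own config objects from the snippet above.

```ts
import { Kafka } from 'kafkajs';

// Same connection settings as in the original snippet (KAFKA and ENV are the
// application's own config objects), with more generous timeouts.
const kafka = new Kafka({
  clientId: 'backend',
  brokers: KAFKA.bootstrapServers,
  ssl: true,
  sasl: {
    mechanism: 'plain',
    username: ENV.CONFLUENT_KAFKA_API_KEY,
    password: ENV.CONFLUENT_KAFKA_API_SECRET,
  },
  connectionTimeout: 10000,     // default is 1000 ms
  authenticationTimeout: 10000, // time allowed for the SASL/SSL handshake
  requestTimeout: 30000,        // default is 30000 ms
});
```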
Added some more debug output by running with `NODE_DEBUG=tls` as well as a custom socket factory with tracing enabled on the socket. The “alert number 70” from openssl is: “The protocol version the client attempted to negotiate is recognized, but not supported. For example, old protocol versions might be avoided for security reasons. This message is always fatal.”
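A socket factory of that kind can be plugged in through the KafkaJS `socketFactory` option. The sketch below follows the default-socket-factory example from the KafkaJS docs with `enableTrace()` (Node 12+) added on the TLS socket; it is illustrative only, not the exact code used in this comment.

```ts
import * as net from 'net';
import * as tls from 'tls';
import { Kafka } from 'kafkajs';

// Same behaviour as the default KafkaJS socket factory, plus TLS tracing:
// enableTrace() dumps the TLS handshake to stderr in OpenSSL trace format.
const tracingSocketFactory = ({ host, port, ssl, onConnect }: any) => {
  const socket = ssl
    ? tls.connect(Object.assign({ host, port }, ssl), onConnect)
    : net.connect({ host, port }, onConnect);

  if (ssl) {
    (socket as tls.TLSSocket).enableTrace(); // requires Node >= 12
  }
  socket.setKeepAlive(true, 30000);
  return socket;
};

const kafka = new Kafka({
  clientId: 'backend',
  brokers: KAFKA.bootstrapServers, // the poster's bootstrap servers
  ssl: true,
  sasl: {
    mechanism: 'plain',
    username: ENV.CONFLUENT_KAFKA_API_KEY,
    password: ENV.CONFLUENT_KAFKA_API_SECRET,
  },
  socketFactory: tracingSocketFactory,
});
```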
If I’m reading the output right, it looks like the Node process is saying that it wants to use TLS 1.2, which according to Confluent Cloud’s documentation is supported: https://docs.confluent.io/current/cloud/faq.html#:~:text=Security Controls whitepaper.-,What version of TLS is supported on Confluent Cloud,TLS version 1.2 is supported.&text=Effective March 15%2C 2020%2C connections,1.1 are no longer supported.
I’m not entirely sure. The output is in openssl’s `trace` format, but openssl needs to be compiled with `enable-ssl-trace` for that to work, so I’m looking for a pre-compiled binary with that option so I can see what the output looks like from there. EDIT: Now I’m compiling openssl myself. What a fun day this turned out to be.
My admittedly very basic understanding is that the initial handshake is done with TLS 1.0 but contains the TLS versions supported by the client, which the server and client then use to negotiate which protocol version to use:
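One way to take the version negotiation out of the picture is to pin TLS 1.2 on the client. This is only a sketch, assuming the KafkaJS `ssl` option is passed straight through to Node's `tls.connect()` (as documented) and that Node's `minVersion`/`maxVersion` options are available (Node >= 11.4):

```ts
import { Kafka } from 'kafkajs';

// Instead of `ssl: true`, pass explicit TLS options so the client only offers
// TLS 1.2, the version Confluent Cloud documents as supported.
const kafka = new Kafka({
  clientId: 'backend',
  brokers: KAFKA.bootstrapServers,
  ssl: {
    minVersion: 'TLSv1.2',
    maxVersion: 'TLSv1.2',
  },
  sasl: {
    mechanism: 'plain',
    username: ENV.CONFLUENT_KAFKA_API_KEY,
    password: ENV.CONFLUENT_KAFKA_API_SECRET,
  },
});
```

If the connection still times out with the version pinned, that would point away from protocol negotiation and towards something else (network path, SNI, etc.).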
We have had the same issue since our migration to Confluent Cloud: many of our clients are hitting Kafka timeout errors. We never got these errors before the migration. The errors appear randomly, sometimes at startup and sometimes while the service is running. Below is an example of the errors:
Just increasing the timeout didn’t fix it on our side. Need clear up-to-date