clickhouse-js: Socket hang up after 2nd query
I get this error after the 2nd query to ClickHouse:
Error: socket hang up
at connResetException (node:internal/errors:691:14)
at Socket.socketOnEnd (node:_http_client:471:23)
at Socket.emit (node:events:402:35)
at endReadableNT (node:internal/streams/readable:1343:12)
at processTicksAndRejections (node:internal/process/task_queues:83:21) {
code: 'ECONNRESET'
}
Steps to reproduce
- Make one query that returns results
- Perform some other actions for less than 7-10 seconds
- After that, make one more query and get the error above
I also tried increasing `connect_timeout` to 20 seconds, but it didn't help.
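For context, a minimal reproduction along these lines (a sketch assuming the 0.x-era client config; the host, the queries, and the exact idle time are placeholders, not the reporter's actual code):

```ts
import { createClient } from '@clickhouse/client'

const client = createClient({
  host: 'http://localhost:8123',
  connect_timeout: 20_000, // increased as described above; does not prevent the error
})

async function main() {
  const first = await client.query({ query: 'SELECT 1', format: 'JSONEachRow' })
  console.log(await first.json())

  // idle for slightly longer than the server's keep_alive_timeout (3s by default)
  await new Promise((resolve) => setTimeout(resolve, 10_000))

  // the second query is the one that fails with "socket hang up" / ECONNRESET
  const second = await client.query({ query: 'SELECT 2', format: 'JSONEachRow' })
  console.log(await second.json())
}

main().catch(console.error)
```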
About this issue
- State: closed
- Created a year ago
- Reactions: 6
- Comments: 56 (27 by maintainers)
Closing this, as it should be fixed as of 0.3.0. If there are still issues with sockets after upgrading, please open a new one.
Hi, my fix for the timeout case was related to Node.js 19; I was able to fix it only because it was reproducible. But we still see a completely random `socket hang up` issue. In our case we used Debian 11 + Node.js 18 + ClickHouse 23.1 in production; now we are migrating to Alpine 3.17 + Node.js 19 + ClickHouse 23.3 and still see 1-2 hang issues per day. So far I have not been able to identify the cause, but I see numerous related issues reported against Node.js: https://github.com/nodejs/node/issues/39810, https://github.com/nodejs/node/issues/47228. Maybe we should use undici for the ClickHouse client…
I have the same issue; the client throws `Error: socket hang up` after some time. I tried all the settings recommended here, but the issue persists…

0.1.1 is out.
It introduces a new feature to track potentially expired sockets.
Here's an excerpt from the docs:

If you are experiencing `socket hang up` errors, there are several options to resolve this issue:

- Increase the value of the `keep_alive_timeout` server setting (`config.xml`), as it could be as little as 3s by default. This could help if your application idles for slightly more than the default server setting. However, it is not always possible to increase it (for example, if there is no access to the server's `config.xml`), the setting shouldn't be increased to unreasonable values, and even then a particular request can happen at an unfortunate time. The expired socket detection feature can help in such situations.
- Enable expired socket detection and the retry mechanism in the client: if a potentially expired socket is detected (more than `socket_ttl` since that idle socket was last used) and retry is enabled in the configuration, both the socket and the request will be immediately destroyed (before sending the data), and the client will recreate the request. Note that `socket_ttl` should be slightly less than the server's `keep_alive_timeout` setting for this to work. If `socket_ttl` is configured appropriately, it should reliably resolve `socket hang up` issues.

@GGo3 @olexiyb @nlapshin could you test whether the `socket_ttl` + `retry_on_expired_socket` configuration works for you?
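For reference, in the 0.1.x-era client this configuration looked roughly like the sketch below (the host and the exact `socket_ttl` value are assumptions; `socket_ttl` only needs to stay slightly below the server's `keep_alive_timeout`):

```ts
import { createClient } from '@clickhouse/client'

const client = createClient({
  host: 'http://localhost:8123',
  keep_alive: {
    enabled: true,
    // slightly less than the server's keep_alive_timeout (3000 ms by default)
    socket_ttl: 2500,
    // destroy a potentially expired socket before sending data and recreate the request
    retry_on_expired_socket: true,
  },
})
```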
@kobi-co, yes, that's what I was thinking as well - the request stream is not destroyed properly. Thanks for the report!
@movy, you just need to recreate the failed ingestion promise for a particular table in case of an error. In your snippet, I see that you recreate the client (without closing the previous instance), but I don’t think you need to do it at all.
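A sketch of what recreating the failed ingestion promise could look like (the `ingest` helper, table/host names, retry count, and delay are mine, not from the original snippet):

```ts
import { createClient } from '@clickhouse/client'

const client = createClient({ host: 'http://localhost:8123' })

// keep a single client instance; on failure, recreate only the insert for that table
async function ingest(table: string, rows: Record<string, unknown>[], attempts = 3): Promise<void> {
  try {
    await client.insert({ table, values: rows, format: 'JSONEachRow' })
  } catch (err) {
    if (attempts <= 1) throw err
    console.error(`insert into ${table} failed, retrying`, err)
    await new Promise((resolve) => setTimeout(resolve, 1_000))
    return ingest(table, rows, attempts - 1)
  }
}
```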
@movy, that warning is for a multiprocessing scenario - Node.js is a single-threaded runtime. When you redeploy your app with KeepAlive disabled, please confirm whether the issue is resolved or persists in your scenario.
@movy, thanks for sharing. For this particular use case, I'd just disable `keep_alive`, as it does not add much here (aside from the issues): all the sockets are actively used until the end of the application and are not idling (except for one socket assigned to `command` during the init).

But if even disabled `keep_alive` causes issues… that is unexpected. I will set up a simple application derived from your example that does the background ingestion, to see what happens in the long run vs ClickHouse Cloud.

In the Node.js docs, there is a proposed "workaround" which I was reluctant to include in the library core: https://nodejs.org/api/http.html#requestreusedsocket. They suggest just silently retrying in the event of a hung socket if it's reused, which is what our Python client actually does.
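The workaround from the Node.js docs linked above boils down to retrying a request that failed with `ECONNRESET` on a reused keep-alive socket. A bare `http` sketch of that idea (not clickhouse-js code; host, port, and retry count are assumptions):

```ts
import * as http from 'node:http'

const agent = new http.Agent({ keepAlive: true })

function get(path: string, retriesLeft = 3): void {
  const req = http.get({ host: 'localhost', port: 8123, path, agent }, (res) => {
    res.resume()
  })
  req.on('error', (err: NodeJS.ErrnoException) => {
    // the socket came from the keep-alive pool and the server had already closed it:
    // silently retry, as suggested in https://nodejs.org/api/http.html#requestreusedsocket
    if (req.reusedSocket && err.code === 'ECONNRESET' && retriesLeft > 0) {
      get(path, retriesLeft - 1)
    } else {
      console.error(err)
    }
  })
}
```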
Regarding your code snippet and ECONNRESET: the client will not reconnect or do any retries (apart from the case where we detect an expired socket in advance via `retry_on_expired_socket`); if an error pops up after the data is sent, the request will just fail. I see that in the shared code there is no recreation of a particular stream in case it fails, only error logging. And you mentioned a dozen such errors per day? Do you restart the application often? Because effectively I see only 3 sockets in use here with no retries (i.e. they will just be closed, and that's it).

One more question: how often are inserts triggered? Is it a constant stream, or is there some idling sometimes?
@movy, I agree that the KeepAlive topic is rather confusing, especially in Node.js with all the different timeouts and lack of proper socket housekeeping out of the box, etc.
Can you please share a snippet of your code that was derived from the endless stream example? Node.js/OS versions will also help.
I am curious how a hang-up error can happen when basically one connection is open indefinitely.
But the fact that `keep_alive: { enabled: false }` did not help either is rather surprising. Usually, what happens is:

- some sockets idle in the pool for longer than the server's `keep_alive_timeout`;
- the server closes them on its side, but the client is not aware of that (even the `agentkeepalive` library with its `freeSocketTimeout` setting does not help), so they are still considered "usable" even though the server has already closed the connection on its side;
- a request assigned to such a socket then fails with `socket hang up` because the remote side is shut down.

But I do not expect this to be happening without KeepAlive enabled.
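For reference, the `agentkeepalive` agent mentioned above is typically configured along these lines (a sketch; the value is an assumption, and the point of the comment is that even closing idle sockets early on the client side does not fully remove the race):

```ts
import Agent from 'agentkeepalive'

// try to close idle sockets on the client before the server's keep_alive_timeout (3s by default)
const agent = new Agent({
  keepAlive: true,
  freeSocketTimeout: 2500,
})
```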
Additionally, can you please confirm that enabling expired socket detection and retry (the configuration shown above) does not help either? The entire idea there was to work around the janky socket management by trying to reliably detect an expiring socket in advance. If even this does not help, we need to search for another solution (again).
@gabibora, please share the same info as requested above (Node.js version, OS, a minimal code snippet).
The problem is still present. I updated the version from 0.0.14 to 0.1.0. I also tried setting `connect_timeout: 5_000` and `request_timeout: 60_000`.
Maybe it needs to be tested on the latest ClickHouse version, because we test it on 21.7.4.18.
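For completeness, a sketch of how those two settings are passed to the client (the host is a placeholder; the values are the ones mentioned above):

```ts
import { createClient } from '@clickhouse/client'

const client = createClient({
  host: 'http://localhost:8123',
  connect_timeout: 5_000,   // ms; tried by the reporter above
  request_timeout: 60_000,  // ms; tried by the reporter above
})
```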