clickhouse-js: Socket hang up after 2nd query
I get this error after the 2nd query to ClickHouse:
Error: socket hang up
at connResetException (node:internal/errors:691:14)
at Socket.socketOnEnd (node:_http_client:471:23)
at Socket.emit (node:events:402:35)
at endReadableNT (node:internal/streams/readable:1343:12)
at processTicksAndRejections (node:internal/process/task_queues:83:21) {
code: 'ECONNRESET'
}
Steps to reproduce
- Make one query that returns results
- Perform some other actions for less than 7-10 seconds
- After that, make one more query and get the error above
I also tried increasing `connect_timeout` to 20 seconds, but it didn't help.
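For context, a minimal reproduction along these lines (a sketch assuming the 0.x-era client config; the host, the queries, and the exact idle time are placeholders, not the reporter's actual code):

```ts
import { createClient } from '@clickhouse/client'

const client = createClient({
  host: 'http://localhost:8123',
  connect_timeout: 20_000, // increased as described above; does not prevent the error
})

async function main() {
  const first = await client.query({ query: 'SELECT 1', format: 'JSONEachRow' })
  console.log(await first.json())

  // idle for slightly longer than the server's keep_alive_timeout (3s by default)
  await new Promise((resolve) => setTimeout(resolve, 10_000))

  // the second query is the one that fails with "socket hang up" / ECONNRESET
  const second = await client.query({ query: 'SELECT 2', format: 'JSONEachRow' })
  console.log(await second.json())
}

main().catch(console.error)
```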
About this issue
- State: closed
- Created a year ago
- Reactions: 6
- Comments: 56 (27 by maintainers)
Closing this, as it should be fixed as of 0.3.0. If there are still issues with sockets after upgrading, please open a new one.
Hi, my fix for the timeout case was related to Node.js 19; I was able to fix it only because it was reproducible. But we still see a completely random `socket hang up` issue. In our case we used Debian 11 + Node.js 18 + ClickHouse 23.1 in production; now we are migrating to Alpine 3.17 + Node.js 19 + ClickHouse 23.3 and still see 1-2 hang issues per day. So far I have not been able to identify the cause, but I see numerous related issues reported against Node.js: https://github.com/nodejs/node/issues/39810, https://github.com/nodejs/node/issues/47228. Maybe we should use undici for the ClickHouse client…
I have the same issue; the client throws `Error: socket hang up` after some time. I tried all the settings recommended here, but the issue persists…

0.1.1 is out.
It introduces a new feature to track potentially expired sockets.
Here's an excerpt from the docs:

If you are experiencing `socket hang up` errors, there are several options to resolve this issue:

- Increase the value of the `keep_alive_timeout` server setting (`config.xml`), as it could be as little as 3s by default. This could help if your application idles for slightly more than the default server setting. However, it is not always possible to increase it (for example, if there is no access to the server's `config.xml`), the setting shouldn't be increased to unreasonable values, and even then a particular request can happen at an unfortunate time. The expired socket detection feature can help in such situations.
- Enable expired socket detection and the retry mechanism in the client: if a potentially expired socket is detected (more than `socket_ttl` since that idle socket was last used) and retry is enabled in the configuration, both the socket and the request will be immediately destroyed (before sending the data), and the client will recreate the request. Note that `socket_ttl` should be slightly less than the server's `keep_alive_timeout` setting for this to work. If `socket_ttl` is configured appropriately, it should reliably resolve `socket hang up` issues.

@GGo3 @olexiyb @nlapshin could you test whether the `socket_ttl` + `retry_on_expired_socket` configuration works for you?
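For reference, in the 0.1.x-era client this configuration looked roughly like the sketch below (the host and the exact `socket_ttl` value are assumptions; `socket_ttl` only needs to stay slightly below the server's `keep_alive_timeout`):

```ts
import { createClient } from '@clickhouse/client'

const client = createClient({
  host: 'http://localhost:8123',
  keep_alive: {
    enabled: true,
    // slightly less than the server's keep_alive_timeout (3000 ms by default)
    socket_ttl: 2500,
    // destroy a potentially expired socket before sending data and recreate the request
    retry_on_expired_socket: true,
  },
})
```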
@kobi-co, yes, that's what I was thinking as well - the request stream is not destroyed properly. Thanks for the report!
@movy, you just need to recreate the failed ingestion promise for a particular table in case of an error. In your snippet, I see that you recreate the client (without closing the previous instance), but I don’t think you need to do it at all.
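A sketch of what recreating the failed ingestion promise could look like (the `ingest` helper, table/host names, retry count, and delay are mine, not from the original snippet):

```ts
import { createClient } from '@clickhouse/client'

const client = createClient({ host: 'http://localhost:8123' })

// keep a single client instance; on failure, recreate only the insert for that table
async function ingest(table: string, rows: Record<string, unknown>[], attempts = 3): Promise<void> {
  try {
    await client.insert({ table, values: rows, format: 'JSONEachRow' })
  } catch (err) {
    if (attempts <= 1) throw err
    console.error(`insert into ${table} failed, retrying`, err)
    await new Promise((resolve) => setTimeout(resolve, 1_000))
    return ingest(table, rows, attempts - 1)
  }
}
```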
@movy, that warning is for a multiprocessing scenario - Node.js is a single-threaded runtime. When you redeploy your app with KeepAlive disabled, please confirm whether the issue is resolved or persists in your scenario.
@movy, thanks for sharing. For this particular use case, I'd just disable `keep_alive`, as it does not add much here (aside from the issues): all the sockets are actively used until the end of the application and are not idling (except for one socket assigned to `command` during the init).

But if even disabled `keep_alive` causes issues… that is unexpected. I will set up a simple application derived from your example that does the background ingestion, to see what happens in the long run vs ClickHouse Cloud.

In the Node.js docs, there is a proposed "workaround" which I was reluctant to include in the library core: https://nodejs.org/api/http.html#requestreusedsocket. They suggest just silently retrying in the event of a hung socket if it's reused, which is what our Python client actually does.
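The workaround from the Node.js docs linked above boils down to retrying a request that failed with `ECONNRESET` on a reused keep-alive socket. A bare `http` sketch of that idea (not clickhouse-js code; host, port, and retry count are assumptions):

```ts
import * as http from 'node:http'

const agent = new http.Agent({ keepAlive: true })

function get(path: string, retriesLeft = 3): void {
  const req = http.get({ host: 'localhost', port: 8123, path, agent }, (res) => {
    res.resume()
  })
  req.on('error', (err: NodeJS.ErrnoException) => {
    // the socket came from the keep-alive pool and the server had already closed it:
    // silently retry, as suggested in https://nodejs.org/api/http.html#requestreusedsocket
    if (req.reusedSocket && err.code === 'ECONNRESET' && retriesLeft > 0) {
      get(path, retriesLeft - 1)
    } else {
      console.error(err)
    }
  })
}
```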
Regarding your code snippet and ECONNRESET: the client will not reconnect or do any retries (apart from the case where we detect an expired socket in advance via `retry_on_expired_socket`); if an error pops up after the data is sent, the request will just fail. I see that in the shared code there is no recreation of a particular stream in case it fails, only error logging. And you mentioned a dozen such errors per day? Do you restart the application often? Because effectively I see only 3 sockets in use here with no retries (i.e. they will just be closed, and that's it).

One more question: how often are inserts triggered? Is it a constant stream, or is there some idling sometimes?
@movy, I agree that the KeepAlive topic is rather confusing, especially in Node.js with all the different timeouts and lack of proper socket housekeeping out of the box, etc.
Can you please share a snippet of your code that was derived from the endless stream example? Node.js/OS versions will also help.
I am curious how a hang-up error can happen when basically one connection is open indefinitely.
But the fact that `keep_alive: { enabled: false }` did not help either is rather surprising. Usually, what happens is:

- some sockets idle in the pool for longer than the server's `keep_alive_timeout`;
- the server closes them on its side, but the client is not aware of that (even the `agentkeepalive` library with its `freeSocketTimeout` setting does not help), so they are still considered "usable" even though the server has already closed the connection on its side;
- a request assigned to such a socket then fails with `socket hang up` because the remote side is shut down.

But I do not expect this to be happening without KeepAlive enabled.
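For reference, the `agentkeepalive` agent mentioned above is typically configured along these lines (a sketch; the value is an assumption, and the point of the comment is that even closing idle sockets early on the client side does not fully remove the race):

```ts
import Agent from 'agentkeepalive'

// try to close idle sockets on the client before the server's keep_alive_timeout (3s by default)
const agent = new Agent({
  keepAlive: true,
  freeSocketTimeout: 2500,
})
```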
Additionally, can you please confirm that enabling expired socket detection and retry (the configuration shown above) does not help either? The entire idea there was to work around the janky socket management by trying to reliably detect an expiring socket in advance. If even this does not help, we need to search for another solution (again).
@gabibora, please share the same info as requested above (Node.js version, OS, a minimal code snippet).
The problem is still present. I updated the version from 0.0.14 to 0.1.0. I also tried setting `connect_timeout: 5_000` and `request_timeout: 60_000`.
Maybe it needs to be tested on the latest ClickHouse version, because we test it on 21.7.4.18.
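For completeness, a sketch of how those two settings are passed to the client (the host is a placeholder; the values are the ones mentioned above):

```ts
import { createClient } from '@clickhouse/client'

const client = createClient({
  host: 'http://localhost:8123',
  connect_timeout: 5_000,   // ms; tried by the reporter above
  request_timeout: 60_000,  // ms; tried by the reporter above
})
```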