speckle-server: Server container hanging [500]
Prerequisites
- I read the contribution guidelines
- I checked the documentation and found no answer.
- I checked existing issues and found no similar issue.
- I checked the community forum for related discussions and found no answer.
- I’m reporting the issue to the correct repository (see also speckle-server, speckle-sharp, specklepy, speckle-docs, and others)
What package are you referring to?
Server (backend)
Describe the bug
Our Speckle Server unexpectedly becomes unresponsive.
Restarting the speckle-server Docker container resolves the issue, but we cannot work out what is causing it.
We have investigated the logs and found only some error messages from /graphql API requests (part of the log file is attached below).
Following the unexpected crash, all requests to the Speckle server return: Error: failed with status code 500.
Expected behavior
Server running smoothly and all requests handled correctly
System Info
Google Cloud VM Instance:
- OS: Debian GNU/Linux 11 (bullseye)
- OS version: 11
- OS Config agent version: 20230330.00-g1
- CPU platform: Intel Skylake
- Architecture: x86/64
- Boot Disk size: 10 GB
- External Disk size (container mount point): 256 GB
- Memory: 4 GB
Failure Logs
speckle-server_1 | {"level":"error","time":"2023-06-23T07:20:12.697Z","req":{"id":"8f50e384-010e-4da7-a79f-24cc87b068c2","method":"POST","path":"/api/getobjects/19e2cbffeb","headers":{"host":"geo.stykka.com","connection":"upgrade","content-length":"11078","accept":"text/plain","sec-fetch-site":"cross-site","accept-language":"da-DK,da;q=0.9","accept-encoding":"gzip, deflate, br","sec-fetch-mode":"cors","content-type":"application/json","origin":"https://dev-velux.web.app","user-agent":"Mozilla/5.0 (iPhone; CPU iPhone OS 16_5_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.5.1 Mobile/15E148 Safari/604.1","referer":"https://dev-velux.web.app/","sec-fetch-dest":"empty","x-request-id":"8f50e384-010e-4da7-a79f-24cc87b068c2","x-speckle-client-ip":"91.143.119.255"}},"authContext":{"auth":true,"userId":"3727070022","role":"server:admin","scopes":["streams:read","streams:write","profile:read","profile:email","users:read","server:stats"]},"userId":"3727070022","streamId":"19e2cbffeb","err":{"type":"Error","message":"Premature close","stack":"Error [ERR_STREAM_PREMATURE_CLOSE]: Premature close\n at new NodeError (node:internal/errors:399:5)\n at Gzip.<anonymous> (node:internal/streams/pipeline:352:14)\n at Gzip.emit (node:events:525:35)\n at Gzip.emit (node:domain:489:12)\n at emitCloseNT (node:internal/streams/destroy:132:10)\n at emitErrorCloseNT (node:internal/streams/destroy:117:3)\n at process.processTicksAndRejections (node:internal/process/task_queues:82:21)","code":"ERR_STREAM_PREMATURE_CLOSE"},"msg":"App error streaming objects"}
speckle-server_1 | {"level":"error","time":"2023-06-23T07:20:14.050Z","req":{"id":"cd65b754-33de-443d-80f5-0f35e880a240","method":"POST","path":"/api/getobjects/19e2cbffeb","headers":{"host":"geo.stykka.com","connection":"upgrade","content-length":"33167","accept":"text/plain","sec-fetch-site":"cross-site","accept-language":"da-DK,da;q=0.9","accept-encoding":"gzip, deflate, br","sec-fetch-mode":"cors","content-type":"application/json","origin":"https://dev-velux.web.app","user-agent":"Mozilla/5.0 (iPhone; CPU iPhone OS 16_5_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.5.1 Mobile/15E148 Safari/604.1","referer":"https://dev-velux.web.app/","sec-fetch-dest":"empty","x-request-id":"cd65b754-33de-443d-80f5-0f35e880a240","x-speckle-client-ip":"91.143.119.255"}},"authContext":{"auth":true,"userId":"3727070022","role":"server:admin","scopes":["streams:read","streams:write","profile:read","profile:email","users:read","server:stats"]},"userId":"3727070022","streamId":"19e2cbffeb","err":{"type":"Error","message":"Premature close","stack":"Error [ERR_STREAM_PREMATURE_CLOSE]: Premature close\n at new NodeError (node:internal/errors:399:5)\n at onclose (node:internal/streams/end-of-stream:154:30)\n at process.processTicksAndRejections (node:internal/process/task_queues:77:11)","code":"ERR_STREAM_PREMATURE_CLOSE"},"msg":"App error streaming objects"}
speckle-server_1 | {"level":"error","time":"2023-06-23T07:23:54.255Z","req":{"id":"be9f11cf-02bf-47eb-996b-75e5b5d8cf8b","method":"POST","path":"/graphql","headers":{"host":"geo.stykka.com","connection":"upgrade","content-length":"468","user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/114.0","accept":"*/*","accept-language":"en-US,en;q=0.5","accept-encoding":"gzip, deflate, br","referer":"https://app.stykka.com/","content-type":"application/json","origin":"https://app.stykka.com","sec-fetch-dest":"empty","sec-fetch-mode":"cors","sec-fetch-site":"same-site","x-request-id":"be9f11cf-02bf-47eb-996b-75e5b5d8cf8b","x-speckle-client-ip":"194.182.0.255"}},"res":{"statusCode":500,"headers":{"x-request-id":"be9f11cf-02bf-47eb-996b-75e5b5d8cf8b","content-type":"application/json; charset=utf-8","content-length":"118","etag":"W/\"76-xSxs8YfXn3PEb+TNZywNRe/DrDo\""}},"err":{"type":"Error","message":"failed with status code 500","stack":"Error: failed with status code 500\n at ServerResponse.onResFinished (/speckle-server/node_modules/pino-http/logger.js:113:40)\n at ServerResponse.emit (node:events:525:35)\n at ServerResponse.emit (node:domain:489:12)\n at onFinish (node:_http_outgoing:989:10)\n at callback (node:internal/streams/writable:555:21)\n at afterWrite (node:internal/streams/writable:500:5)\n at afterWriteTick (node:internal/streams/writable:487:10)\n at process.processTicksAndRejections (node:internal/process/task_queues:81:21)"},"responseTime":60001,"msg":"request errored"}
Last 500 lines of the speckle-server container logs
Affected Projects
The Speckle frontend UI and the JavaScript viewer are not loading, and .NET requests are failing.
About this issue
- Original URL
- State: open
- Created a year ago
- Comments: 28 (13 by maintainers)
@Sticcia - yes, though I think the command can just be --inspect=9229 bin/www, as the entrypoint is already node. Most importantly, I would not expose the port to the internet due to the security implications of this. Instead I would suggest creating an ssh tunnel to the host machine, or something similar which would allow you to access the port securely and without exposing it publicly.

@iainsproat
I was mistaken; the server crashed twice this weekend. Once at 10:45, when our uptime check got stuck in a restart loop (this is clearly visible in the graphs too; there are gaps during the restarts).
And once at 14:20, when the health check was disabled and the restart was done manually at 17:00.
Sorry for the confusion. Here is the data you asked for, for the crash at 14:20 (with no action taken in this time slot).
Data from 14:18 - 14:35
Speckle Server container logs: logs.txt
Grafana data export: grafana_data_export.csv
The increase in connections occurs before the restart.
Can you connect to your speckle server and query your-domain/metrics or localhost:3000/metrics? I'm not sure how you've configured ingress; if you're using Speckle's DigitalOcean 1-Click deployment, the ingress will prevent external access to /metrics. If you have deployed Grafana to a separate machine without direct network access to the speckle server, you would want an ingress which adds some sort of authentication to prevent public access to /metrics. It is probably easiest to run it on the same VM, and deploy it via docker compose on the same docker network as the speckle server.

Thanks for providing this.
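As a minimal sketch, checking the metrics endpoint for pool saturation could look like the following. The URL assumes a default local deployment, and the sample values below are invented for illustration:

```shell
# Query the Prometheus metrics endpoint (URL is an assumption):
#   curl -s http://localhost:3000/metrics | grep '^speckle_server_knex'
# Given output like the sample below, extract the free-connection gauge:
metrics='speckle_server_knex_free 0
speckle_server_knex_used 4'
free=$(printf '%s\n' "$metrics" | awk '/^speckle_server_knex_free /{print $2}')
echo "free connections: $free"   # prints "free connections: 0"
```

A value that sits at 0 while requests time out would be consistent with a saturated pool.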
After reading a little more about Docker Compose's behaviour, it seems that Compose will not restart unhealthy containers 😞. (For security reasons I would avoid the 'autoheal' solution mentioned in that link, as it requires mounting the docker socket to that container, effectively giving it access to your host machine.)
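One socket-free alternative is a host-side watchdog run from cron. This is only a sketch: the container name, port, and probe path are assumptions, not Speckle specifics:

```shell
#!/bin/sh
# Hypothetical watchdog, run from the host's crontab (e.g. every minute).
# Container name, port, and probe URL are assumptions for illustration;
# it avoids mounting the docker socket into any container.
if ! curl -fsS --max-time 10 http://localhost:3000/ >/dev/null 2>&1; then
  docker restart speckle-server_1
fi
```

Because the script runs on the host rather than inside a container, no container ever gains control over the Docker daemon.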
From your error message we're again seeing the same Knex: Timeout acquiring a connection. The pool is probably full. This might be for a number of reasons (as I mentioned above).

If you are able to connect Prometheus or Grafana to the metrics endpoint (/metrics) of the speckle server, there is a metric, speckle_server_knex_free, which provides data on the number of available connections. You can plot this over time to see if connections are becoming saturated. If connections remain free, then it may point to an issue with the network or the postgres server itself.

As a possible solution to saturated connections, you could attempt to increase the number of available connections by adding the POSTGRES_MAX_CONNECTIONS_SERVER environment variable to the server. The default for this environment variable is 4, so you could try increasing it to 8. Please note that there may be limitations on the number of connections that your postgres server can handle, and the larger number of connections may require more cpu & memory resources for both the database and speckle server.

You could also provide redundancy by running one or more replicas of the speckle server container. If one container were to fail, or its database connections became temporarily saturated, then the other could continue to operate and prevent downtime. Please refer to Docker's documentation on replicas.
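A sketch of what the two suggestions above could look like in a compose file. The service name and values are assumptions, and deploy.replicas requires a Compose version/mode that honours it:

```yaml
# Hypothetical docker-compose override; service name and values are assumptions.
services:
  speckle-server:
    environment:
      POSTGRES_MAX_CONNECTIONS_SERVER: "8"  # default is 4
    deploy:
      replicas: 2  # redundancy; check your Compose version supports this key
```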