MeshCentral: Terminal and File Transfer Timeouts when using NGINX as a reverse proxy

Problem Description

As per https://github.com/Ylianst/MeshCentral/issues/607 we can see that another user had a similar issue with terminal timeouts. We observe that in our case, even when we set proxy_read_timeout in NGINX to 330s (or higher), the terminal still gets timed out.

Unfortunately, the pastebin files have since been removed from the previous bug, so we are unsure whether something else in our configuration could be tuned.

Environment

Agent OS: Ubuntu 16.04.4 LTS
MeshCentral Version: v0.5.1-j

NGINX Configuration

server {
    listen      192.168.1.3:80;
    server_name meshcentral.company.com;
    return      301 https://$server_name$request_uri;
    server_tokens off;
}
server {
    listen       192.168.1.3:443 ssl;
    server_name  meshcentral.company.com;
    server_tokens off;
    access_log  /var/log/nginx/meshcentral_access.log;
    error_log  /var/log/nginx/meshcentral_error.log;
    ssl_certificate       /etc/nginx/certs/wildcard_company.crt;
    ssl_certificate_key   /etc/nginx/certs/wildcard_company.key;
    proxy_send_timeout 770s;
    proxy_read_timeout 660s;
    ssl_session_cache shared:WEBSSL:10m;
    ssl_ciphers HIGH:!aNULL:!MD5;
    ssl_prefer_server_ciphers on;
    location / {
        proxy_pass https://192.168.1.4;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header X-Forwarded-Host $host:$server_port;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Terminal Testing

Local Lab (Without NGINX)

In my local lab, without NGINX, connections can stay alive for more than 24 hours:

10:55:56 AM - admin → Ended terminal session “nra4bkn9koq” from 192.168.1.106 to 192.168.1.179, 88776 second(s)

Development Server (With NGINX)

When we increased the proxy_read_timeout in NGINX to 330s we saw this:

12:02:47 PM - root → Ended terminal session “guwtf2g4m69” from 192.168.1.1 to 192.168.1.2, 331 second(s)

When we increased the proxy_read_timeout in NGINX to 660s we saw this:

12:19:26 PM - root → Ended terminal session “xmjo04q1ws” from 192.168.1.1 to 192.168.1.2, 659 second(s)
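
To make the correlation explicit, here is a small sketch that pulls the session length out of the two event-log lines above and compares it with the proxy_read_timeout that was configured for each test. The helper names (session_seconds, timeout_deltas) are made up for this illustration and are not part of MeshCentral or NGINX.

```python
import re

# Event-log lines copied from the two terminal tests above.
LOG_LINES = [
    "12:02:47 PM - root → Ended terminal session “guwtf2g4m69” "
    "from 192.168.1.1 to 192.168.1.2, 331 second(s)",
    "12:19:26 PM - root → Ended terminal session “xmjo04q1ws” "
    "from 192.168.1.1 to 192.168.1.2, 659 second(s)",
]

# proxy_read_timeout values (seconds) configured for each test, in order.
CONFIGURED_TIMEOUTS = [330, 660]

# The duration is the number right before the trailing "second(s)".
DURATION_RE = re.compile(r"(\d+) second\(s\)\s*$")


def session_seconds(line):
    """Return the session length in seconds from one event-log line."""
    match = DURATION_RE.search(line)
    if match is None:
        raise ValueError("no duration found in line: " + line)
    return int(match.group(1))


def timeout_deltas(lines, timeouts):
    """Return (session length - configured timeout) for each test."""
    return [session_seconds(line) - t for line, t in zip(lines, timeouts)]


if __name__ == "__main__":
    for line, timeout in zip(LOG_LINES, CONFIGURED_TIMEOUTS):
        print(timeout, session_seconds(line))
```

Both sessions end within one second of the configured value, which suggests the NGINX timeout is what is closing the terminal.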

File Upload Testing on Development Server with NGINX

Interestingly (not sure if this is related), we also observe that some large file transfers fail.

Attempt 1, proxy_read_timeout in NGINX set to 300s: File upload terminated prematurely after 2574 seconds (42.9 minutes, ~700MB):

12:26:43 PM - v051j → Ended file management session “ork2lwwd8is” from 192.168.1.1 to 192.168.1.2, 2574 second(s)
11:48:28 AM - v051j → Ended file management session “mws3gfeb939” from 192.168.1.1 to 192.168.1.2, 307 second(s)
11:45:02 AM - v051j → Upload: “/root/1GB.zip”

Attempt 2, proxy_read_timeout in NGINX set to 300s: Successful; 1,073,741,824 bytes uploaded after 3900 seconds (65 minutes):

1:49:10 PM - v051j → Ended file management session “zpel76azu4” from 192.168.1.1 to 192.168.1.2, 3900 second(s)
12:46:32 PM - v051j → Ended file management session “asard4ty5a7” from 192.168.1.1 to 192.168.1.2, 303 second(s)
12:45:12 PM - v051j → Upload: “/root/1GB.zip”

Attempt 3, proxy_read_timeout in NGINX set to 660s: Failed after 843 seconds:

12:53:26 PM - root → Ended file management session “z6sndpgpd2” from 192.168.1.1 to 192.168.1.2, 843 second(s)
12:42:26 PM - root → Ended file management session “xebu3qqsocf” from 192.168.1.1 to 192.168.1.2, 165 second(s)

Attempt 4, proxy_read_timeout in NGINX set to 660s: Failed after 659 seconds:

1:34:06 PM - root → Ended file management session “ptp84t7ei4” from 192.168.1.1 to 192.168.1.2, 659 second(s)
1:33:57 PM - root → Ended file management session “ypnp7sos7js” from 192.168.1.1 to 192.168.1.2, 665 second(s)

Ask

Other than setting proxy_read_timeout to something astronomically high (which, as per the other bug, “will cause NGINX to not free resources and be unstable after a long time”), is there anything we can do or try to keep terminal and file connections alive?

In our case, we are more concerned with the file uploads; we can probably get away with something like screen to work around the terminal timeouts.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 17 (14 by maintainers)

Most upvoted comments

I just published what is hopefully a fix for this (however, not an ideal one). Update the server to MeshCentral v0.5.1-z and add the following line in the settings section of config.json:

"AgentPong": 60

This will cause the server to send dummy data to the agents every 60 seconds. Feel free to adjust the interval as needed; a higher value means less network traffic, so use the highest value that still keeps the connection alive.

Let us know if this works.
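
For anyone who prefers to script the change, the following is a minimal sketch of adding the setting to an already-parsed config.json. It assumes the section is a top-level key named "settings", as described above; the function name with_agent_pong and the sample content (including "existingKey") are hypothetical and only there for illustration.

```python
import copy
import json


def with_agent_pong(config, seconds=60):
    """Return a copy of a parsed config.json with AgentPong set.

    Assumes the settings section is the top-level "settings" key and
    creates it if it is missing. The input dict is not modified.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    updated = copy.deepcopy(config)
    updated.setdefault("settings", {})["AgentPong"] = seconds
    return updated


# Hypothetical, heavily trimmed config.json content for demonstration.
EXAMPLE_CONFIG_TEXT = '{"settings": {"existingKey": "unchanged"}}'

if __name__ == "__main__":
    config = json.loads(EXAMPLE_CONFIG_TEXT)
    print(json.dumps(with_agent_pong(config, 60), indent=2))
```

Existing keys are left untouched, so the result can be written back to config.json after taking a backup. The server needs to be restarted for the new setting to be picked up.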

OK, so this solution does seem to resolve part of the issue. I’m still testing to try and figure out more.

Often the file transfer does not start at all. This is characterized by the following:

  1. The event log does not show the name of the file being uploaded, i.e., a line like this is never created/displayed: 2:54:36 PM - root → Upload: "/root/1GB.zip"
  2. No file is created
  3. The upload progress indicator does not move from 0%

It seems (need to continue testing) that if the file transfer does start, the file transfer will complete, which was not happening before, so the AgentPong setting seems to have helped.

Thank you again for your efforts!!

Unless the grammar of the nginx documentation is poor, it looks like ‘proxy_send_timeout’ and ‘proxy_read_timeout’ only operate on the HTTP request/response, not the WebSocket connection. If that is the case, it makes sense that file transfer, desktop session, and terminal session all end prematurely. Looking on GitHub, it looks like I might be right… There is an outstanding issue on GitHub for nginx timeouts not working at all for WebSockets. It looks like nginx doesn’t reset the timeout correctly when receiving WebSocket control frames, only data frames.
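
To illustrate why periodic dummy data (such as an AgentPong of 60) could keep a proxied session open, here is a simplified model of an idle timeout that restarts whenever traffic is seen. This is only a sketch of the general idea, not NGINX's actual implementation, and it takes no position on whether control frames count as traffic; the function name idle_close_time is invented for this example.

```python
def idle_close_time(traffic_times, idle_timeout, session_length):
    """Return the second at which an idle timer would close the
    connection, or None if it survives the whole session.

    traffic_times:  seconds since session start at which any bytes
                    were seen on the connection.
    idle_timeout:   allowed gap in seconds between two reads.
    session_length: how long the session is meant to last, in seconds.
    """
    last_activity = 0
    for t in sorted(traffic_times):
        if t > session_length:
            break
        if t - last_activity >= idle_timeout:
            # The gap was too long; the timer fired before this traffic.
            return last_activity + idle_timeout
        last_activity = t
    if session_length - last_activity >= idle_timeout:
        return last_activity + idle_timeout
    return None


if __name__ == "__main__":
    one_hour = 3600
    # A silent session under a 330 s idle timeout.
    print(idle_close_time([], 330, one_hour))
    # Dummy data every 60 s under the same timeout.
    print(idle_close_time(range(60, one_hour + 1, 60), 330, one_hour))
```

In this model, a silent session is closed once the timeout elapses, while a session that sees traffic every 60 seconds never reaches a 330-second gap.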

Just to clarify, that value is the idle timeout value, so if you specify 10, that’s 10 seconds of idle time on the connection, not necessarily 10 seconds between pings… Also, even though this flag mentions the control channel, the agent will use the same value for tunnel connections.