kubernetes: GKE: default keep-alive on nodes too long for GCE to handle
Looking at the default unix keepalive settings on our GKE nodes:
$ gcloud compute ssh gke-xxx -- sudo /sbin/sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_probes net.ipv4.tcp_keepalive_intvl
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_intvl = 75
According to the Google Cloud troubleshooting documentation, these settings are too high: GCE drops connections that have been idle for more than 10 minutes. The documentation advises lowering the keepalive settings to less than 600 seconds.
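To make the mismatch concrete, here is a minimal sketch that compares a set of keepalive values against an idle limit. The helper name `keepalive_check` is hypothetical, and it assumes the usual Linux behaviour: the first probe is sent after `tcp_keepalive_time` seconds of idleness, and a dead peer is detected after a further `probes * intvl` seconds.

```shell
#!/bin/sh
# Hypothetical helper, not part of any GKE or GCE tooling.
# Arguments: tcp_keepalive_time, tcp_keepalive_probes, tcp_keepalive_intvl, idle limit (all in seconds).
keepalive_check() {
  time_s=$1; probes=$2; intvl=$3; idle_limit=$4
  # Time until an unresponsive peer is declared dead.
  detect_s=$((time_s + probes * intvl))
  # The first probe must fire before the idle limit to keep the connection tracked.
  if [ "$time_s" -lt "$idle_limit" ]; then
    verdict=ok
  else
    verdict=too-late
  fi
  echo "first_probe=${time_s}s dead_peer_detect=${detect_s}s verdict=${verdict}"
}

# Node defaults shown above, checked against the 600 s limit.
keepalive_check 7200 9 75 600
```

With the defaults, the first probe only goes out after two hours, long after an idle connection would already have been dropped.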
We can change this manually on the machine, but with automated scaling, this becomes problematic. Should the cloud-config scripts be tweaked to set these keys to acceptable values by default?
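For reference, a rough sketch of what such a script could do on each node. The function name `render_keepalive_conf` and the values passed to it are illustrative assumptions only (chosen so the first probe fires well under 600 seconds); they are not the values GKE ships.

```shell
#!/bin/sh
# Hypothetical helper: prints sysctl.d-style settings for the three keepalive keys.
# Arguments: keepalive time, probe interval, probe count.
render_keepalive_conf() {
  printf 'net.ipv4.tcp_keepalive_time = %s\n' "$1"
  printf 'net.ipv4.tcp_keepalive_intvl = %s\n' "$2"
  printf 'net.ipv4.tcp_keepalive_probes = %s\n' "$3"
}

# Illustrative values only (assumption, not an official recommendation).
render_keepalive_conf 300 60 5
```

On a node, the output could be redirected as root into a drop-in file under `/etc/sysctl.d/` (the file name is up to you) and loaded with `sysctl --system`, so the values survive a reboot. Nodes created by autoscaling would still need this step run at startup, which is why a default in the image would be preferable.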
About this issue
- State: closed
- Created 8 years ago
- Reactions: 9
- Comments: 22 (9 by maintainers)
We’ll fix these in the base OS images we use.
@bobbypage - 300 + (60 x 5) == 600; since the Google HTTPS Load Balancer TCP timeout is exactly 600s, shouldn’t the defaults be HIGHER than this? They specifically recommend 620s. https://cloud.google.com/load-balancing/docs/https/#timeouts_and_retries
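To see what the 300 + (60 x 5) sum actually measures, here is a small sketch that prints when each probe would be sent. The function name `probe_timeline` is hypothetical, and it assumes standard Linux keepalive behaviour (first probe after the idle time, then one probe per interval until the probe count is exhausted).

```shell
#!/bin/sh
# Hypothetical helper: prints the keepalive probe schedule for an idle connection.
# Arguments: keepalive time, probe interval, probe count (seconds, seconds, count).
probe_timeline() {
  time_s=$1; intvl=$2; probes=$3
  i=0
  while [ "$i" -lt "$probes" ]; do
    echo "probe $((i + 1)) at $((time_s + i * intvl))s"
    i=$((i + 1))
  done
  # Only reached if every probe goes unanswered.
  echo "give up at $((time_s + probes * intvl))s"
}

probe_timeline 300 60 5
```

Under these assumptions, 600s is the point at which an unresponsive peer is given up on, while the first probe already goes out at 300s. If the peer answers, that exchange is traffic on the connection, so the idle period seen by anything in between should stay well below 600s.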
(1) HTTP keepalive timeouts FYI (not related to this issue)
An HTTP keepalive timeout, whose value is fixed at 10 minutes (600 seconds). This value is not configurable by modifying your backend service. You must configure the web server software used by your backends so that its keepalive timeout is longer than 600 seconds to prevent connections from being closed prematurely by the backend. This timeout does not apply to WebSockets. This table illustrates changes necessary to modify keepalive timeouts for common web server software:
https://cloud.google.com/load-balancing/docs/https/https-logging-monitoring
(2) OS-network layer TCP keepalive timeouts - exactly relevant to this issue
This change has already been released to GKE on node image versions v1.18.6-gke.6300 and higher. We changed the sysctls to the following by default: