docker-zulip: Error: psycopg2.OperationalError: server closed the connection unexpectedly
On the latest 2.1.3 (and also on 2.1.1 and 2.1.2) I frequently get this error emailed to me, roughly 10-20 times per day:
---------- Forwarded message ---------
From: myemail
Date: Fri, Apr 3, 2020 at 9:04 PM
Subject: [Django] a111cc8a0340: server closed the connection unexpectedly\n This probably means the server terminated abnormally\n before or while processing the request.\n
To: myemail
Logger root, from module zerver.worker.queue_processors line 151:
Error generated by Anonymous user (not logged in) on a111cc8a0340 deployment
Traceback (most recent call last):
File "/home/zulip/deployments/2020-04-01-21-55-56/zulip-py3-venv/lib/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute
return self.cursor.execute(sql, params)
File "/home/zulip/deployments/2020-04-01-21-55-56/zerver/lib/db.py", line 31, in execute
return wrapper_execute(self, super().execute, query, vars)
File "/home/zulip/deployments/2020-04-01-21-55-56/zerver/lib/db.py", line 18, in wrapper_execute
return action(sql, params)
psycopg2.OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/zulip/deployments/2020-04-01-21-55-56/zerver/worker/queue_processors.py", line 134, in consume_wrapper
self.consume(data)
File "/home/zulip/deployments/2020-04-01-21-55-56/zerver/worker/queue_processors.py", line 310, in consume
user_profile = get_user_profile_by_id(event["user_profile_id"])
File "/home/zulip/deployments/2020-04-01-21-55-56/zerver/lib/cache.py", line 186, in func_with_caching
val = func(*args, **kwargs)
File "/home/zulip/deployments/2020-04-01-21-55-56/zerver/models.py", line 2072, in get_user_profile_by_id
return UserProfile.objects.select_related().get(id=uid)
File "/home/zulip/deployments/2020-04-01-21-55-56/zulip-py3-venv/lib/python3.6/site-packages/django/db/models/query.py", line 374, in get
num = len(clone)
File "/home/zulip/deployments/2020-04-01-21-55-56/zulip-py3-venv/lib/python3.6/site-packages/django/db/models/query.py", line 232, in __len__
self._fetch_all()
File "/home/zulip/deployments/2020-04-01-21-55-56/zulip-py3-venv/lib/python3.6/site-packages/django/db/models/query.py", line 1121, in _fetch_all
self._result_cache = list(self._iterable_class(self))
File "/home/zulip/deployments/2020-04-01-21-55-56/zulip-py3-venv/lib/python3.6/site-packages/django/db/models/query.py", line 53, in __iter__
results = compiler.execute_sql(chunked_fetch=self.chunked_fetch)
File "/home/zulip/deployments/2020-04-01-21-55-56/zulip-py3-venv/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 899, in execute_sql
raise original_exception
File "/home/zulip/deployments/2020-04-01-21-55-56/zulip-py3-venv/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 889, in execute_sql
cursor.execute(sql, params)
File "/home/zulip/deployments/2020-04-01-21-55-56/zulip-py3-venv/lib/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute
return self.cursor.execute(sql, params)
File "/home/zulip/deployments/2020-04-01-21-55-56/zulip-py3-venv/lib/python3.6/site-packages/django/db/utils.py", line 94, in __exit__
six.reraise(dj_exc_type, dj_exc_value, traceback)
File "/home/zulip/deployments/2020-04-01-21-55-56/zulip-py3-venv/lib/python3.6/site-packages/django/utils/six.py", line 685, in reraise
raise value.with_traceback(tb)
File "/home/zulip/deployments/2020-04-01-21-55-56/zulip-py3-venv/lib/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute
return self.cursor.execute(sql, params)
File "/home/zulip/deployments/2020-04-01-21-55-56/zerver/lib/db.py", line 31, in execute
return wrapper_execute(self, super().execute, query, vars)
File "/home/zulip/deployments/2020-04-01-21-55-56/zerver/lib/db.py", line 18, in wrapper_execute
return action(sql, params)
django.db.utils.OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Deployed code:
- ZULIP_VERSION: 2.1.3
- version: docker
Request info: none
The installed version appears to work correctly though.
In other issues I have posted here, I was advised that this is caused by restarted services. That does not apply here: I do not restart the Docker Swarm stack, and the services show an uptime of more than one day over a period in which I still received these error messages.
I am running a single-node Docker Swarm with a docker-compose file identical to the one described in the docker-zulip repo.
For validation, here it is, with the secrets redacted:
version: '3.3'
services:
  database:
    image: zulip/zulip-postgresql:10
    environment:
      POSTGRES_DB: zulip
      POSTGRES_PASSWORD: xxx
      POSTGRES_USER: zulip
    volumes:
      - zulip_psql_data:/var/lib/postgresql/data
    networks:
      - default
    logging:
      driver: json-file
  memcached:
    image: memcached:alpine
    networks:
      - default
    logging:
      driver: json-file
    command:
      - 'sh'
      - '-euc'
      - |
        echo 'mech_list: plain' > "$$SASL_CONF_PATH"
        echo "zulip@$$HOSTNAME:$$MEMCACHED_PASSWORD" > "$$MEMCACHED_SASL_PWDB"
        exec memcached -S
    environment:
      SASL_CONF_PATH: '/home/memcache/memcached.conf'
      MEMCACHED_SASL_PWDB: '/home/memcache/memcached-sasl-db'
      MEMCACHED_PASSWORD: 'xxx'
  rabbitmq:
    image: rabbitmq:3.7.7
    environment:
      RABBITMQ_DEFAULT_PASS: xxx
      RABBITMQ_DEFAULT_USER: zulip
    volumes:
      - zulip_rabbitmq_data:/var/lib/rabbitmq
    networks:
      - default
    logging:
      driver: json-file
  redis:
    image: redis:alpine
    volumes:
      - zulip_redis_data:/data:rw
    networks:
      - default
    logging:
      driver: json-file
    command:
      - 'sh'
      - '-euc'
      - |
        echo "requirepass '$$REDIS_PASSWORD'" > /etc/redis.conf
        exec redis-server /etc/redis.conf
    environment:
      REDIS_PASSWORD: 'xxx'
  zulip:
    image: zulip/docker-zulip:2.1.3-0
    ports:
      - 80
    environment:
      DB_HOST: database
      DB_HOST_PORT: '5432'
      DB_USER: zulip
      DISABLE_HTTPS: 'True'
      SECRETS_email_password: xxx
      SECRETS_google_oauth2_client_secret: xxx
      SECRETS_postgres_password: xxx
      SECRETS_rabbitmq_password: xxx
      SECRETS_memcached_password: 'xxx'
      SECRETS_redis_password: 'xxx'
      SECRETS_secret_key: xxx
      SECRETS_social_auth_github_secret: xxx
      SETTING_EMAIL_HOST: smtp.gmail.com
      SETTING_EMAIL_HOST_USER: xxx
      SETTING_EMAIL_PORT: '587'
      SETTING_EMAIL_USE_SSL: 'False'
      SETTING_EMAIL_USE_TLS: 'True'
      SETTING_EXTERNAL_HOST: xxx.xxx.xxx
      SETTING_GOOGLE_OAUTH2_CLIENT_ID: xxxxx
      SETTING_MEMCACHED_LOCATION: memcached:11211
      SETTING_PUSH_NOTIFICATION_BOUNCER_URL: https://push.zulipchat.com
      SETTING_RABBITMQ_HOST: rabbitmq
      SETTING_REDIS_HOST: redis
      SETTING_SOCIAL_AUTH_GITHUB_KEY: xxx
      SETTING_ZULIP_ADMINISTRATOR: xxx
      SSL_CERTIFICATE_GENERATION: self-signed
      ZULIP_AUTH_BACKENDS: EmailAuthBackend,GoogleMobileOauth2Backend,GitHubAuthBackend
    volumes:
      - zulip_app_data:/data
    networks:
      - traefik-public
      - default
    logging:
      driver: json-file
    deploy:
      labels:
        traefik.docker.network: traefik-public
        traefik.enable: 'true'
        traefik.http.routers.zulip.entrypoints: websecure
        traefik.http.routers.zulip.rule: Host(`xxx.xxx.xxx`)
        traefik.http.routers.zulip.tls.certresolver: letsencryptresolver
        traefik.http.services.zulip.loadbalancer.server.port: '80'
networks:
  default:
    driver: overlay
  traefik-public:
    external: true
volumes:
  zulip_app_data:
    external: true
  zulip_psql_data:
    external: true
  zulip_rabbitmq_data:
    external: true
  zulip_redis_data:
    external: true
Some hopefully useful info:
docker version
Client: Docker Engine - Community
Version: 19.03.8
API version: 1.40
Go version: go1.12.17
Git commit: afacb8b7f0
Built: Wed Mar 11 01:25:46 2020
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.8
API version: 1.40 (minimum version 1.12)
Go version: go1.12.17
Git commit: afacb8b7f0
Built: Wed Mar 11 01:24:19 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.13
GitCommit: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc:
Version: 1.0.0-rc10
GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
docker-init:
Version: 0.18.0
GitCommit: fec3683
uname -a
Linux xxx.com 4.15.0-91-generic #92-Ubuntu SMP Fri Feb 28 11:09:48 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
This is a “sister” issue to zulip/zulip#14456, which I also opened (but with a different error message).
Is there anything else I can provide to help solve this…?
About this issue
- State: open
- Created 4 years ago
- Comments: 16 (9 by maintainers)
It looks like the remaining problem was a Docker Swarm default configuration problem; a potential solution is suggested here:
https://github.com/vapor/postgres-kit/issues/164#issuecomment-738450518
I’ll transfer this to the docker-zulip repository.
OK, my theory is that the Docker Swarm networking stack is killing the open TCP connections between the Zulip server and the postgres/memcached servers. We had a similar but much more fatal problem with RabbitMQ, fixed last year (b312001fd92dc36233e5a9f57cd9fada890880c4). The symptom is the same as the service being restarted: the connections are killed, and each Zulip process re-establishes its connection when it discovers this (and sends an error email), resulting in this random distribution of error emails.
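If idle connections are indeed being dropped, one way to test the theory on the PostgreSQL side is to have the client send TCP keepalives. A minimal sketch, assuming a stock Django setup: libpq accepts the connection parameters keepalives, keepalives_idle, keepalives_interval and keepalives_count, and Django passes the OPTIONS dict through to psycopg2. The helper name with_pg_keepalives, the example DATABASES dict, and the timing values are hypothetical and are not taken from Zulip's settings.

```python
# Hypothetical helper: add libpq TCP keepalive parameters to a Django
# DATABASES dict. Django forwards OPTIONS to psycopg2.connect(), which
# hands these keys to libpq.
def with_pg_keepalives(databases, idle=60, interval=10, count=5):
    """Return a copy of `databases` with keepalive options on every alias."""
    updated = {}
    for alias, cfg in databases.items():
        new_cfg = dict(cfg)
        options = dict(cfg.get("OPTIONS", {}))
        options.update(
            {
                "keepalives": 1,                 # turn TCP keepalives on
                "keepalives_idle": idle,         # seconds idle before first probe
                "keepalives_interval": interval, # seconds between probes
                "keepalives_count": count,       # failed probes before giving up
            }
        )
        new_cfg["OPTIONS"] = options
        updated[alias] = new_cfg
    return updated


# Example input only; the values mirror the compose file, not Zulip's code.
DATABASES = with_pg_keepalives(
    {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": "zulip",
            "USER": "zulip",
            "HOST": "database",
            "PORT": "5432",
        }
    }
)
```

The idle value only helps if it is comfortably below whatever idle timeout the network layer enforces, so it should be chosen after measuring that timeout.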
Googling suggests that other products have indeed had that sort of problem with Docker Swarm’s aggressive killing of TCP connections. https://success.docker.com/article/ipvs-connection-timeout-issue seems to be their knowledge base article on the topic.
@stratosgear can you try playing with the diagnostic steps described in that article to see if they suggest this is what’s happening? Based on that doc, it looks like Docker Swarm itself doesn’t support configuring its networking behavior of killing idle TCP connections 😦.
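As a rough first check before working through that article, the kernel's keepalive idle time can be compared against the IPVS idle timeout. This is only a sketch under stated assumptions: the 900-second figure is the usual IPVS TCP default and should be confirmed on the host (for example with ipvsadm -l --timeout), the function names are hypothetical, and the kernel setting only affects sockets that have SO_KEEPALIVE enabled.

```python
from pathlib import Path

# Assumed IPVS default TCP idle timeout in seconds; verify on the actual host.
IPVS_TCP_IDLE_TIMEOUT_S = 900


def read_keepalive_time(path="/proc/sys/net/ipv4/tcp_keepalive_time"):
    """Return the kernel keepalive idle time in seconds, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def idle_connections_at_risk(keepalive_time_s, idle_timeout_s=IPVS_TCP_IDLE_TIMEOUT_S):
    """True if the first keepalive probe would be sent too late to keep an
    idle connection alive past the idle timeout."""
    return keepalive_time_s >= idle_timeout_s


if __name__ == "__main__":
    value = read_keepalive_time()
    if value is None:
        print("could not read tcp_keepalive_time (not Linux, or no procfs)")
    else:
        print(f"tcp_keepalive_time={value}s, at risk: {idle_connections_at_risk(value)}")
```

With the common Linux default of 7200 seconds, the first probe would arrive long after a 900-second idle timeout, which would be consistent with the symptom described here. Run it inside the zulip container, since that is where the client sockets live.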
For memcached, https://github.com/lericson/pylibmc/issues/199, https://sendapatch.se/projects/pylibmc/behaviors.html, and https://pypi.org/project/pylibmc/1.3.0/ suggest they have an undocumented option to set the keepalive settings.
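For reference, whatever name a client library gives such an option, it ultimately comes down to a few socket options. The sketch below uses only Python's standard socket module to show them; it is not pylibmc's API, and the function name and timing values are hypothetical. The TCP_KEEP* constants are platform dependent, so they are applied only where the socket module defines them.

```python
import socket


def enable_tcp_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Enable TCP keepalive on `sock` and, where supported, tune its timing."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before the drop
    )
    for name, value in tuning:
        option = getattr(socket, name, None)  # missing on some platforms
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


# Options can be set before connecting; no network traffic happens here.
sock = enable_tcp_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
keepalive_enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

Any memcached keepalive setting would need an idle value below the network's idle timeout to have an effect, the same constraint as for the database connection.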