zulip: Messages get stuck or disappear
I’m running the `zulip/docker-zulip:latest` image under Kubernetes, using a helm chart based on the `armooo/zulip-helm` chart. Some time after the Zulip server pod starts, messages stop being delivered. Sometimes the service works for a few days, sometimes only for hours, before it fails.
The sending client doesn’t get a server reply with a message timestamp, and server.log does not log any `/json/message` entries. Deleting the Zulip server pod, which restarts Zulip but none of the other services, usually causes the undelivered messages to be delivered.
- zulip 1.9.0-latest
- memcached: 2.3.1
- redis: 4.2.0
- postgresql: 0.19.0
- rabbitmq: 3.5.0
About this issue
- State: closed
- Created 6 years ago
- Comments: 26 (13 by maintainers)
We’re running into this issue as well. We’ve tracked it down to the message_sender queue not actually doing anything. When we shell into the Zulip pod, we can flush all queued messages by starting a new worker (`python3 /home/zulip/deployments/current/manage.py process_queue --queue_name=message_sender --worker_num=4`), but when that exits the issue returns. Killing the existing message_sender processes (`pkill -f message_sender`) restores the service (supervisor restarts them). We’re going to try abusing the liveness probe to kill these on a cadence until the cause can be determined. It would be beneficial to have discrete images of the different components currently held in the zulip container, allowing for a distributed deployment.
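For anyone wanting to try the same workaround, here is a rough sketch of what such a probe script might look like. It is not what we actually deployed. `STAMP_FILE`, `INTERVAL_SECONDS`, `KILL_CMD`, and the function names are made up for illustration; the only part taken from above is the `pkill -f message_sender` command. The idea is that the probe always reports success, but kills the workers at most once per interval so that supervisor restarts them.

```shell
#!/bin/sh
# Hypothetical liveness-probe helper (illustrative names, not part of Zulip).
# Kills the message_sender workers at most once per INTERVAL_SECONDS and
# relies on supervisor to restart them.

STAMP_FILE="${STAMP_FILE:-/tmp/last_worker_recycle}"
INTERVAL_SECONDS="${INTERVAL_SECONDS:-3600}"

# restart_due NOW
# Succeeds when no stamp exists yet or the last recycle is at least
# INTERVAL_SECONDS older than NOW (both in epoch seconds).
restart_due() {
    now=$1
    [ -f "$STAMP_FILE" ] || return 0
    last=$(cat "$STAMP_FILE")
    [ $((now - last)) -ge "$INTERVAL_SECONDS" ]
}

# Kill the workers if a recycle is due, then record the time.
# Always returns 0 so the probe itself never fails the pod.
recycle_message_sender() {
    now=$(date +%s)
    if restart_due "$now"; then
        # KILL_CMD is an assumed override hook for dry runs; the default is
        # the command from the comment above. Left unquoted on purpose so
        # the shell splits it into command and arguments.
        ${KILL_CMD:-pkill -f message_sender} || true
        echo "$now" > "$STAMP_FILE"
    fi
    return 0
}
```

A script like this would end with a call to `recycle_message_sender` and be wired in as the container's exec liveness probe. One thing to watch for: `pkill -f` matches against the full command line, so the script file itself should not have `message_sender` in its name or path, or it may kill the shell that is running it.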