puma: Slow restarts

We love puma and have been using it for years… baffled by this one though! What are we missing?

Steps to reproduce

  1. puma configs
# web
tag "#{ENV['RAILS_ENV']}-app-puma"
environment ENV['RAILS_ENV']

pidfile "#{ENV['SHARED_PATH']}/puma.pid"
stdout_redirect "#{ENV['SHARED_PATH']}/log/#{ENV['RAILS_ENV']}_app_puma.stdout.log",
  "#{ENV['SHARED_PATH']}/log/#{ENV['RAILS_ENV']}_app_puma.stderr.log",
  false # don't append

bind "unix://#{ENV['SHARED_PATH']}/#{ENV['RAILS_ENV']}-app-puma.sock"
worker_timeout 30

threads 0,4
workers 1

preload_app!
on_worker_boot do
  ActiveSupport.on_load(:active_record) do
    ActiveRecord::Base.establish_connection
  end
  Que.mode = :off
end
# action cable
tag "#{ENV['RAILS_ENV']}-app-puma-actioncable"
environment ENV['RAILS_ENV']

pidfile "#{ENV['SHARED_PATH']}/puma-actioncable.pid"
stdout_redirect "#{ENV['SHARED_PATH']}/log/#{ENV['RAILS_ENV']}_app_puma_actioncable.stdout.log",
  "#{ENV['SHARED_PATH']}/log/#{ENV['RAILS_ENV']}_app_puma_actioncable.stderr.log",
  false # don't append

bind "unix://#{ENV['SHARED_PATH']}/#{ENV['RAILS_ENV']}-app-puma-actioncable.sock"
worker_timeout 30

threads 1,4
workers 1
  1. Action Cable Configs

All defaults except for:

  config.action_cable.allowed_request_origins = ['https://www.app.com']
  config.action_cable.logger = ActiveSupport::Logger.new(
    File.join(Rails.root, 'log', "#{Rails.env}-cable.log")
  )

  1. Systemd
# web
[Unit]
Description=Rails Web Server
PartOf=staging-app.target

[Service]
EnvironmentFile=/etc/default/staging-app

User=staging-app
WorkingDirectory=/home/staging-app/app/current
Type=simple
Restart=always
KillMode=process

ExecStart=/bin/bash -lc 'bundle exec --keep-file-descriptors puma -C config/deploy/staging/puma.rb'
# cable
[Unit]
Description=Rails Cable Server
PartOf=staging-app.target

[Service]
EnvironmentFile=/etc/default/staging-app

User=staging-app
WorkingDirectory=/home/staging-app/app/current
Type=simple
Restart=always
KillMode=process

ExecStart=/bin/bash -lc 'bundle exec --keep-file-descriptors puma -C config/deploy/staging/puma-action-cable.rb'
  1. Restart
time sudo systemctl restart staging-app-puma.service

real	1m30.106s
user	0m0.011s
sys	0m0.000s

Expected behavior

Normally, we get instant restarts with puma.

Actual behavior

Restarts take 1m30s usually.

I have never seen it take this long, but we weren’t the original developer of the application.

This application uses ActionCable running in a separate process with nginx sending all /cable requests to another process (also on puma). Both processes take the 1m30s to restart, the ActionCable config file is the same minus preload_app! and the on_worker_boot block.

The ActiveJob process restarts instantly.

Questions

  1. Is there a way to tell what is happening in puma after the restart is initiated?
  2. Is it expected Action Cable processes take a long time to stop restart? But with no Action Cable requests going to the puma web process, why is that slow to restart as well?
  3. Is this a known problem with 5.1? The client will be updating eventually.
  4. What are we missing?

System configuration

Ruby version: 2.6.0 Rails version: 5.1.6.1 Puma version: 3.12.0

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 15 (7 by maintainers)

Commits related to this issue

Most upvoted comments

We noticed similar issues on a few of our servers (rails >=5.2, ruby >= 2.6.0).

Seems to be fixed with Puma 4.0.x 😃

Restart times went up to (exactly) 90 seconds for each app that i updated from ruby 2.5 to 2.6. The issue persists when updating to 2.6.1.

I have the same issue with ruby 2.6.2 90 seconds is a default timeout for systemd restart command, after that process is just killed by the systemd and started again

This is almost certainly the ruby 2.6.0 wait bug. Why are you doing workers 1? Might as well avoid the overhead of using the clustering in that case and just comment it out and use the single process model.

I just switched to the alternative forked systemd in order to utilize phased-restarts with pumactl https://github.com/puma/puma/blob/master/docs/systemd.md#alternative-forking-configuration It was a huge pain, and somewhat fragile compared to the systemd simple/socket I’m going to do a quick sanity test with phased restarts and see how it handles.

hot restarts were dragggggging causing racktimeouts and gateway timeouts from the loadbalancer with two workers 16 threads, preload_app. My guess is with preload, and perhaps bad actors preventing connections from being drained, workers cant reboot before timeout threshholds are hit. In fact i wouldn’t be surprised if my timeouts were the only thing allowing the restart to complete

I’ll post all me new/old configs in a bit, it’s a painstaking process to debug this in production env.