rails: ActionCable's SubscriptionAdapter for Redis does not reconnect after a connection loss.

Steps to reproduce

  • Start the Rails server and let ActionCable establish a connection to Redis.
  • Restart or stop the Redis instance.
  • Observe that the server process aborts with a Redis::ConnectionError.

Expected behavior

If the connection between ActionCable and Redis is interrupted or lost, the server process tries to re-establish the connection, informs WebSocket connections about the disconnection (or re-establishes their subscriptions), and does not abort.

Actual behavior

If the connection between ActionCable and Redis is interrupted or lost, the server process aborts without attempting to re-establish the connection.

RoomChannel is transmitting the subscription confirmation
RoomChannel is streaming from room:123
Exiting
2.3.3/gems/redis-3.3.2/lib/redis/client.rb:257:in `rescue in io': Connection lost (ECONNRESET) (Redis::ConnectionError)
  gems/redis-3.3.2/lib/redis/client.rb:250:in `io'
  gems/redis-3.3.2/lib/redis/client.rb:261:in `read'
  gems/redis-3.3.2/lib/redis/client.rb:136:in `block (3 levels) in call_loop'
  gems/redis-3.3.2/lib/redis/client.rb:135:in `loop'
  gems/redis-3.3.2/lib/redis/client.rb:135:in `block (2 levels) in call_loop'
  gems/redis-3.3.2/lib/redis/client.rb:231:in `block (2 levels) in process'
  gems/redis-3.3.2/lib/redis/client.rb:367:in `ensure_connected'
  gems/redis-3.3.2/lib/redis/client.rb:221:in `block in process'
  gems/redis-3.3.2/lib/redis/client.rb:306:in `logging'
  gems/redis-3.3.2/lib/redis/client.rb:220:in `process'
  gems/redis-3.3.2/lib/redis/client.rb:134:in `block in call_loop'
  gems/redis-3.3.2/lib/redis/client.rb:280:in `with_socket_timeout'
  gems/redis-3.3.2/lib/redis/client.rb:133:in `call_loop'
  gems/redis-3.3.2/lib/redis/subscribe.rb:43:in `subscription'
  gems/redis-3.3.2/lib/redis/subscribe.rb:12:in `subscribe'
  gems/redis-3.3.2/lib/redis.rb:2765:in `_subscription'
  gems/redis-3.3.2/lib/redis.rb:2143:in `block in subscribe'
  gems/redis-3.3.2/lib/redis.rb:58:in `block in synchronize'
  from .rubies/ruby-2.3.3/lib/ruby/2.3.0/monitor.rb:214:in `mon_synchronize'
  gems/redis-3.3.2/lib/redis.rb:58:in `synchronize'
  gems/redis-3.3.2/lib/redis.rb:2142:in `subscribe'
  gems/actioncable-5.0.1/lib/action_cable/subscription_adapter/redis.rb:75:in `block in listen'
  gems/redis-3.3.2/lib/redis/client.rb:293:in `with_reconnect'
  gems/redis-3.3.2/lib/redis.rb:64:in `block in with_reconnect'
  gems/redis-3.3.2/lib/redis.rb:58:in `block in synchronize'
  from .rubies/ruby-2.3.3/lib/ruby/2.3.0/monitor.rb:214:in `mon_synchronize'
  gems/redis-3.3.2/lib/redis.rb:58:in `synchronize'
  gems/redis-3.3.2/lib/redis.rb:63:in `with_reconnect'
  gems/redis-3.3.2/lib/redis.rb:70:in `without_reconnect'
  gems/actioncable-5.0.1/lib/action_cable/subscription_adapter/redis.rb:72:in `listen'
  gems/actioncable-5.0.1/lib/action_cable/subscription_adapter/redis.rb:146:in `block in ensure_listener_running'

Additional information

As soon as the Redis instance is unavailable and redis-rb receives an EOF while reading from the socket, the Listener aborts (as intended?) here.

The redis-rb client offers the option to specify reconnect_attempts during initialisation, but after a connection has been established successfully, the client doesn’t attempt to reconnect following a later disconnection. Even if it did, the retries might be too aggressive to allow re-establishment, since there is no delay between attempts.
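
For reference, this is roughly how the option is set when constructing a redis-rb client directly (a sketch; note that, per the backtrace above, the adapter wraps its subscribe loop in without_reconnect, so this option would not help the listener connection anyway):

    require "redis"

    # reconnect_attempts controls how often redis-rb retries while establishing
    # a connection inside a command call; it does not resubscribe a blocking
    # subscribe loop once an established connection is lost.
    redis = Redis.new(url: "redis://localhost:6379/1", reconnect_attempts: 3)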

A work-around is something like the following, which closes all WebSocket connections (so clients reconnect and resubscribe) and retries the listener after a short delay:

    def listen_with_retry(conn)
      listen conn
    rescue ::Redis::ConnectionError, ::Redis::CannotConnectError
      # Drop all WebSocket clients so they reconnect and resubscribe,
      # then retry the listener after a short delay.
      ActionCable.server.connections.each(&:close)
      sleep 1
      retry
    end

System configuration

Rails version: Rails 5.0.1

Ruby version: ruby 2.3.3p222

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 23
  • Comments: 22 (5 by maintainers)

Most upvoted comments

The fix is coming in Rails 7.1. For older versions (5+), there is a backport gem: https://github.com/anycable/action-cable-redis-backport
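
Assuming the gem is named after the linked repository, installation should be a one-line Gemfile entry (verify the exact name and version against its README):

    # Gemfile
    gem "action-cable-redis-backport"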

This is such a big vulnerability for us that we decided to monkey patch ActionCable::SubscriptionAdapter::Redis::Listener. The patch catches the disconnect exception and restarts Puma with a hot restart. This recovers full functionality, provided Redis is available again. Since our primary use case is restoring the connection after a Redis primary/replica failover, that is a safe assumption.

Perhaps this could be of use to someone. Note: I’m not proud of this, but it works 😉

# Increase robustness against Redis disconnects, restarts or master failover.
# With the default functionality of ActionCable, the entire server crashes.
#
# This is a monkey patch of
# https://github.com/rails/rails/blob/723375147b4110ad7260962851ca4e3a7a951b47/actioncable/lib/action_cable/subscription_adapter/redis.rb#L79
module ActionCable::SubscriptionAdapter::Redis::ListenerExtensions
  def listen(conn)
    super
  rescue StandardError => e
    Sentry.capture_exception(e)

    # Sleep a random amount of time (5–65 seconds) to avoid all processes
    # restarting at once. During this period, WebSockets will be unavailable,
    # but the Rails process stays alive.
    time = 60 * rand + 5
    puts "Critical ActionCable error. Restarting in #{time.round} seconds"
    sleep time
    # Restart Puma using 'hot restart'
    # https://github.com/puma/puma/blob/master/docs/restart.md#hot-restart
    Process.kill('USR2', Process.pid)
  end
end

class ActionCable::SubscriptionAdapter::Redis::Listener
  prepend ActionCable::SubscriptionAdapter::Redis::ListenerExtensions
end

It doesn’t happen often, but when it does, it’s catastrophic. A Redis::ConnectionError raised while sending a message to an end user through Action Cable takes down an entire worker (on Heroku), and the worker won’t restart itself. Does anyone recommend the patch suggested by @rhomeister for this issue, or some other strategy?

We have an API for this in ActiveJob that would be good to consider extracting into something generic for timeouts/retries. See https://github.com/rails/rails/blob/master/activejob/lib/active_job/exceptions.rb#L116
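
For context, the ActiveJob API referenced there is retry_on; the job and error classes below are illustrative, not taken from this issue:

    class BroadcastJob < ApplicationJob
      # Retry on Redis connection errors, exponentially increasing the delay
      # between attempts.
      retry_on Redis::ConnectionError, wait: :exponentially_longer, attempts: 5

      def perform(channel, payload)
        ActionCable.server.broadcast(channel, payload)
      end
    end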

On Mon, Jan 29, 2018 at 2:24 AM, Philipp Weissensteiner <notifications@github.com> wrote:

How do you feel about an (exponential) back-off mechanism where you’d have options to specify the number of reconnect_attempts and the max_reconnection_timeout?

I.e., the client would attempt 10 reconnections, exponentially increasing the sleep duration between attempts until it reaches the specified limit (see the sketch at the end of this thread).

If that sounds ok, I’d be happy to start work on a PR.

Cheers.

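A minimal sketch of the proposed back-off, using the suggested reconnect_attempts and max_reconnection_timeout as hypothetical parameters (this is not an existing redis-rb or ActionCable API):

    # Hypothetical illustration of the proposal; `connect` stands in for
    # whatever re-establishes the Redis connection.
    def reconnect_with_backoff(reconnect_attempts: 10, max_reconnection_timeout: 30)
      reconnect_attempts.times do |attempt|
        begin
          return connect
        rescue ::Redis::CannotConnectError
          # Exponential back-off: 1, 2, 4, 8, ... seconds, capped at the maximum.
          sleep [2**attempt, max_reconnection_timeout].min
        end
      end
      raise ::Redis::CannotConnectError, "gave up after #{reconnect_attempts} attempts"
    end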