ioredis: Intermittent "Connection is closed" errors

We are currently working on a Lambda function that connects to a Redis 3.2.10 cluster on AWS ElastiCache.

This Lambda function connects to the Redis cluster, runs KEYS on each master node, collects the responses from each node, and returns an array of keys. We then publish an SNS message for each key in this array and close the cluster connection before the Lambda ends.

AWS Lambda freezes and thaws the container in which programs run, so ideally we would create a connection once and reuse it on every invocation. However, we have found that we must explicitly end the client connection to the cluster, because Lambda waits for the Node event loop to empty before the function can end. This is why we create the connection at the start of the handler (representing a Lambda invocation), run our queries, and then attempt to gracefully .quit() the Redis.Cluster connection once they complete.
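The connect-query-quit pattern described above can be sketched as a handler. All names here are hypothetical, and the cluster factory is injected only to keep the sketch self-contained and testable without a live ElastiCache endpoint; in the real function it would be something like `() => new Redis.Cluster([...])`.

```javascript
// Sketch of the per-invocation pattern described above (hypothetical names).
function makeHandler(createCluster) {
  return async function handler() {
    const conn = createCluster(); // connect at the start of the invocation
    try {
      // Run KEYS on every master node and collect the per-node responses.
      const perNode = await Promise.all(
        conn.nodes("master").map((node) => node.keys("*:*"))
      );
      return [].concat(...perNode); // flatten into one array of keys
    } finally {
      // Quit explicitly so the Node event loop can empty and the Lambda can end.
      await conn.quit();
    }
  };
}
```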

I can’t share the actual code that we’re working on, but I’ve been able to extract the logic and create a simple example of the issue we’re facing:

test.js

const Redis = require("ioredis");

const interval = setInterval(() => {
  const conn = new Redis.Cluster(["cluster.id.clustercfg.euw1.cache.amazonaws.com"]);

  Promise
    .all(conn.nodes("master").map((node) => {
      return node.keys("*:*");
    }))

    .then((resp) => {
      console.log("Complete KEYS on all nodes", JSON.stringify(resp));
      return conn.quit();
    })

    .then(() => {
      console.log("Gracefully closed connection");
    })

    .catch((e) => {
      console.log("Caught rejection: ", e.message);
    });
}, 500);

setTimeout(() => {
  clearInterval(interval);
}, 3000);

Example output:

  ioredis:cluster status: [empty] -> connecting +0ms
  ioredis:redis status[cluster.id.clustercfg.euw1.cache.amazonaws.com:6379]: [empty] -> wait +5ms
  ioredis:cluster getting slot cache from cluster.id.clustercfg.euw1.cache.amazonaws.com:6379 +1ms
  ioredis:redis status[cluster.id.clustercfg.euw1.cache.amazonaws.com:6379]: wait -> connecting +2ms
  ioredis:redis queue command[0] -> cluster(slots) +1ms
  ioredis:redis queue command[0] -> keys(*:*) +1ms
  ioredis:redis status[10.1.0.45:6379]: connecting -> connect +21ms
  ioredis:redis write command[0] -> info() +0ms
  ioredis:redis status[10.1.0.45:6379]: connect -> ready +5ms
  ioredis:connection send 2 commands in offline queue +1ms
  ioredis:redis write command[0] -> cluster(slots) +0ms
  ioredis:redis write command[0] -> keys(*:*) +0ms
  ioredis:redis status[10.1.1.131:6379]: [empty] -> wait +3ms
  ioredis:redis status[10.1.2.152:6379]: [empty] -> wait +1ms
  ioredis:redis status[10.1.0.45:6379]: [empty] -> wait +0ms
  ioredis:cluster status: connecting -> connect +0ms
  ioredis:redis queue command[0] -> cluster(info) +1ms
Complete KEYS on all nodes [["132f28d0-8322-43d6-bbbd-200a19c130c0:tf0NuoVBZIXDIryxBRj3lrcayXeHwaoD"]]
  ioredis:cluster status: connect -> disconnecting +2ms
  ioredis:redis queue command[0] -> quit() +0ms
  ioredis:redis status[10.1.1.131:6379]: wait -> connecting +0ms
  ioredis:redis status[10.1.2.152:6379]: wait -> connecting +0ms
  ioredis:redis status[10.1.0.45:6379]: wait -> connecting +0ms
  ioredis:redis status[10.1.1.131:6379]: connecting -> end +2ms
  ioredis:redis status[10.1.2.152:6379]: connecting -> end +0ms
  ioredis:redis status[10.1.0.45:6379]: connecting -> end +0ms
  ioredis:redis status[10.1.0.45:6379]: ready -> close +1ms
  ioredis:connection skip reconnecting since the connection is manually closed. +1ms
  ioredis:redis status[10.1.0.45:6379]: close -> end +0ms
  ioredis:cluster status: disconnecting -> close +2ms
  ioredis:cluster status: close -> end +0ms
Caught rejection:  Connection is closed.
  ioredis:delayqueue send 1 commands in failover queue +100ms
  ioredis:cluster status: end -> disconnecting +2ms
// SNIP

Why would we be getting the "Connection is closed" rejection? This feels like a bug, as I think we are going about this the correct way, but I'm happy to be proved wrong!

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 10
  • Comments: 31

Most upvoted comments

I’m commenting here to confirm that this issue is still cropping up for us.

I’m not really sure why the bot above adds a “wontfix” label to an issue that hasn’t had any recent activity 🤔

I’ve also been able to reproduce this problem but only in AWS.

I believe the problem is related to the offline queue. The error originates when the close() method is called from the event_handler. The error eventually bubbles up in the redis class when flushQueue() is executed with a non-empty offline queue.

The commandQueue also occasionally causes this problem, but it's much less frequent.
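A minimal model of the mechanism described above, for illustration only (this is not ioredis's actual implementation): commands sent before the socket is ready sit in an offline queue, and if the connection closes first, flushing that non-empty queue rejects every pending command with "Connection is closed.".

```javascript
// Illustrative model only -- not ioredis's real code.
class OfflineQueueModel {
  constructor() {
    this.offlineQueue = []; // commands queued while not yet connected
  }
  send(command) {
    // Not connected yet: park the command and hand back a pending promise.
    return new Promise((resolve, reject) => {
      this.offlineQueue.push({ command, resolve, reject });
    });
  }
  close() {
    // Flushing a non-empty queue on close rejects everything still waiting.
    const error = new Error("Connection is closed.");
    for (const item of this.offlineQueue) item.reject(error);
    this.offlineQueue = [];
  }
}

// A caller that queued a command and then closed sees the familiar rejection:
const conn = new OfflineQueueModel();
conn.send("keys *:*").catch((e) => console.log("Caught rejection: ", e.message));
conn.close();
```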

My case is using ioredis while doing integration tests. Every test opens and closes a connection in its run. I’m doing .quit() at the end of every test and it successfully resolves, but I still get the error for some reason.

The solution for me was what @elliotttf suggested: switching enableOfflineQueue to false.
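For reference, a hedged sketch of what that option change might look like. The placement is an assumption: I believe enableOfflineQueue is honoured both at the cluster level and per node via redisOptions, but verify this against your ioredis version's documentation.

```javascript
// Sketch of cluster options disabling the offline queue (assumption: the
// option applies at both the cluster level and per node via redisOptions).
const clusterOptions = {
  enableOfflineQueue: false, // reject commands immediately rather than queueing
  redisOptions: {
    enableOfflineQueue: false, // same fail-fast behaviour on each node connection
  },
};

// Usage sketch: new Redis.Cluster(["cluster.endpoint:6379"], clusterOptions)
```

With the offline queue disabled, commands issued while a connection is not ready fail immediately instead of being parked and later rejected during close.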