rueidis: Panic during reconnect to malfunctioning Redis Cluster

Got a panic like this after reconnecting to a Redis Cluster:

panic: protocol bug, message handled out of order

goroutine 1148 [running]:
github.com/rueian/rueidis.(*pipe)._backgroundRead(0xc0014ae210)
  /Users/fz/go/pkg/mod/github.com/rueian/rueidis@v0.0.57/pipe.go:301 +0x725
github.com/rueian/rueidis.(*pipe)._background(0xc0014ae210)
  /Users/fz/go/pkg/mod/github.com/rueian/rueidis@v0.0.57/pipe.go:135 +0x185
created by github.com/rueian/rueidis.(*pipe).background.func1
  /Users/fz/go/pkg/mod/github.com/rueian/rueidis@v0.0.57/pipe.go:113 +0x5a
exit status 2

In my case I had a working Redis cluster, then stopped it (I am using docker-compose from https://github.com/Grokzen/docker-redis-cluster) and started it again. The newly created cluster was not fully functional because of a cluster setup error, so at this point every operation on every node returned:

127.0.0.1:7005> set x 1
(error) CLUSTERDOWN Hash slot not served

I’ll try to provide a minimal reproducer example soon. This may be a bit tricky to reproduce though.

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 17 (6 by maintainers)

Most upvoted comments

Hi @FZambia,

I just tried the latest commit, and everything works as expected. Such great progress, many thanks. I will continue experimenting; the feeling is that it’s very close 😃 I also want to test the Sentinel scenario, which seems to be the last part of the migration.

Thank you for your assistance, v0.0.59 is released. Please let me know if you find any issues. I am happy to help.

Actually I’d prefer Redis to close the PUB/SUB connection automatically upon slot migration, possibly with an error indicating where the slot was moved to, instead of sending sunsubscribe.

Agreed. The sunsubscribe message is currently the only special case that does not fit well into the request<->response communication model: it may or may not be an out-of-band message. This makes it hard for the client to handle.
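
The ambiguity can be illustrated with a much simplified model of a pipelined connection. This is only a sketch under stated assumptions: the `pending` type and its `handle` method are hypothetical and do not reflect rueidis internals, and messages are reduced to a single kind string.

```go
package main

import "fmt"

// pending is a FIFO of commands already written to the connection and still
// awaiting a reply. Hypothetical model for illustration, not rueidis internals.
type pending struct {
	queue []string
}

// handle consumes one incoming message, identified here only by its kind,
// and reports how the client would have to interpret it.
func (p *pending) handle(kind string) string {
	if kind == "sunsubscribe" {
		// Solicited only if the oldest in-flight command is SUNSUBSCRIBE.
		if len(p.queue) > 0 && p.queue[0] == "SUNSUBSCRIBE" {
			p.queue = p.queue[1:]
			return "reply"
		}
		// Otherwise the server sent it on its own, e.g. after a slot migration.
		return "out-of-band"
	}
	if len(p.queue) == 0 {
		return "unexpected"
	}
	// Any other message answers the oldest in-flight command.
	p.queue = p.queue[1:]
	return "reply"
}

func main() {
	p := &pending{queue: []string{"GET", "SUNSUBSCRIBE"}}
	fmt.Println(p.handle("sunsubscribe")) // head is GET, so not a reply
	fmt.Println(p.handle("value"))        // answers GET
	fmt.Println(p.handle("sunsubscribe")) // answers SUNSUBSCRIBE
}
```

The same message kind is classified differently depending on what is in flight, which is the special-casing a pure request<->response client would otherwise not need.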

It seems they discussed simply closing the connection from the Redis side once the slot was moved: https://github.com/redis/redis/pull/8621#issuecomment-922069361, but that never happened in the end, and I didn’t find any further discussion.

Given the at-most-once nature of PUB/SUB connections, I am a bit worried that sunsubscribe can be lost in some cases.

https://github.com/redis/redis/blob/a64b29485d4f2359b9d698c0e21e890a212ad1bb/src/cluster.c#L5848

Well, you don’t have to worry about that currently. There is no case in which sunsubscribe goes unsent unless the client is disconnected.