go-redis: dial tcp: i/o timeout
I am using go-redis v6.14.2. My application runs in an AWS cluster behind a load balancer. All Redis requests failed on one of the nodes in the cluster, while the rest of the nodes worked as expected. The application started working properly after a restart. We are using ElastiCache. Can you please help me identify the issue? If it is a known issue that is fixed in the latest version, can you point me to the relevant link?
The error was “dial tcp: i/o timeout”.
Below is my cluster configuration excluding redis host address and password:
- ReadOnly : true
- RouteByLatency : true
- RouteRandomly : true
- DialTimeout : 300ms
- ReadTimeout : 30s
- WriteTimeout : 30s
- PoolSize : 12000
- PoolTimeout : 32
- IdleTimeout : 120s
- IdleCheckFrequency : 1s
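One detail in the list above may be worth double-checking: PoolTimeout is shown as a bare 32, while every other timeout carries a unit. The sketch below uses only the Go standard library to show what a unitless number can turn into. `asDuration` is a hypothetical helper, and treating a bare number as nanoseconds is an assumption about how a config loader might behave, not a confirmed description of this setup.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// asDuration is a hypothetical helper. It first tries Go's duration syntax
// ("300ms", "30s"). If that fails, it falls back to reading the value as a
// bare integer count of nanoseconds (an assumption, see the text above).
func asDuration(raw string) (time.Duration, error) {
	if d, err := time.ParseDuration(raw); err == nil {
		return d, nil
	}
	n, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("cannot read %q as a duration: %w", raw, err)
	}
	return time.Duration(n), nil
}

func main() {
	// Values copied from the configuration list in the post.
	settings := []struct{ name, raw string }{
		{"DialTimeout", "300ms"},
		{"ReadTimeout", "30s"},
		{"WriteTimeout", "30s"},
		{"PoolTimeout", "32"},
		{"IdleTimeout", "120s"},
		{"IdleCheckFrequency", "1s"},
	}
	for _, s := range settings {
		d, err := asDuration(s.raw)
		if err != nil {
			fmt.Printf("%-20s %-6s -> error: %v\n", s.name, s.raw, err)
			continue
		}
		fmt.Printf("%-20s %-6s -> %v\n", s.name, s.raw, d)
	}
}
```

If the real configuration stores the value with a unit, nothing changes. The idea is only to print the parsed durations once at startup and confirm they match what was intended.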
import (
	"sync"

	goRedisClient "github.com/go-redis/redis"
	log "github.com/sirupsen/logrus" // assumed: log.WithError matches the logrus API
	"github.com/spf13/viper"
)

var (
	clusterClientOnce  sync.Once
	redisClusterClient *goRedisClient.ClusterClient
)

// GetRedisClient lazily creates the shared cluster client on first use.
// errorCreatingRedisClusterClient is a message constant defined elsewhere in the application.
func GetRedisClient() *goRedisClient.ClusterClient {
	clusterClientOnce.Do(func() {
		redisClusterClient = goRedisClient.NewClusterClient(
			&goRedisClient.ClusterOptions{
				Addrs:              viper.GetStringSlice("redis.hosts"),
				ReadOnly:           true,
				RouteByLatency:     true,
				RouteRandomly:      true,
				Password:           viper.GetString("redis.password"),
				DialTimeout:        viper.GetDuration("redis.dial_timeout"),
				ReadTimeout:        viper.GetDuration("redis.read_timeout"),
				WriteTimeout:       viper.GetDuration("redis.write_timeout"),
				PoolSize:           viper.GetInt("redis.max_active_connections"),
				PoolTimeout:        viper.GetDuration("redis.pool_timeout"),
				IdleTimeout:        viper.GetDuration("redis.idle_connection_timeout"),
				IdleCheckFrequency: viper.GetDuration("redis.idle_check_frequency"),
			},
		)
		// A failed ping is only logged; the client is still returned to callers.
		if err := redisClusterClient.Ping().Err(); err != nil {
			log.WithError(err).Error(errorCreatingRedisClusterClient)
		}
	})
	return redisClusterClient
}
As suggested in the comments on https://github.com/go-redis/redis/issues/1194, I wrote the following snippet to dial each node and test its health for every slot. There were no errors. As mentioned, it happens randomly on one of the clients, not always. It happened again after 3-4 months, and it is always fixed by a restart.
// CheckRedisSlotConnection dials every node reported by CLUSTER SLOTS and prints the result.
// redis here is the application's own package that exposes GetRedisClient.
func CheckRedisSlotConnection(testCase string) {
	fmt.Println(viper.GetStringSlice("redis.hosts"))
	fmt.Println("Checking testcase " + testCase)
	client := redis.GetRedisClient()
	slots, err := client.ClusterSlots().Result()
	if err != nil {
		fmt.Println("Error fetching cluster slots " + err.Error())
		return
	}
	// Collect the address of every node serving each slot range.
	addresses := []string{}
	for _, slot := range slots {
		for _, node := range slot.Nodes {
			addresses = append(addresses, node.Addr)
		}
	}
	fmt.Println("Received " + strconv.Itoa(len(addresses)) + " node addresses")
	for _, address := range addresses {
		fmt.Println("Testing address " + address)
		// Note: 500ms is more lenient than the 300ms DialTimeout the client uses.
		conn, err := net.DialTimeout("tcp", address, 500*time.Millisecond)
		if err != nil {
			fmt.Println("Error dialing to address " + address + " Error " + err.Error())
			continue
		}
		fmt.Println("Successfully dialled to address " + address)
		if err := conn.Close(); err != nil {
			fmt.Println("Error closing connection " + err.Error())
		}
	}
}
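Since the failure is intermittent, a one-off dial check like the snippet above can easily pass. A variation that may be more telling is to record how long each dial takes and whether a failure is specifically a timeout, using the same 300ms budget as the client's DialTimeout. The following is a minimal, standard-library-only sketch; `probeDial`, `dialResult`, and `startLocalListener` are hypothetical names, and the local listener exists only so the example runs without a Redis cluster.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// appDialTimeout mirrors the 300ms DialTimeout from the posted configuration.
const appDialTimeout = 300 * time.Millisecond

// dialResult is a hypothetical record of a single dial attempt.
type dialResult struct {
	Addr     string
	Elapsed  time.Duration
	TimedOut bool
	Err      error
}

// probeDial dials addr once, measures how long the attempt took, and
// reports whether a failure was a timeout rather than e.g. a refusal.
func probeDial(addr string, timeout time.Duration) dialResult {
	start := time.Now()
	conn, err := net.DialTimeout("tcp", addr, timeout)
	res := dialResult{Addr: addr, Elapsed: time.Since(start), Err: err}
	if err != nil {
		var netErr net.Error
		res.TimedOut = errors.As(err, &netErr) && netErr.Timeout()
		return res
	}
	_ = conn.Close()
	return res
}

// startLocalListener opens a throwaway TCP listener on localhost so the
// sketch can run on its own. It stands in for a real node address.
func startLocalListener() (addr string, stop func()) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	return ln.Addr().String(), func() { _ = ln.Close() }
}

func main() {
	addr, stop := startLocalListener()
	defer stop()

	// Probe a few times; in a real check this would run periodically
	// against the addresses returned by CLUSTER SLOTS.
	for i := 0; i < 3; i++ {
		r := probeDial(addr, appDialTimeout)
		fmt.Printf("addr=%s elapsed=%v timedOut=%v err=%v\n",
			r.Addr, r.Elapsed, r.TimedOut, r.Err)
	}
}
```

Run periodically from the affected host, this kind of log would show whether dial latency creeps toward the 300ms limit before the errors start, which a single pass/fail check cannot reveal.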
About this issue
- State: closed
- Created 4 years ago
- Reactions: 2
- Comments: 15 (3 by maintainers)
That's the point. When you SSH into the node, the client (redis-cli) can reach the node (an address like 172.18.0.x). But can these addresses be reached from localhost or not? If not, your code may not be doing what you want 😃
Adding more details: in our case, it was a CPU issue. We were using T2 instances, and this occurred when the CPU credits hit zero. Since our instances were behind an ASG, it was hard to diagnose the issue after an instance went down.
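For anyone trying to catch this before the instance is replaced, one rough signal that can be logged from inside the process is how late timers fire. This is a minimal sketch under the assumption that scheduling delay is a usable proxy for CPU starvation; it does not read CPU credits or any AWS metric, and `sleepOvershoot`, `worstOvershoot`, and `sampleInterval` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// sampleInterval is an arbitrary short sleep used for each measurement.
const sampleInterval = 10 * time.Millisecond

// sleepOvershoot sleeps for d and returns how much longer than d the
// sleep actually took. On a healthy host this is usually small; on a
// CPU-starved host the goroutine may be resumed noticeably late.
func sleepOvershoot(d time.Duration) time.Duration {
	start := time.Now()
	time.Sleep(d)
	return time.Since(start) - d
}

// worstOvershoot takes several samples and returns the largest overshoot.
func worstOvershoot(samples int, d time.Duration) time.Duration {
	var worst time.Duration
	for i := 0; i < samples; i++ {
		if o := sleepOvershoot(d); o > worst {
			worst = o
		}
	}
	return worst
}

func main() {
	worst := worstOvershoot(20, sampleInterval)
	fmt.Printf("worst timer overshoot over 20 samples: %v\n", worst)
}
```

If the overshoot regularly approaches the 300ms DialTimeout, dial timeouts would be expected even when the network and the Redis nodes are healthy.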