memcached: `getpeername: Transport endpoint is not connected` with Extstore

Describe the bug

Hey, I have Memcached 1.6.12 deployed on AWS EC2 instances (r5d.2xlarge; more details under System Information) with Extstore enabled. Occasionally the Memcached process runs out of file descriptors and hangs. When this happens, there are more than 50,000 entries under the /proc/[pid]/fd/ directory, whereas normally there are only about 10,000. The log also fills with messages like these whenever the issue is triggered:

getpeername: Transport endpoint is not connected
accept4(): Too many open files
getpeername: Transport endpoint is not connected
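For anyone trying to catch this before the process hangs, the /proc/[pid]/fd count mentioned above can be compared against the process's open-file limit. The following is only a rough sketch, assuming a Linux host with /proc mounted; the function names (`count_open_fds`, `parse_nofile_soft_limit`, `fd_usage`) are made up for illustration and are not part of memcached.

```python
import os


def count_open_fds(pid):
    """Count the entries under /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def parse_nofile_soft_limit(limits_text):
    """Extract the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns an int, or None if the limit is 'unlimited' or the row is missing.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: "Max open files  <soft>  <hard>  files"
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    return None


def fd_usage(pid):
    """Return (open_fd_count, soft_limit) for the given process ID."""
    with open(f"/proc/{pid}/limits") as f:
        limit = parse_nofile_soft_limit(f.read())
    return count_open_fds(pid), limit
```

Calling `fd_usage(pid)` periodically with the memcached PID and logging the result would show whether the count climbs steadily toward the limit or jumps suddenly, which may help narrow down when the leak occurs. Reading another user's /proc/[pid]/fd usually requires running as that user or as root.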

To Reproduce

This seems to happen randomly and is hard to reproduce.

System Information

OS/Distro: Ubuntu 16.04
Version of OS/distro: xenial
Version of Memcached: 1.6.12
Hardware detail: Amazon EC2 r5d.2xlarge – 64G memory, 8CPU, 300G SSD

I also tried it on different hardware, and the same errors were reported:

OS/Distro: Ubuntu 20.04
Version of OS/distro: focal
Version of Memcached: 1.6.12
Hardware detail: Amazon EC2 r6gd.2xlarge – 64G memory, 8CPU, 474G SSD

Detail (please include!)

Here’s the memcached.conf:

-d
logfile /var/log/memcached.log
-m 48000
-p 11211
-u nobody
-l 0.0.0.0
-c 100000
-t 8
-I 10m
-o ext_path=/mnt/memcached/extstore:225G
-o ext_wbuf_size=16
-o ext_low_ttl=14400
-o modern

About this issue

State: closed
Created 3 years ago
Comments: 47 (25 by maintainers)

Most upvoted comments

Alright I’ll close this out; the upstreamed patch should be in an official version within a week or two. Thanks for your patience in narrowing this down.

If it comes back or you have further details please open a new issue but reference this one.

dormando on Nov 29, 2021