KeyDB: KeyDB constantly crashes after a day or two with error "Out-Of-Memory allocating"
TL:DR: Check kernel param: vm.max_map_count and increase it by sysctl or echo
Describe the bug
Catch out-of-memory when increase client-output-buffer-limit for replica (from 256M to 4G) and first replication start (128G was not enough). Memory was not used, fail on allocation (on graph time is 17:28 UTC+3, in logs time UTC):

I think client-output-buffer-limit is increased for all threads instead only for replication thread (server-threads is 10).
** Log Files **
2255:2520:M 14 Dec 2020 14:28:15.831 * Replica 10.101.177.33:6379 asks for synchronization
2255:2520:M 14 Dec 2020 14:28:15.831 * Full resync requested by replica 10.101.177.33:6379
2255:2520:M 14 Dec 2020 14:28:15.831 * Starting BGSAVE for SYNC with target: disk
2255:2520:M 14 Dec 2020 14:28:16.291 * Background saving started by pid 22010
22010:2520:C 14 Dec 2020 14:28:16.328 # Out Of Memory allocating 136273 bytes!
=== KEYDB BUG REPORT START: Cut & paste starting from here ===
2255:2516:M 14 Dec 2020 14:28:16.583 # ------------------------------------------------
2255:2516:M 14 Dec 2020 14:28:16.583 # !!! Software Failure. Press left mouse button to continue
2255:2516:M 14 Dec 2020 14:28:16.583 # Guru Meditation: Redis aborting for OUT OF MEMORY #server.cpp:5206
2255:2516:M 14 Dec 2020 14:28:16.583 # (forcing SIGSEGV in order to print the stack trace)
2255:2516:M 14 Dec 2020 14:28:16.583 # ------------------------------------------------
2255:2516:M 14 Dec 2020 14:28:16.584 # KeyDB 6.0.16 crashed by signal: 11
2255:2517:M 14 Dec 2020 14:28:16.584 # === ASSERTION FAILED ===
2255:2516:M 14 Dec 2020 14:28:16.584 # Crashed running the instruction at: 0x562226d1abaf
2255:2517:M 14 Dec 2020 14:28:16.584 # ==> tls.cpp:920 '!GlobalLocksAcquired()' is not true
2255:2522:M 14 Dec 2020 14:28:16.584 # === ASSERTION FAILED ===
2255:2522:M 14 Dec 2020 14:28:16.584 # ==> tls.cpp:920 '!GlobalLocksAcquired()' is not true
2255:2516:M 14 Dec 2020 14:28:16.584 # Accessing address: 0xffffffffffffffff
2255:2517:M 14 Dec 2020 14:28:16.584 # (forcing SIGSEGV to print the bug report.)
2255:2520:M 14 Dec 2020 14:28:16.584 # === ASSERTION FAILED ===
2255:2522:M 14 Dec 2020 14:28:16.584 # (forcing SIGSEGV to print the bug report.)
2255:2516:M 14 Dec 2020 14:28:16.584 # Failed assertion: !GlobalLocksAcquired() (tls.cpp:920)
2255:2515:M 14 Dec 2020 14:28:16.584 # === ASSERTION FAILED ===
=== KEYDB BUG REPORT START: Cut & paste starting from here ===
22010:2520:C 14 Dec 2020 14:28:16.328 # ------------------------------------------------
22010:2520:C 14 Dec 2020 14:28:16.328 # !!! Software Failure. Press left mouse button to continue
22010:2520:C 14 Dec 2020 14:28:16.329 # Guru Meditation: Redis aborting for OUT OF MEMORY #server.cpp:5206
22010:2520:C 14 Dec 2020 14:28:16.329 # (forcing SIGSEGV in order to print the stack trace)
22010:2520:C 14 Dec 2020 14:28:16.329 # ------------------------------------------------
22010:2520:C 14 Dec 2020 14:28:16.329 # KeyDB 6.0.16 crashed by signal: 11
22010:2520:C 14 Dec 2020 14:28:16.329 # Crashed running the instruction at: 0x562226d1abaf
22010:2520:C 14 Dec 2020 14:28:16.329 # Accessing address: 0xffffffffffffffff
22010:2520:C 14 Dec 2020 14:28:16.329 # Failed assertion: <no assertion failed> (<no file>:0)
------ STACK TRACE ------
EIP:
keydb-rdb-bgsave 0.0.0.0:6379(_serverPanic+0x13f) [0x562226d1abaf]
Backtrace:
keydb-rdb-bgsave 0.0.0.0:6379(logStackTrace(ucontext*)+0x45) [0x562226d1bda5]
keydb-rdb-bgsave 0.0.0.0:6379(sigsegvHandler(int, siginfo_t*, void*)+0xdb) [0x562226d1c56b]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x110c0) [0x7f46063150c0]
keydb-rdb-bgsave 0.0.0.0:6379(_serverPanic+0x13f) [0x562226d1abaf]
keydb-rdb-bgsave 0.0.0.0:6379(redisOutOfMemoryHandler(unsigned long)+0x34) [0x562226cbd684]
keydb-rdb-bgsave 0.0.0.0:6379(zmalloc+0x49) [0x562226cca029]
keydb-rdb-bgsave 0.0.0.0:6379(+0x8a089) [0x562226cf6089]
keydb-rdb-bgsave 0.0.0.0:6379(rdbSaveRawString(_rio*, unsigned char const*, unsigned long)+0x54) [0x562226cf6374]
keydb-rdb-bgsave 0.0.0.0:6379(rdbSaveObject(_rio*, robj_roptr, redisObject*)+0x27d) [0x562226cf748d]
keydb-rdb-bgsave 0.0.0.0:6379(rdbSaveKeyValuePair(_rio*, redisObject*, redisObject*, expireEntry*)+0x104) [0x562226cf7f54]
keydb-rdb-bgsave 0.0.0.0:6379(saveKey(_rio*, redisDb*, int, unsigned long*, char const*, redisObject*)+0x7b) [0x562226cf83cb]
keydb-rdb-bgsave 0.0.0.0:6379(rdbSaveRio(_rio*, int*, int, rdbSaveInfo*)+0x1fe) [0x562226cf876e]
keydb-rdb-bgsave 0.0.0.0:6379(rdbSaveFile(char*, rdbSaveInfo*)+0xae) [0x562226cf93ae]
keydb-rdb-bgsave 0.0.0.0:6379(rdbSave(rdbSaveInfo*)+0x1f) [0x562226cf95ef]
keydb-rdb-bgsave 0.0.0.0:6379(rdbSaveBackground(rdbSaveInfo*)+0xc7) [0x562226cf96e7]
keydb-rdb-bgsave 0.0.0.0:6379(startBgsaveForReplication(int)+0x15b) [0x562226cefa8b]
keydb-rdb-bgsave 0.0.0.0:6379(syncCommand(client*)+0x34b) [0x562226cf000b]
keydb-rdb-bgsave 0.0.0.0:6379(call(client*, int)+0xd0) [0x562226cc0b70]
keydb-rdb-bgsave 0.0.0.0:6379(processCommand(client*, int)+0x99d) [0x562226cc537d]
keydb-rdb-bgsave 0.0.0.0:6379(processCommandAndResetClient(client*, int)+0x31) [0x562226cd0391]
keydb-rdb-bgsave 0.0.0.0:6379(processInputBuffer(client*, int)+0x18b) [0x562226cd638b]
keydb-rdb-bgsave 0.0.0.0:6379(processClients()+0xac) [0x562226cd913c]
keydb-rdb-bgsave 0.0.0.0:6379(beforeSleep(aeEventLoop*)+0x85) [0x562226cbcab5]
keydb-rdb-bgsave 0.0.0.0:6379(aeMain+0x8e) [0x562226cb909e]
keydb-rdb-bgsave 0.0.0.0:6379(workerThreadMain(void*)+0x74) [0x562226cbf9b4]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7494) [0x7f460630b494]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f460604dacf]
------ INFO OUTPUT ------
# Server
redis_version:6.0.16
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:fa6f9f5195421066
redis_mode:standalone
os:Linux 4.15.18-30-pve x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:6.3.0
process_id:22010
run_id:1c7366c8adf8f766caa610f84df753aeb453119a
tcp_port:6379
uptime_in_seconds:4524
uptime_in_days:0
hz:40
configured_hz:10
lru_clock:14120575
executable:/usr/bin/keydb-server
config_file:/etc/keydb/keydb.conf
# Clients
connected_clients:7278
client_recent_max_input_buffer:545
client_recent_max_output_buffer:1328
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
current_client_thread:7
thread_0_clients:9
thread_1_clients:808
thread_2_clients:808
thread_3_clients:808
thread_4_clients:808
thread_5_clients:808
thread_6_clients:808
thread_7_clients:808
thread_8_clients:807
thread_9_clients:807
# Memory
used_memory:25373233496
used_memory_human:23.63G
used_memory_rss:26411708416
used_memory_rss_human:24.60G
used_memory_peak:29569537744
used_memory_peak_human:27.54G
used_memory_peak_perc:85.81%
used_memory_overhead:1507750217
used_memory_startup:47707832
used_memory_dataset:23865483279
used_memory_dataset_perc:94.23%
allocator_allocated:25376787616
allocator_active:26007588864
allocator_resident:26552078336
total_system_memory:2164424249344
total_system_memory_human:1.97T
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.02
allocator_frag_bytes:630801248
allocator_rss_ratio:1.02
allocator_rss_bytes:544489472
rss_overhead_ratio:0.99
rss_overhead_bytes:-140369920
mem_fragmentation_ratio:1.04
mem_fragmentation_bytes:1036276120
mem_not_counted_for_evict:0
mem_replication_backlog:1048576
mem_clients_slaves:0
mem_clients_normal:124998513
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
# Persistence
loading:0
rdb_changes_since_last_save:2813934
rdb_bgsave_in_progress:0
rdb_last_save_time:1607955893
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:92
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:2113265664
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
# Stats
total_connections_received:122310
total_commands_processed:226884172
instantaneous_ops_per_sec:58484
total_net_input_bytes:25473952379
total_net_output_bytes:29151146576
instantaneous_input_kbps:7513.90
instantaneous_output_kbps:7576.85
rejected_connections:0
sync_full:3
sync_partial_ok:0
sync_partial_err:0
expired_keys:51128
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:108817603
keyspace_misses:41223206
pubsub_channels:594897
pubsub_patterns:0
latest_fork_usec:401288
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
# Replication
role:master
connected_slaves:1
slave0:ip=10.101.177.33,port=6379,state=wait_bgsave,offset=0,lag=0
master_replid:76278f29c523bf4f9d8e11965881fe33003d4a58
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1427807698
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1426759123
repl_backlog_histlen:1048576
# CPU
used_cpu_sys:0.022219
used_cpu_user:0.022219
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
server_threads:10
long_lock_waits:260882
# Modules
# Commandstats
cmdstat_exists:calls=8793662,usec=21996436,usec_per_call=2.50
cmdstat_subscribe:calls=4406812,usec=22211689,usec_per_call=5.04
cmdstat_exec:calls=138247,usec=2674307,usec_per_call=19.34
cmdstat_lpop:calls=12217242,usec=42301454,usec_per_call=3.46
cmdstat_hgetall:calls=14208513,usec=54739427,usec_per_call=3.85
cmdstat_hkeys:calls=58705,usec=468757,usec_per_call=7.98
cmdstat_scard:calls=611,usec=3082,usec_per_call=5.04
cmdstat_del:calls=153678,usec=471467,usec_per_call=3.07
cmdstat_command:calls=3,usec=2803,usec_per_call=934.33
cmdstat_hdel:calls=1131369,usec=1520987,usec_per_call=1.34
cmdstat_pubsub:calls=7,usec=1515095,usec_per_call=216442.14
cmdstat_multi:calls=138247,usec=70945,usec_per_call=0.51
cmdstat_select:calls=19841,usec=13249,usec_per_call=0.67
cmdstat_mget:calls=76,usec=1468,usec_per_call=19.32
cmdstat_srem:calls=254,usec=1718,usec_per_call=6.76
cmdstat_sismember:calls=113174,usec=255673,usec_per_call=2.26
cmdstat_unlink:calls=510441,usec=4686279,usec_per_call=9.18
cmdstat_ttl:calls=27,usec=53,usec_per_call=1.96
cmdstat_hmset:calls=6826921,usec=57352132,usec_per_call=8.40
cmdstat_watch:calls=123107,usec=251099,usec_per_call=2.04
cmdstat_lrange:calls=2959196,usec=24283397,usec_per_call=8.21
cmdstat_mset:calls=7,usec=149,usec_per_call=21.29
cmdstat_info:calls=5462,usec=690395,usec_per_call=126.40
cmdstat_replconf:calls=9,usec=145,usec_per_call=16.11
cmdstat_incrby:calls=49,usec=160,usec_per_call=3.27
cmdstat_config:calls=3,usec=318,usec_per_call=106.00
cmdstat_ping:calls=9420003,usec=3624804,usec_per_call=0.38
cmdstat_sadd:calls=209539,usec=1062891,usec_per_call=5.07
cmdstat_publish:calls=3682537,usec=6505088,usec_per_call=1.77
cmdstat_unsubscribe:calls=3380303,usec=19529656,usec_per_call=5.78
cmdstat_incr:calls=5142,usec=27185,usec_per_call=5.29
cmdstat_psync:calls=2,usec=821934,usec_per_call=410967.00
cmdstat_set:calls=32657729,usec=83367999,usec_per_call=2.55
cmdstat_smembers:calls=8821513,usec=42199781,usec_per_call=4.78
cmdstat_unwatch:calls=123083,usec=264507,usec_per_call=2.15
cmdstat_get:calls=93917154,usec=322764859,usec_per_call=3.44
cmdstat_hget:calls=17052309,usec=46125321,usec_per_call=2.70
cmdstat_expire:calls=143490,usec=474296,usec_per_call=3.31
cmdstat_hmget:calls=4115717,usec=16314125,usec_per_call=3.96
cmdstat_hset:calls=101,usec=592,usec_per_call=5.86
cmdstat_hincrby:calls=152849,usec=593122,usec_per_call=3.88
cmdstat_lpush:calls=1397038,usec=5612104,usec_per_call=4.02
# Cluster
cluster_enabled:0
# Keyspace
db0:keys=26149461,expires=909900,avg_ttl=51620179
db1:keys=52,expires=0,avg_ttl=0
db3:keys=41856,expires=0,avg_ttl=0
db5:keys=17415,expires=0,avg_ttl=0
------ CLIENT LIST OUTPUT ------
22010:2520:C 14 Dec 2020 14:28:16.345 # Out Of Memory allocating 1455810 bytes!
22010:2520:C 14 Dec 2020 14:28:16.345 # ------------------------------------------------
22010:2520:C 14 Dec 2020 14:28:16.345 # !!! Software Failure. Press left mouse button to continue
22010:2520:C 14 Dec 2020 14:28:16.345 # Guru Meditation: Redis aborting for OUT OF MEMORY #server.cpp:5206
22010:2520:C 14 Dec 2020 14:28:16.345 # (forcing SIGSEGV in order to print the stack trace)
22010:2520:C 14 Dec 2020 14:28:16.345 # ------------------------------------------------
2255:2516:M 14 Dec 2020 14:28:16.574 # Out Of Memory allocating 32774 bytes!
To Reproduce
127.0.0.1:6379> config get client-output-buffer-limit
1) "client-output-buffer-limit"
2) "normal 0 0 0 slave 268435456 67108864 60 pubsub 33554432 8388608 60"
127.0.0.1:6379> config set client-output-buffer-limit "slave 4194990176 4194990176 0"
OK
127.0.0.1:6379> config get client-output-buffer-limit
1) "client-output-buffer-limit"
2) "normal 0 0 0 slave 4194990176 4194990176 0 pubsub 33554432 8388608 60"
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 31 (8 by maintainers)
@rnz I’ve created a .deb package of v6.0.16 with libc malloc instead of jmalloc, feel free to try it out. We also intend to update jemalloc to 5.2, which will hopefully fix the mapped memory deallocation issue.
keydb-tools_6.0.16-1+deb9u1_amd64.zip
Hi @rnz We’re investigating now.