memcached: OOM at 2TB with plenty of memory left

Re-posting here from #467 at the request of @dormando

I’m experiencing an OOM with memcached 1.5.16. My workload is very contrived, but I would like to try to understand what I’m seeing.

I have a 4TB in-memory memcached instance. My goal is to exercise the OS memory-management implementation (even if the user-space workload is not very realistic). I started with a simple workload: I do SET operations with keys that are sequential integers (turned into strings) and values that are 512KB of zeros (to be precise, I am starting memcached with -f 1.11 and inserting values of size 523800 bytes, so there are 2 chunks per page and almost no memory is wasted). These are the only operations performed on the server, and I am using only a single client thread.
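The sizing above can be sketched with a little arithmetic (a hedged sketch: it assumes memcached's default 1 MiB slab page, and that the item header plus the short integer key fill the remaining bytes of each chunk):

```shell
# Slab-class arithmetic behind the workload (assumption: 1 MiB slab pages).
page=$((1024 * 1024))    # slab page size: 1 MiB
chunk=$((page / 2))      # class-63 chunk size: 524288 bytes
value=523800             # value size used in this workload
echo "chunks per page: $((page / chunk))"
echo "bytes left per chunk for item header + key: $((chunk - value))"
```

That leaves 488 bytes per chunk for per-item overhead, which is why the value size was chosen just under 512 KiB.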

memcached is given the entire 4TB of RAM. However, after inserting only 2TB of data, I get an OOM response from the server.
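As a sanity check, the stats pasted below let you reconstruct the allocated total (a sketch using curr_items and the class-63 chunk size from the output):

```shell
# Back-of-the-envelope check against the stats output in this report.
items=4255298    # curr_items
chunk=524288     # 63:chunk_size (512 KiB)
bytes=$((items * chunk))
tib=$((bytes / 1024 / 1024 / 1024 / 1024))
echo "chunk bytes in use: $bytes (~$tib TiB of the 4 TiB limit)"
```

This lands at roughly 2 TiB, about half of limit_maxbytes, which matches the point at which the OOM appears.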

Any thoughts or insights would be helpful.

Here is the output of the memcached-tool script:

$ ./memcached-tool localhost
  #  Item_Size  Max_age   Pages   Count   Full?  Evicted Evict_Time OOM
 63   512.0K    153466s 2127649 4255298     yes        0        0    1

$ ./memcached-tool localhost stats
#localhost:11211   Field       Value
         accepting_conns           1
               auth_cmds           0
             auth_errors           0
                   bytes 2229204830958
              bytes_read 2229090461784
           bytes_written   102128520
              cas_badval           0
                cas_hits           0
              cas_misses           0
               cmd_flush           0
                 cmd_get           0
                 cmd_set     4255298
               cmd_touch           0
             conn_yields           0
   connection_structures           3
   crawler_items_checked   220678182
       crawler_reclaimed           0
        curr_connections           2
              curr_items     4255298
               decr_hits           0
             decr_misses           0
             delete_hits           0
           delete_misses           0
         direct_reclaims          10
          evicted_active           0
       evicted_unfetched           0
               evictions           0
       expired_unfetched           0
             get_expired           0
             get_flushed           0
                get_hits           0
              get_misses           0
              hash_bytes    33554432
       hash_is_expanding           0
        hash_power_level          22
               incr_hits           0
             incr_misses           0
                libevent 2.0.21-stable
          limit_maxbytes 4315368390656
     listen_disabled_num           0
        log_watcher_sent           0
     log_watcher_skipped           0
      log_worker_dropped           0
      log_worker_written           0
       lru_bumps_dropped           0
     lru_crawler_running           0
      lru_crawler_starts       17595
  lru_maintainer_juggles      830689
       lrutail_reflocked           0
            malloc_fails           0
         max_connections        1024
           moves_to_cold     4255298
           moves_to_warm           0
        moves_within_lru           0
                     pid       20873
            pointer_size          64
               reclaimed           0
    rejected_connections           0
            reserved_fds          20
           rusage_system 10764.436253
             rusage_user 8789.971169
   slab_global_page_pool           0
slab_reassign_busy_deletes           0
slab_reassign_busy_items           0
slab_reassign_chunk_rescues           0
slab_reassign_evictions_nomem           0
slab_reassign_inline_reclaim           0
   slab_reassign_rescues           0
   slab_reassign_running           0
             slabs_moved           0
                 threads           4
                    time  1564415652
time_in_listen_disabled_us           0
       total_connections           5
             total_items     4255298
              touch_hits           0
            touch_misses           0
                  uptime      153469
                 version      1.5.16

$ ./memcached-tool localhost settings
#localhost:11211   Field       Value
      auth_enabled_ascii          no
       auth_enabled_sasl          no
        binding_protocol auto-negotiate
             cas_enabled         yes
              chunk_size          48
          detail_enabled          no
           domain_socket        NULL
            dump_enabled         yes
               evictions         off
           flush_enabled         yes
           growth_factor        1.11
          hash_algorithm     murmur3
          hashpower_init           0
             hot_lru_pct          20
          hot_max_factor        0.20
            idle_timeout           0
   inline_ascii_response          no
                   inter        NULL
           item_size_max     1048576
             lru_crawler         yes
       lru_crawler_sleep         100
     lru_crawler_tocrawl           0
   lru_maintainer_thread         yes
           lru_segmented         yes
                maxbytes 4315368390656
                maxconns        1024
           maxconns_fast         yes
             num_threads           4
     num_threads_per_udp           4
                  oldest           0
          reqs_per_event          20
           slab_automove           1
     slab_automove_ratio        0.80
    slab_automove_window          30
          slab_chunk_max      524288
           slab_reassign         yes
         stat_key_prefix           :
        tail_repair_time           0
             tcp_backlog        1024
                 tcpport       11211
                temp_lru          no
           temporary_ttl          61
             track_sizes          no
                 udpport           0
                   umask         700
               verbosity           0
            warm_lru_pct          40
         warm_max_factor        2.00
     watcher_logbuf_size      262144
      worker_logbuf_size       65536

$ telnet localhost 11211 
Trying ::1...
Connected to localhost.
Escape character is '^]'.
stats items
STAT items:63:number 4255298
STAT items:63:number_hot 0
STAT items:63:number_warm 0
STAT items:63:number_cold 4255298
STAT items:63:age_hot 0
STAT items:63:age_warm 0
STAT items:63:age 160382
STAT items:63:evicted 0
STAT items:63:evicted_nonzero 0
STAT items:63:evicted_time 0
STAT items:63:outofmemory 1
STAT items:63:tailrepairs 0
STAT items:63:reclaimed 0
STAT items:63:expired_unfetched 0
STAT items:63:evicted_unfetched 0
STAT items:63:evicted_active 0
STAT items:63:crawler_reclaimed 0
STAT items:63:crawler_items_checked 229188778
STAT items:63:lrutail_reflocked 0
STAT items:63:moves_to_cold 4255298
STAT items:63:moves_to_warm 0
STAT items:63:moves_within_lru 0
STAT items:63:direct_reclaims 10
STAT items:63:hits_to_hot 0
STAT items:63:hits_to_warm 0
STAT items:63:hits_to_cold 0
STAT items:63:hits_to_temp 0
END
stats slabs
STAT 63:chunk_size 524288
STAT 63:chunks_per_page 2
STAT 63:total_pages 2127649
STAT 63:total_chunks 4255298
STAT 63:used_chunks 4255298
STAT 63:free_chunks 0
STAT 63:free_chunks_end 0
STAT 63:mem_requested 2229204830958
STAT 63:get_hits 0
STAT 63:cmd_set 4255298
STAT 63:delete_hits 0
STAT 63:incr_hits 0
STAT 63:decr_hits 0
STAT 63:cas_hits 0
STAT 63:cas_badval 0
STAT 63:touch_hits 0
STAT active_slabs 1
STAT total_malloced 2231012163584
END
^]
telnet> Connection closed.

$ cat /proc/meminfo 
MemTotal:       4227745440 kB
MemFree:        2038698904 kB
MemAvailable:   2027310892 kB
Buffers:            2068 kB
Cached:          4508928 kB
SwapCached:            0 kB
Active:         2179503256 kB
Inactive:        4243620 kB
Active(anon):   2179373708 kB
Inactive(anon):  4065076 kB
Active(file):     129548 kB
Inactive(file):   178544 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       2097148 kB
SwapFree:        2097148 kB
Dirty:              9744 kB
Writeback:             0 kB
AnonPages:      2179235824 kB
Mapped:           135476 kB
Shmem:           4202980 kB
KReclaimable:      50356 kB
Slab:             111892 kB
SReclaimable:      50356 kB
SUnreclaim:        61536 kB
KernelStack:        1984 kB
PageTables:      4330560 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    2115969868 kB
Committed_AS:   66059668 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
Percpu:              208 kB
HardwareCorrupted:     0 kB
AnonHugePages:  14667776 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      460644 kB
DirectMap2M:    4294506496 kB

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 18 (9 by maintainers)

Most upvoted comments

How about the following:

On Linux, the kernel limits the number of distinct mmapped regions a process may have. By default, the limit is about 65536 regions. This limit can cause memcached to fail to allocate space for values on very-large-memory machines (multiple terabytes), even when it is using less space than allowed via the -m flag, because some libc malloc implementations fall back to mmap when sbrk fails. The limit can be relaxed with sudo sysctl -w vm.max_map_count=<some large number>. See https://github.com/memcached/memcached/issues/512 for details.
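To see whether this limit is even in range for a given cache size, you can read the current setting and estimate the worst case where every 1 MiB slab page lands in its own mmap region (a hedged sketch; adjacent anonymous mappings usually merge, so the real region count is lower, and the fallback default below is an assumption for systems where the proc file is unreadable):

```shell
# Check the map-count limit and estimate worst-case regions for a 4 TiB cache.
limit=$(cat /proc/sys/vm/max_map_count 2>/dev/null || echo 65530)
cache=$((4 * 1024 * 1024 * 1024 * 1024))   # 4 TiB cache
page=$((1024 * 1024))                      # 1 MiB slab page
regions=$((cache / page))
echo "max_map_count=$limit, worst-case regions needed=$regions"
# Raising the limit (requires root), value here chosen with headroom:
#   sudo sysctl -w vm.max_map_count=8388608
```

Worst case, a 4 TiB cache could want about 4 million regions, far beyond the default limit, which is consistent with the failure showing up around the 2 TiB mark once merging stops helping.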

You can break on the same code and walk through a successful allocation or three (should be every other alloc given your pattern) to see when it works.

Hmm… makes sense. I was just curious if memcached does any defragmentation or compression or something.

It does once it’s got its memory malloc’ed from the kernel. It holds on to everything and will shuffle/balance memory between slab classes, evict LRU’ed items to make space, etc., but it has to get the mallocs done first. -L may work… and the branch I’m working on now uses a single mmap for item space (restartable cache), but it’s not quite working yet. You could try that branch even if you don’t intend to actually restart it properly; it’ll allocate out of mmap just fine so long as you have a large enough tmpfs mount. That’s how I did testing for Intel’s Optane memory.
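For reference, a hypothetical invocation of that approach might look like the following (the mount point, sizes, and the memory-file flag are assumptions on my part; the restartable-cache work was still a development branch at the time of this issue, though the -e flag later shipped in 1.5.18):

```shell
# Hypothetical setup: back the item space with a single file on tmpfs.
# Mount point, sizes, and flag spelling are assumptions, not confirmed here.
sudo mkdir -p /mnt/memcache
sudo mount -t tmpfs -o size=4200g tmpfs /mnt/memcache
memcached -m 4100000 -f 1.11 -e /mnt/memcache/memory_file
```

The point of the design is that one large mmap of the tmpfs file replaces many per-page mallocs, sidestepping the region-count limit entirely.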

Seems like Linux may struggle to shuffle memory around sometimes.

This would be hidden from userspace. kswapd or kcompactd would be invoked, blocking userspace from running altogether.

One would assume so, but the mallocs are clearly failing only intermittently here. So it’s doing something.

Also… is this DRAM or Intel DCPMM in memory mode? That is a ridiculous amount of RAM.

Absolutely. It is actually a simulation of a very large system that I’m doing for a research project. FWIW, though, you can actually get 4TB instances on AWS and GCP at a premium of about $25/hour.

Gotcha, heh. Good luck.