ClickHouse: Suspected memory leak in 23.8

Migrated from https://github.com/ClickHouse/ClickHouse/issues/54483 as a separate issue, since we are not using Kafka.

Describe the unexpected behaviour

Using ClickHouse 23.8.2.7. Memory usage increases as the database is used, which looks like a memory leak. After a restart memory drops drastically, then starts climbing again at around 300 MB/day.

[Graph: server memory usage over time]
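
For reference, the daily growth rate can be measured directly from system.asynchronous_metric_log (enabled by default); a sketch, using the MemoryResident asynchronous metric:

-- Sketch: peak resident memory per day, to estimate the growth rate.
SELECT
    event_date,
    formatReadableSize(max(value)) AS peak_resident
FROM system.asynchronous_metric_log
WHERE metric = 'MemoryResident'
GROUP BY event_date
ORDER BY event_date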

How to reproduce

  • Which ClickHouse server version to use: ClickHouse 23.8.2.7
  • One particular database has somewhat non-standard usage: it receives a lot of inserts, but they usually last for less than a minute; it is a sort of internal cache (a rough sketch of this workload follows this list). There we see memory usage rising at around 700 MB/day. We see the same effect, less pronounced (300 MB/day), on other ClickHouse servers running the same version. Memory is always released upon restart.
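
A minimal sketch of that workload, assuming "last for less than a minute" means the rows themselves are short-lived (the real schema and table name are different):

-- Hypothetical reproduction of the "internal cache" pattern: heavy insert
-- traffic, rows kept for less than a minute.
CREATE TABLE IF NOT EXISTS cache_example
(
    key String,
    payload String,
    inserted_at DateTime DEFAULT now()
)
ENGINE = MergeTree
ORDER BY key
TTL inserted_at + INTERVAL 1 MINUTE;

-- Many inserts like this arrive continuously; TTL merges expire the rows shortly after.
INSERT INTO cache_example (key, payload) VALUES ('k1', 'v1');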

Expected behavior

Since no new data is being stored, we would expect the server to recycle memory internally without needing a restart.

Additional context

All of the queries below were run on a particular instance that is currently using ~19 GB. Merges, as requested by @antaljanosbenjamin:

SELECT
    formatReadableSize(memory_usage),
    *
FROM system.merges

Ok.

0 rows in set. Elapsed: 0.001 sec. 

Memory stats, as requested by @azat:

WITH
    (
        SELECT sum(CAST(value, 'UInt64'))
        FROM system.metrics
        WHERE metric = 'MemoryTracking'
    ) AS memory_tracked_,
    (
        SELECT sum(total_bytes)
        FROM system.tables
        WHERE engine IN ('Join', 'Memory', 'Buffer', 'Set')
    ) AS memory_tables_,
    (
        SELECT sum(CAST(value, 'UInt64'))
        FROM system.asynchronous_metrics
        WHERE metric LIKE '%CacheBytes'
    ) AS memory_caches_,
    (
        SELECT sum(CAST(memory_usage, 'UInt64'))
        FROM system.processes
    ) AS memory_processes_,
    (
        SELECT sum(CAST(memory_usage, 'UInt64'))
        FROM system.merges
    ) AS memory_merges_,
    (
        SELECT sum(bytes_allocated)
        FROM system.dictionaries
    ) AS memory_dictionaries_,
    (
        SELECT sum(primary_key_bytes_in_memory_allocated)
        FROM system.parts
    ) AS memory_primary_keys_
SELECT
    assumeNotNull(memory_tracked_) AS memory_tracked,
    assumeNotNull(memory_tables_) AS memory_tables,
    assumeNotNull(memory_caches_) AS memory_caches,
    assumeNotNull(memory_processes_) AS memory_processes,
    assumeNotNull(memory_merges_) AS memory_merges,
    assumeNotNull(memory_dictionaries_) AS memory_dictionaries,
    assumeNotNull(memory_primary_keys_) AS memory_primary_keys
FORMAT Vertical

Row 1:
──────
memory_tracked:      19684595772
memory_tables:       0
memory_caches:       4538334048
memory_processes:    175800
memory_merges:       0
memory_dictionaries: 0
memory_primary_keys: 2427744
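
For scale: the holders above account for only about 4.5 GB of the ~19.7 GB tracked, leaving roughly 15 GB unexplained by tables, caches, running queries, merges, dictionaries, or primary keys. A sketch that computes the same difference live (the casts to Int64 just keep the subtraction signed):

WITH
    (SELECT sum(toInt64(value)) FROM system.metrics
        WHERE metric = 'MemoryTracking') AS tracked_,
    (SELECT sum(toInt64(total_bytes)) FROM system.tables
        WHERE engine IN ('Join', 'Memory', 'Buffer', 'Set')) AS tables_,
    (SELECT sum(toInt64(value)) FROM system.asynchronous_metrics
        WHERE metric LIKE '%CacheBytes') AS caches_,
    (SELECT sum(toInt64(memory_usage)) FROM system.processes) AS processes_,
    (SELECT sum(toInt64(memory_usage)) FROM system.merges) AS merges_,
    (SELECT sum(toInt64(bytes_allocated)) FROM system.dictionaries) AS dictionaries_,
    (SELECT sum(toInt64(primary_key_bytes_in_memory_allocated)) FROM system.parts) AS primary_keys_
SELECT formatReadableSize(
    tracked_ - (ifNull(tables_, 0) + ifNull(caches_, 0) + ifNull(processes_, 0)
              + ifNull(merges_, 0) + ifNull(dictionaries_, 0) + ifNull(primary_keys_, 0))
    ) AS memory_unaccounted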

And the jemalloc metrics:

SELECT
    metric,
    value
FROM system.asynchronous_metrics
WHERE metric LIKE '%jemalloc%'

┌─metric───────────────────────────────────┬────────value─┐
│ jemalloc.epoch                           │      1507330 │
│ jemalloc.arenas.all.pactive              │      4466147 │
│ jemalloc.background_thread.num_runs      │            0 │
│ jemalloc.active                          │  18293338112 │
│ jemalloc.retained                        │ 286359556096 │
│ jemalloc.arenas.all.pmuzzy               │         3687 │
│ jemalloc.allocated                       │  18102287104 │
│ jemalloc.background_thread.num_threads   │            0 │
│ jemalloc.metadata_thp                    │            0 │
│ jemalloc.arenas.all.muzzy_purged         │   6963334478 │
│ jemalloc.arenas.all.dirty_purged         │  11647316965 │
│ jemalloc.mapped                          │  19758575616 │
│ jemalloc.metadata                        │   1241636848 │
│ jemalloc.resident                        │  19578085376 │
│ jemalloc.background_thread.run_intervals │            0 │
│ jemalloc.arenas.all.pdirty               │        14069 │
└──────────────────────────────────────────┴──────────────┘
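
jemalloc.allocated (~18.1 GB) is close to jemalloc.resident (~19.6 GB), which suggests the growth is mostly live allocations rather than allocator fragmentation. A sketch to watch how the two evolve over time, again via system.asynchronous_metric_log:

-- Sketch: daily trend of jemalloc allocated vs. resident memory.
SELECT
    event_date,
    formatReadableSize(maxIf(value, metric = 'jemalloc.allocated')) AS allocated,
    formatReadableSize(maxIf(value, metric = 'jemalloc.resident')) AS resident
FROM system.asynchronous_metric_log
WHERE metric IN ('jemalloc.allocated', 'jemalloc.resident')
GROUP BY event_date
ORDER BY event_date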

Memory allocated per jemalloc bin:

SELECT
    *,
    allocations - deallocations AS active_allocations,
    size * active_allocations AS allocated_bytes
FROM system.jemalloc_bins
WHERE allocated_bytes > 0
ORDER BY allocated_bytes DESC

┌─index─┬─large─┬──────size─┬─allocations─┬─deallocations─┬─active_allocations─┬─allocated_bytes─┐
│    12 │     0 │       256 │  1964164752 │    1934402605 │           29762147 │      7619109632 │
│     4 │     0 │        64 │  8765929776 │    8725048938 │           40880838 │      2616373632 │
│     5 │     0 │        80 │  2965546952 │    2933024532 │           32522420 │      2601793600 │
│    11 │     0 │       224 │   850928924 │     839491604 │           11437320 │      2561959680 │
│     2 │     0 │        32 │  7006726704 │    6965106953 │           41619751 │      1331832032 │
│    91 │     1 │ 234881024 │           1 │             0 │                  1 │       234881024 │
│     6 │     0 │        96 │  5682488087 │    5680410878 │            2077209 │       199412064 │
│     7 │     0 │       112 │  1402413732 │    1401220491 │            1193241 │       133642992 │
[...]
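
One thing we could also try on our side: snapshot system.jemalloc_bins periodically and diff the snapshots to see which size classes keep growing. A rough sketch (the snapshot table name is made up, and we have not run this yet):

-- Hypothetical snapshot table; append a snapshot periodically (e.g. from cron).
CREATE TABLE IF NOT EXISTS default.jemalloc_bins_snapshots
ENGINE = MergeTree
ORDER BY (snapshot_time, size)
AS SELECT now() AS snapshot_time, * FROM system.jemalloc_bins;

INSERT INTO default.jemalloc_bins_snapshots
SELECT now(), * FROM system.jemalloc_bins;

-- Diff two snapshots (timestamps are placeholders) to find the growing bins.
SELECT
    b.size,
    (toInt64(b.allocations) - toInt64(b.deallocations))
        - (toInt64(a.allocations) - toInt64(a.deallocations)) AS active_delta,
    toInt64(b.size) * active_delta AS bytes_delta
FROM default.jemalloc_bins_snapshots AS a
INNER JOIN default.jemalloc_bins_snapshots AS b
    ON a.size = b.size AND a.large = b.large
WHERE a.snapshot_time = '2024-01-10 00:00:00'
  AND b.snapshot_time = '2024-01-11 00:00:00'
ORDER BY bytes_delta DESC
LIMIT 10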

Please let us know what else we can do to diagnose the issue. Thanks!


Most upvoted comments

Hi! I am also facing a memory leak after upgrading ClickHouse to 23.8.

Current version: 23.8.9.54. Shards: 1. Replicas: 2. Using the Kafka engine: no.

I have 2 replicas (node1 and node2), and there are no client queries to node2 at all. But after upgrading to 23.8, both nodes started to leak memory in the same way. The leak is slow but relentless: after about 30 days a node runs out of memory.

This is the memory consumption on node2 (with no client queries): [Graph: memory usage on node2]

Please tell me what additional information I can provide about the issue 🙏
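
For completeness, this is roughly how the absence of client traffic on node2 can be cross-checked against system.query_log (a sketch, not necessarily the exact query I ran):

-- is_initial_query = 1 means the query was issued directly by a client,
-- not forwarded by another server as part of distributed query execution.
SELECT
    event_date,
    countIf(is_initial_query) AS client_initiated,
    count() AS total
FROM system.query_log
WHERE type = 'QueryStart'
GROUP BY event_date
ORDER BY event_date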

Thanks, @fishday53! Looks like it could be related to what we have observed. Let’s see if we can find the culprit.

@jrdi, hi! The previous version was 22.2.2.1.

Here are the results for CREATE TABLE and DROP TABLE queries:

SELECT
    event_date,
    count(*)
FROM system.query_log
WHERE query LIKE '%CREATE TABLE%'
GROUP BY event_date
ORDER BY event_date

Query id: 2ac4d3aa-10e0-4fb5-8d08-27677a93516a

┌─event_date─┬─count()─┐
│ 2023-12-13 │   37863 │
│ 2023-12-14 │   38296 │
│ 2023-12-15 │   39576 │
│ 2023-12-16 │   39592 │
│ 2023-12-17 │   39432 │
│ 2023-12-18 │   38872 │
│ 2023-12-19 │   39432 │
│ 2023-12-20 │   38304 │
│ 2023-12-21 │   39224 │
│ 2023-12-22 │   39352 │
│ 2023-12-23 │   39784 │
│ 2023-12-24 │   39544 │
│ 2023-12-25 │   39232 │
│ 2023-12-26 │   38832 │
│ 2023-12-27 │   39112 │
│ 2023-12-28 │   39424 │
│ 2023-12-29 │   39288 │
│ 2023-12-30 │   39568 │
│ 2023-12-31 │   39272 │
│ 2024-01-01 │   39120 │
│ 2024-01-02 │   38960 │
│ 2024-01-03 │   35300 │
│ 2024-01-04 │   39600 │
│ 2024-01-05 │   39664 │
│ 2024-01-06 │   39672 │
│ 2024-01-07 │   39104 │
│ 2024-01-08 │   39032 │
│ 2024-01-09 │   38640 │
│ 2024-01-10 │   39216 │
│ 2024-01-11 │   19058 │
└────────────┴─────────┘

SELECT
    event_date,
    count(*)
FROM system.query_log
WHERE query LIKE '%DROP TABLE%'
GROUP BY event_date
ORDER BY event_date

Query id: 18e4f2e2-c36a-460a-b454-bb213e7f4cc9

┌─event_date─┬─count()─┐
│ 2023-12-13 │   64280 │
│ 2023-12-14 │   65375 │
│ 2023-12-15 │   67920 │
│ 2023-12-16 │   67944 │
│ 2023-12-17 │   67620 │
│ 2023-12-18 │   66584 │
│ 2023-12-19 │   67656 │
│ 2023-12-20 │   65680 │
│ 2023-12-21 │   67244 │
│ 2023-12-22 │   67476 │
│ 2023-12-23 │   68280 │
│ 2023-12-24 │   67872 │
│ 2023-12-25 │   67220 │
│ 2023-12-26 │   66292 │
│ 2023-12-27 │   66936 │
│ 2023-12-28 │   67616 │
│ 2023-12-29 │   67224 │
│ 2023-12-30 │   67852 │
│ 2023-12-31 │   67376 │
│ 2024-01-01 │   67188 │
│ 2024-01-02 │   66776 │
│ 2024-01-03 │   59456 │
│ 2024-01-04 │   67968 │
│ 2024-01-05 │   68076 │
│ 2024-01-06 │   68096 │
│ 2024-01-07 │   66952 │
│ 2024-01-08 │   66764 │
│ 2024-01-09 │   65924 │
│ 2024-01-10 │   67176 │
│ 2024-01-11 │   32554 │
└────────────┴─────────┘
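
Side note: the LIKE filter also matches any query that merely contains the text "CREATE TABLE" or "DROP TABLE" (including these counting queries themselves). If more precise numbers are needed, a stricter variant could group on query_kind instead (a sketch; note that 'Create'/'Drop' cover all object types, not only tables):

SELECT
    event_date,
    query_kind,
    count() AS queries
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_kind IN ('Create', 'Drop')
GROUP BY event_date, query_kind
ORDER BY event_date, query_kind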