ClickHouse: Suspected memory leak in 23.8
Migrated from https://github.com/ClickHouse/ClickHouse/issues/54483 as a separate issue, since we are not using Kafka.
Describe the unexpected behaviour
Using ClickHouse 23.8.2.7. Memory increases with database usage, which looks like a memory leak. After a restart, memory drops drastically and then starts climbing again at around 300 MB/day.
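A sketch of how the daily growth can be followed, assuming the system.asynchronous_metric_log table is enabled and that MemoryResident is a fair proxy for the process RSS:
SELECT
    event_date,
    formatReadableSize(max(value)) AS peak_resident_memory
FROM system.asynchronous_metric_log
WHERE metric = 'MemoryResident'
GROUP BY event_date
ORDER BY event_date ASC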
How to reproduce
- Which ClickHouse server version to use: ClickHouse 23.8.2.7
- One particular database has a somewhat non-standard usage: it receives a lot of inserts, but the data usually lives for less than a minute; it is a sort of internal cache. There we see memory usage rising at around 700 MB/day. We see the same effect, less pronounced (300 MB/day), on other ClickHouse servers with the same version. Memory is always released upon restart.
Expected behavior
Since no new data is being retained, we would expect the database to recycle memory internally without having to restart it.
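As a sanity check (a sketch, assuming the metadata in system.parts is representative of the stored data), the size of active parts can be compared over time to confirm that the data itself is not growing while memory climbs:
SELECT
    formatReadableSize(sum(bytes_on_disk)) AS data_on_disk,
    formatReadableSize(sum(primary_key_bytes_in_memory_allocated)) AS pk_in_memory
FROM system.parts
WHERE active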
Additional context
All of the queries below were run on a particular instance that is using ~19 GB right now. Merges, as requested by @antaljanosbenjamin:
SELECT
formatReadableSize(memory_usage),
*
FROM system.merges
Ok.
0 rows in set. Elapsed: 0.001 sec.
Memory stats, as requested by @azat:
WITH
    (
        SELECT sum(CAST(value, 'UInt64'))
        FROM system.metrics
        WHERE metric = 'MemoryTracking'
    ) AS memory_tracked_,
    (
        SELECT sum(total_bytes)
        FROM system.tables
        WHERE engine IN ('Join', 'Memory', 'Buffer', 'Set')
    ) AS memory_tables_,
    (
        SELECT sum(CAST(value, 'UInt64'))
        FROM system.asynchronous_metrics
        WHERE metric LIKE '%CacheBytes'
    ) AS memory_caches_,
    (
        SELECT sum(CAST(memory_usage, 'UInt64'))
        FROM system.processes
    ) AS memory_processes_,
    (
        SELECT sum(CAST(memory_usage, 'UInt64'))
        FROM system.merges
    ) AS memory_merges_,
    (
        SELECT sum(bytes_allocated)
        FROM system.dictionaries
    ) AS memory_dictionaries_,
    (
        SELECT sum(primary_key_bytes_in_memory_allocated)
        FROM system.parts
    ) AS memory_primary_keys_
SELECT
    assumeNotNull(memory_tracked_) AS memory_tracked,
    assumeNotNull(memory_tables_) AS memory_tables,
    assumeNotNull(memory_caches_) AS memory_caches,
    assumeNotNull(memory_processes_) AS memory_processes,
    assumeNotNull(memory_merges_) AS memory_merges,
    assumeNotNull(memory_dictionaries_) AS memory_dictionaries,
    assumeNotNull(memory_primary_keys_) AS memory_primary_keys
FORMAT Vertical
Row 1:
──────
memory_tracked: 19684595772
memory_tables: 0
memory_caches: 4538334048
memory_processes: 175800
memory_merges: 0
memory_dictionaries: 0
memory_primary_keys: 2427744
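Putting the numbers above together, most of the tracked memory is not explained by any of the listed consumers; a quick back-of-the-envelope check on the values shown:
SELECT formatReadableSize(19684595772 - (4538334048 + 175800 + 2427744)) AS unaccounted
-- tables, merges and dictionaries are 0, so roughly 14 GiB of tracked memory
-- is not covered by caches, running queries or primary keys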
and more:
SELECT
metric,
value
FROM system.asynchronous_metrics
WHERE metric LIKE '%jemalloc%'
┌─metric───────────────────────────────────┬────────value─┐
│ jemalloc.epoch │ 1507330 │
│ jemalloc.arenas.all.pactive │ 4466147 │
│ jemalloc.background_thread.num_runs │ 0 │
│ jemalloc.active │ 18293338112 │
│ jemalloc.retained │ 286359556096 │
│ jemalloc.arenas.all.pmuzzy │ 3687 │
│ jemalloc.allocated │ 18102287104 │
│ jemalloc.background_thread.num_threads │ 0 │
│ jemalloc.metadata_thp │ 0 │
│ jemalloc.arenas.all.muzzy_purged │ 6963334478 │
│ jemalloc.arenas.all.dirty_purged │ 11647316965 │
│ jemalloc.mapped │ 19758575616 │
│ jemalloc.metadata │ 1241636848 │
│ jemalloc.resident │ 19578085376 │
│ jemalloc.background_thread.run_intervals │ 0 │
│ jemalloc.arenas.all.pdirty │ 14069 │
└──────────────────────────────────────────┴──────────────┘
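Note that jemalloc.allocated (~18.1 GB) is close to jemalloc.resident (~19.6 GB), so most of the resident memory corresponds to live allocations rather than pages retained by the allocator. To rule out allocator retention completely, one option (assuming the statement is available in this build; it exists in recent releases) is to force jemalloc to purge its arenas and check whether resident memory drops:
-- If resident memory stays around 19 GB after this, the memory is genuinely
-- allocated rather than held back by jemalloc.
SYSTEM JEMALLOC PURGE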
Memory allocated:
SELECT
*,
allocations - deallocations AS active_allocations,
size * active_allocations AS allocated_bytes
FROM system.jemalloc_bins
WHERE allocated_bytes > 0
ORDER BY allocated_bytes DESC
┌─index─┬─large─┬──────size─┬─allocations─┬─deallocations─┬─active_allocations─┬─allocated_bytes─┐
│ 12 │ 0 │ 256 │ 1964164752 │ 1934402605 │ 29762147 │ 7619109632 │
│ 4 │ 0 │ 64 │ 8765929776 │ 8725048938 │ 40880838 │ 2616373632 │
│ 5 │ 0 │ 80 │ 2965546952 │ 2933024532 │ 32522420 │ 2601793600 │
│ 11 │ 0 │ 224 │ 850928924 │ 839491604 │ 11437320 │ 2561959680 │
│ 2 │ 0 │ 32 │ 7006726704 │ 6965106953 │ 41619751 │ 1331832032 │
│ 91 │ 1 │ 234881024 │ 1 │ 0 │ 1 │ 234881024 │
│ 6 │ 0 │ 96 │ 5682488087 │ 5680410878 │ 2077209 │ 199412064 │
│ 7 │ 0 │ 112 │ 1402413732 │ 1401220491 │ 1193241 │ 133642992 │
[...]
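Those bins are dominated by tens of millions of small (32 to 256 byte) allocations. A sketch of how they could be attributed to code paths, assuming memory sampling is enabled via the memory_profiler_sample_probability setting and introspection functions are allowed:
SET allow_introspection_functions = 1;

SELECT
    count() AS samples,
    formatReadableSize(sum(size)) AS sampled_bytes,
    arrayStringConcat(arrayMap(x -> demangle(addressToSymbol(x)), trace), '\n') AS stack
FROM system.trace_log
WHERE trace_type = 'MemorySample'
  AND size > 0  -- allocation side only
GROUP BY trace
ORDER BY sum(size) DESC
LIMIT 5
FORMAT Vertical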
Please let us know what else we can do to diagnose the issue. Thanks!
Hi! I am also facing a memory leak after upgrading ClickHouse to 23.8.
Current version: 23.8.9.54
Shards: 1
Replicas: 2
Using Kafka engine: No
I have 2 replicas (node1 and node2) and there are no client queries to node2 at all. But after upgrading to 23.8, both nodes started to leak memory in a similar way. The leak is slow but inevitable: after about 30 days a node runs out of memory.
This is the memory consumption on node2 (which receives no client queries):
Please tell me what additional information I can provide about the issue 🙏
Thanks, @fishday53! Looks like it could be related to what we have observed. Let’s see if we can find the culprit.
Hi @jrdi! The previous version was 22.2.2.1.
Here are the results for create and drop tables:
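For reference, a sketch of the kind of query that can produce such counts, assuming query logging is enabled (the exact query that was requested may differ):
SELECT
    event_date,
    query_kind,
    count() AS queries
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_kind IN ('Create', 'Drop')
GROUP BY event_date, query_kind
ORDER BY event_date, query_kind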