ClickHouse: Upgrade clickhouse from version 22.5.4 to version 23.3.8.21 causes a major increase on memory used by queries

(you don’t have to strictly follow this form)

Describe the unexpected behaviour Upgrade clickhouse from version 22.5.4 to version 23.3.8.21 causes a major increase on memory used by queries

How to reproduce

  • Which ClickHouse server version to use: 22.5.4
  • Queries to run that lead to unexpected result: Here are examples of 2 queries checked: select dom_element, host_selector_path, sum(potential_revenue_sum) as REVENUE_OPPORTUNITY, uniqCombinedMerge(session_uuid_uniq_approx) as SESSIONSCOUNT, sum(click_first_click_count_click_sum) as FIRSTCLICKS, sum(revenue_sum) as REVENUE_TOTAL, sumIf(coalesce(focus_duration_count_not_null,0),( focus_duration_count_not_null is not null and ( client_action is null or client_action <> 'struggle' ))) as BCFOCUSTIME_COUNT, sumIf(coalesce(focus_duration_sum,0),( focus_duration_sum is not null and ( client_action is null or client_action <> 'struggle' ))) as BCFOCUSTIME_SUM, uniqCombinedIfMerge(click_session_uuid_uniq_approx) as CLICKSESSIONS, sum(form_validation_error_count) as VALIDATIONERRORS, sum(click_target_clickable_click_sum) as VALUABLECLICKS, sumMapMerge(struggle_type_map) as STRUGGLESMAP, sum(click_count) as CLICKCOUNT, sum(struggle_count) as STRUGGLECOUNT, uniqCombinedIfMerge(click_page_uuid_uniq_approx) as CLICKPAGES, max(action_visual_name) as BCVISUALNAME_agg_any, groupUniqArray(10)(action_visual_name) as BCVISUALNAME_SAMPLE from interaction_map_target_not_enriched_HOUR_dist where (appidin (1) andstart_tsbetween 1690243200 and 1690848000 and dom_element is not null and dom_element <> '' and (client_action is null or client_action <> 'ajaxRequest') anduadevice_type in ('COMPUTER') and uri_hash = 9028172079634422940 and tenant_id = '79fc5d63-00fc-4e82-9afa-937e53ef6610' and google_segment = 'None' and adobe_segment = 'None' and segment <= 0 and empty(ab_testing)) group by dom_element, host_selector_path having (REVENUE_OPPORTUNITY > 0 or SESSIONSCOUNT > 0 or FIRSTCLICKS > 0 or REVENUE_TOTAL > 0 or BCFOCUSTIME_COUNT > 0 or BCFOCUSTIME_SUM > 0 or CLICKSESSIONS > 0 or VALIDATIONERRORS > 0 or VALUABLECLICKS > 0 or CLICKCOUNT > 0 or STRUGGLECOUNT > 0 or CLICKPAGES > 0) order by max(is_input_element) desc limit 400 settings cust_tenant_id='79fc5d63-00fc-4e82-9afa-937e53ef6610' FORMAT TabSeparatedWithNamesAndTypes

insert into session_enriched (session_uuid, session_ts, tenant_id,client_ip,analyzer_id,user_id,struggle_score_rank,effective_connection_type,os,isp,exit_page_segment,landing_page,host,city,landing_page_segment,device_id,subscription,referrer_name,uadevice_type,exit_page,landing_page_host,country,uri,exit_page_host,os_group,uabrowser_group,uabrowser,appid,vendor,experience_score,perf_ttfb_sum,ajax_duration_sum,perf_page_load_time_sum,eug_id,revenue_sum,session_duration_agg,document_load_event_count,crash_event_count,struggle_count,ajax_event_count,error_count,potential_revenue,interactive_event_count,cs_score,technical_score,custom_score,user_event_view_load_count,click_event_count,change_event_count,custom_event_count,perf_event_count,session_duration,is_replayable,perf_render_time_sum,api_event_count,struggle_score,event_count,conversion_count,screen_event_count,mobile_count,user_event_count,exit_page_ids,hosts,user_ids,adobe_segments,ab_testing,appids,segments,struggle_types,landing_page_ids,google_segments,struggles.key,struggles.value,dyn_string.key,dyn_string.value,cust_dimension.key,cust_dimension.value) select session_uuid, session_ts, tenant_id, max(client_ip) as client_ip,max(analyzer_id) as analyzer_id,max(user_id) as user_id,max(struggle_score_rank) as struggle_score_rank,max(effective_connection_type) as effective_connection_type,max(os) as os,max(isp) as isp,max(exit_page_segment) as exit_page_segment,max(landing_page) as landing_page,max(host) as host,max(city) as city,max(landing_page_segment) as landing_page_segment,max(device_id) as device_id,max(subscription) as subscription,max(referrer_name) as referrer_name,max(uadevice_type) as uadevice_type,max(exit_page) as exit_page,max(landing_page_host) as landing_page_host,max(country) as country,max(uri) as uri,max(exit_page_host) as exit_page_host,max(os_group) as os_group,max(uabrowser_group) as uabrowser_group,max(uabrowser) as uabrowser,max(appid) as appid,max(vendor) as vendor, sum(experience_score) as experience_score,sum(perf_ttfb_sum) as perf_ttfb_sum,sum(ajax_duration_sum) as ajax_duration_sum,sum(perf_page_load_time_sum) as perf_page_load_time_sum,sum(eug_id) as eug_id,sum(revenue_sum) as revenue_sum,sum(session_duration_agg) as session_duration_agg,sum(document_load_event_count) as document_load_event_count,sum(crash_event_count) as crash_event_count,sum(struggle_count) as struggle_count,sum(ajax_event_count) as ajax_event_count,sum(error_count) as error_count,sum(potential_revenue) as potential_revenue,sum(interactive_event_count) as interactive_event_count,sum(cs_score) as cs_score,sum(technical_score) as technical_score,sum(custom_score) as custom_score,sum(user_event_view_load_count) as user_event_view_load_count,sum(click_event_count) as click_event_count,sum(change_event_count) as change_event_count,sum(custom_event_count) as custom_event_count,sum(perf_event_count) as perf_event_count,sum(session_duration) as session_duration,sum(is_replayable) as is_replayable,sum(perf_render_time_sum) as perf_render_time_sum,sum(api_event_count) as api_event_count,sum(struggle_score) as struggle_score,sum(event_count) as event_count,sum(conversion_count) as conversion_count,sum(screen_event_count) as screen_event_count,sum(mobile_count) as mobile_count,sum(user_event_count) as user_event_count, groupUniqArrayArray(exit_page_ids) as exit_page_ids,groupUniqArrayArray(hosts) as hosts,groupUniqArrayArray(user_ids) as user_ids,groupUniqArrayArray(adobe_segments) as adobe_segments,groupUniqArrayArray(ab_testing) as ab_testing,groupUniqArrayArray(appids) as appids,groupUniqArrayArray(segments) as segments,groupUniqArrayArray(struggle_types) as struggle_types,groupUniqArrayArray(landing_page_ids) as landing_page_ids,groupUniqArrayArray(google_segments) as google_segments, groupArrayArray(struggles.key) as struggles.key,groupArrayArray(struggles.value) as struggles.value,groupArrayArray(dyn_string.key) as dyn_string.key,groupArrayArray(dyn_string.value) as dyn_string.value,groupArrayArray(cust_dimension.key) as cust_dimension.key,groupArrayArray(cust_dimension.value) as cust_dimension.value from session where session_ts >= 1693098000 and session_ts < 1693101600 group by session_uuid, session_ts, tenant_id; Expected behavior Tested memory on query log for queries running before upgrade, ran on: 218.39 MiB And then used set send_logs_level=‘trace’ to get memory consumption now on the same queries and got: <Debug> MemoryTracker: Peak memory usage (for query): 2.34 GiB. Downgraded again and got: : <Debug> MemoryTracker: Peak memory usage (for query): 219.56 MiB.

Error message and/or stacktrace If applicable, add screenshots to help explain your problem.

Additional context Add any other context about the problem here.

### Tasks

About this issue

  • Original URL
  • State: closed
  • Created 10 months ago
  • Comments: 18 (6 by maintainers)

Most upvoted comments

@alexey-milovidov @nickitat, we managed to reproduce it on 23.3 again, as well as 22.9, we did notice however that from 23.7 onward , the memory issue is resolved

sharing reproduction steps

create clickhouse instance 22.5

docker run -p 8123:8123 -p 9000:9000 -d --name clickhouse_old -v ~/clickhouse_data:/var/lib/clickhouse --ulimit nofile=262144:262144 clickhouse/clickhouse-server:22.5

then go to clickhouse and create the table

CREATE TABLE IF NOT EXISTS reproduce_memory_issue ( subscription LowCardinality(String), user_ts UInt32, inserted_ts UInt32 default now(), user_uuid UUID, appid UInt32, tenant_id LowCardinality(String), usa_bro LowCardinality(Nullable(String)), use_bro_g LowCardinality(Nullable(String)), use_type LowCardinality(Nullable(String)), entry_point Nullable(String), country LowCardinality(Nullable(String)), city LowCardinality(Nullable(String)), isp LowCardinality(Nullable(String)), id LowCardinality(Nullable(String)), user_ip Nullable(String), os LowCardinality(Nullable(String)), os_group LowCardinality(String), dev_type LowCardinality(String), host LowCardinality(Nullable(String)), refe LowCardinality(Nullable(String)), rank LowCardinality(Nullable(String)), con LowCardinality(Nullable(String)), event_count Nullable (UInt32), mob Nullable(UInt8), user_event_count Nullable (UInt32), lan Nullable(String), ex Nullable(String), lan_seg Nullable(String) ) ENGINE = MergeTree PARTITION BY (subscription, toRelativeWeekNum(toDateTime(user_ts))) ORDER BY (tenant_id, appid, user_ts, user_uuid);

then run the attached python script to generate 1M records, we didn’t use GenerateRandom because we have low cardinality fields

py_script.txt

set send_log_level=‘trace’

#run the following query `

select user_uuid, user_ts, tenant_id, max(user_ip) as user_ip,max(id) as id, max(rank) as rank,max(con) as con,max(os) as os,max(isp) as isp, max(lan) as lan,max(host) as host,max(city) as city, max(lan_seg) as lan_seg,max(subscription) as subscription,max(refe) as refe, max(dev_type) as dev_type,max(ex) as ex,max(country) as country,max(entry_point) as entry_point,max(os_group) as os_group,max(use_bro_g) as use_bro_g,max(usa_bro) as usa_bro,sum(mob) as mob,sum(user_event_count) as user_event_count from reproduce_memory_issue group by user_uuid, user_ts, tenant_id

run few times and note the memory

drop the clickhouse container and create clickhouse container with the newer version

docker run -p 8123:8123 -p 9000:9000 -d --name clickhouse_new -v ~/clickhouse_data:/var/lib/clickhouse --ulimit nofile=262144:262144 clickhouse/clickhouse-server:23.3.8.21

run the query with log levels again

see the memory is much higher, for the old clickhouse, we got 4GB , for newer we got 5.3GB , when we have much higher number of records, the gap is even worse

I’m seeing almost identical numbers on both versions.

first thing I’d do is compare the settings used in both cases. i.e. output of

select name, value from system.settings order by name