ClickHouse: Queries on distributed web tables using parallel replicas duplicate work rather than distributing it
Describe what’s wrong
When running queries against a web table through parallel replicas, the work is duplicated rather than distributed across the replicas, which causes redundant processing and slows down the final query result.
Does it reproduce on recent release?
Yes
Version
23.6.2
How to reproduce
ATTACH TABLE github_events UUID '127f4241-4a9b-4ecd-8a84-846b88069cb5' ON CLUSTER '{cluster}'
(
`file_time` DateTime,
`event_type` Enum8('CommitCommentEvent' = 1, 'CreateEvent' = 2, 'DeleteEvent' = 3, 'ForkEvent' = 4, 'GollumEvent' = 5, 'IssueCommentEvent' = 6, 'IssuesEvent' = 7, 'MemberEvent' = 8, 'PublicEvent' = 9, 'PullRequestEvent' = 10, 'PullRequestReviewCommentEvent' = 11, 'PushEvent' = 12, 'ReleaseEvent' = 13, 'SponsorshipEvent' = 14, 'WatchEvent' = 15, 'GistEvent' = 16, 'FollowEvent' = 17, 'DownloadEvent' = 18, 'PullRequestReviewEvent' = 19, 'ForkApplyEvent' = 20, 'Event' = 21, 'TeamAddEvent' = 22),
`actor_login` LowCardinality(String),
`repo_name` LowCardinality(String),
`created_at` DateTime,
`updated_at` DateTime,
`action` Enum8('none' = 0, 'created' = 1, 'added' = 2, 'edited' = 3, 'deleted' = 4, 'opened' = 5, 'closed' = 6, 'reopened' = 7, 'assigned' = 8, 'unassigned' = 9, 'labeled' = 10, 'unlabeled' = 11, 'review_requested' = 12, 'review_request_removed' = 13, 'synchronize' = 14, 'started' = 15, 'published' = 16, 'update' = 17, 'create' = 18, 'fork' = 19, 'merged' = 20),
`comment_id` UInt64,
`body` String,
`path` String,
`position` Int32,
`line` Int32,
`ref` LowCardinality(String),
`ref_type` Enum8('none' = 0, 'branch' = 1, 'tag' = 2, 'repository' = 3, 'unknown' = 4),
`creator_user_login` LowCardinality(String),
`number` UInt32,
`title` String,
`labels` Array(LowCardinality(String)),
`state` Enum8('none' = 0, 'open' = 1, 'closed' = 2),
`locked` UInt8,
`assignee` LowCardinality(String),
`assignees` Array(LowCardinality(String)),
`comments` UInt32,
`author_association` Enum8('NONE' = 0, 'CONTRIBUTOR' = 1, 'OWNER' = 2, 'COLLABORATOR' = 3, 'MEMBER' = 4, 'MANNEQUIN' = 5),
`closed_at` DateTime,
`merged_at` DateTime,
`merge_commit_sha` String,
`requested_reviewers` Array(LowCardinality(String)),
`requested_teams` Array(LowCardinality(String)),
`head_ref` LowCardinality(String),
`head_sha` String,
`base_ref` LowCardinality(String),
`base_sha` String,
`merged` UInt8,
`mergeable` UInt8,
`rebaseable` UInt8,
`mergeable_state` Enum8('unknown' = 0, 'dirty' = 1, 'clean' = 2, 'unstable' = 3, 'draft' = 4),
`merged_by` LowCardinality(String),
`review_comments` UInt32,
`maintainer_can_modify` UInt8,
`commits` UInt32,
`additions` UInt32,
`deletions` UInt32,
`changed_files` UInt32,
`diff_hunk` String,
`original_position` UInt32,
`commit_id` String,
`original_commit_id` String,
`push_size` UInt32,
`push_distinct_size` UInt32,
`member_login` LowCardinality(String),
`release_tag_name` String,
`release_name` String,
`review_state` Enum8('none' = 0, 'approved' = 1, 'changes_requested' = 2, 'commented' = 3, 'dismissed' = 4, 'pending' = 5)
)
ENGINE = MergeTree
ORDER BY (event_type, repo_name, created_at)
SETTINGS disk = disk(type = web, endpoint = 'http://clickhouse-public-datasets.s3.amazonaws.com/web/')
CREATE TABLE github_events_all ON CLUSTER '{cluster}' AS github_events ENGINE = Distributed('{cluster}', default, github_events)
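Before benchmarking, a quick sanity check (a minimal sketch, not part of the original report; it assumes the same '{cluster}' macro and the default database used in the DDL above) confirms the web table is attached and readable on every replica:

SELECT
    hostName() AS host,
    count() AS row_count
FROM clusterAllReplicas('{cluster}', default.github_events)  -- assumes the 'default' database, as in the Distributed DDL above
GROUP BY host
ORDER BY host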
Single node:
SELECT uniq(repo_name)
FROM github_events
SETTINGS max_parallel_replicas = 1, allow_experimental_parallel_reading_from_replicas = 0, prefer_localhost_replica = 1
Query id: bdd4b17d-57b2-456e-ade5-3447e912c992
┌─uniq(repo_name)─┐
│       262270450 │
└─────────────────┘
1 row in set. Elapsed: 46.513 sec. Processed 5.06 billion rows, 10.09 GB (108.77 million rows/s., 216.98 MB/s.)
Distributed table, 6 replicas, 1 shard:
SELECT uniq(repo_name)
FROM github_events_all
SETTINGS max_parallel_replicas = 6, allow_experimental_parallel_reading_from_replicas = 1, prefer_localhost_replica = 0
Query id: 855452b0-ed4a-4742-a288-dcf6dcb4e9b6
┌─uniq(repo_name)─┐
│       262270450 │
└─────────────────┘
1 row in set. Elapsed: 71.592 sec. Processed 30.36 billion rows, 60.55 GB (424.02 million rows/s., 845.84 MB/s.)
You can see it processes 6× as many rows and bytes (30.36 billion vs. 5.06 billion rows) and takes substantially longer, consistent with each of the six replicas scanning the entire table instead of a share of it.
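To verify that the extra rows come from every replica scanning the whole table (a diagnostic sketch, assuming query_log is enabled on all nodes; the initial_query_id is the one from the run above):

SELECT
    hostName() AS host,
    read_rows,
    read_bytes
FROM clusterAllReplicas('{cluster}', system.query_log)
WHERE initial_query_id = '855452b0-ed4a-4742-a288-dcf6dcb4e9b6'  -- query id of the distributed run above
    AND type = 'QueryFinish'
ORDER BY host

If the work were distributed, each replica's read_rows would be roughly 5.06 billion / 6; the 30.36 billion total above suggests each replica instead reports the full ~5.06 billion.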
Expected behavior
The work should be divided among the replicas, and the query should be faster than on a single node.
Additional context
I had to use a Distributed table; parallel replicas were not engaged when querying the underlying table directly. I also had to provide those settings inline on every query, as setting them with SET did not cause the query to use all nodes.
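For reference, this is the SET variant that did not engage all the nodes for me (a sketch of the session; only the inline SETTINGS clause shown above had the intended effect):

SET max_parallel_replicas = 6;
SET allow_experimental_parallel_reading_from_replicas = 1;
SET prefer_localhost_replica = 0;

-- still executed on a single node despite the session settings
SELECT uniq(repo_name) FROM github_events_all;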
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 24 (20 by maintainers)
Also keep in mind that you have to create a VPC endpoint for S3 in AWS; otherwise, the total network throughput will be limited to 100 Gbit/sec.