ClickHouse: Broken compatibility between 22.8.5.29 and 22.8.6.71
Steps to reproduce:
- Set up a DB with 22.8.5.29.
- Create an argMax state with a string of 32 chars (any size seems to be affected).
- Update to 22.8.6.71.
- Select argMaxMerge of that state.
- An extra NULL will appear in the result of the HTTP/JSON output.
Initial setup
$ clickhouse client
ClickHouse client version 22.11.1.1.
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 22.8.5 revision 54460.
ClickHouse server version is older than ClickHouse client. It may indicate that the server is out of date and can be upgraded.
Warnings:
* Linux transparent hugepages are set to "always". Check /sys/kernel/mm/transparent_hugepage/enabled
* Effective user of the process (raul) does not match the owner of the data (root).
Mordor :) Select version();
SELECT version()
Query id: 57c35415-15b2-4929-b775-ca7e63a40cc2
┌─version()─┐
│ 22.8.5.29 │
└───────────┘
1 row in set. Elapsed: 0.001 sec.
Mordor :) create table check_state Engine=MergeTree() order by number AS Select number, argMaxState('01234567890123456789012', number) as s from numbers(1000) group by number;
CREATE TABLE check_state
ENGINE = MergeTree
ORDER BY number AS
SELECT
number,
argMaxState('01234567890123456789012', number) AS s
FROM numbers(1000)
GROUP BY number
Query id: e9359e80-8a32-42bb-8774-3801604f20db
Ok.
0 rows in set. Elapsed: 0.003 sec. Processed 1.00 thousand rows, 8.00 KB (337.05 thousand rows/s., 2.70 MB/s.)
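Before restarting, it can be useful to capture a byte-level baseline of the merged value so it can be diffed after the upgrade. A minimal sketch, assuming the default HTTP port and a scratch file of my choosing (/tmp/before_upgrade.hex):
$ echo "Select argMaxMerge(s) from check_state format TSV;" | curl 'http://localhost:8123/' --data-binary @- | xxd | tee /tmp/before_upgrade.hex
This simply records a hex dump of the result as returned by 22.8.5.29.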
Query via the HTTP endpoint before restart:
$ echo "Select version(), argMaxMerge(s) from check_state format JSON;" | curl 'http://localhost:8123/' --data-binary @-
{
"meta":
[
{
"name": "version()",
"type": "String"
},
{
"name": "argMaxMerge(s)",
"type": "String"
}
],
"data":
[
{
"version()": "22.8.5.29",
"argMaxMerge(s)": "01234567890123456789012"
}
],
"rows": 1,
"statistics":
{
"elapsed": 0.000524577,
"rows_read": 1000,
"bytes_read": 134976
}
}
Restart with 22.8.6.71
Repeat the query
$ echo "Select version(), argMaxMerge(s) from check_state format JSON;" | curl 'http://localhost:8123/' --data-binary @-
{
"meta":
[
{
"name": "version()",
"type": "String"
},
{
"name": "argMaxMerge(s)",
"type": "String"
}
],
"data":
[
{
"version()": "22.8.6.71",
"argMaxMerge(s)": "01234567890123456789012\u0000"
}
],
"rows": 1,
"statistics":
{
"elapsed": 0.000488996,
"rows_read": 1000,
"bytes_read": 134976
}
}
Notice the extra NULL character (\u0000) at the end of the argMaxMerge(s) result.
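Comparing against the pre-upgrade baseline makes the stray byte obvious without squinting at JSON escapes (a sketch; /tmp/before_upgrade.hex is the hypothetical file captured before the restart):
$ echo "Select argMaxMerge(s) from check_state format TSV;" | curl 'http://localhost:8123/' --data-binary @- | xxd | diff /tmp/before_upgrade.hex -
On 22.8.6.71 reading the part written by 22.8.5.29, the dumps should differ at the end of the string.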
Other tests
- This does not happen if the table / part is created with 22.8.6.71 directly.
- If you create the part in 22.8.6.71 and go back to 22.8.5.29 then one character will be missing:
/tmp $ echo "Select version(), argMaxMerge(s) from check_state2 format JSON;" | curl 'http://localhost:8123/' --data-binary @-
{
"meta":
[
{
"name": "version()",
"type": "String"
},
{
"name": "argMaxMerge(s)",
"type": "String"
}
],
"data":
[
{
"version()": "22.8.6.71",
"argMaxMerge(s)": "01234567890123456789012" <<<<<<<<<<<<<<<<<<<<<<
}
],
"rows": 1,
"statistics":
{
"elapsed": 0.000458037,
"rows_read": 1000,
"bytes_read": 134976
}
}
/tmp $ echo "Select version(), argMaxMerge(s) from check_state2 format JSON;" | curl 'http://localhost:8123/' --data-binary @-
{
"meta":
[
{
"name": "version()",
"type": "String"
},
{
"name": "argMaxMerge(s)",
"type": "String"
}
],
"data":
[
{
"version()": "22.8.5.29",
"argMaxMerge(s)": "0123456789012345678901" <<<<<<<<<<<<<<<<<<<<<<<<
}
],
"rows": 1,
"statistics":
{
"elapsed": 0.000484087,
"rows_read": 1000,
"bytes_read": 134976
}
}
Hashes of the part when created with 22.8.5:
893125c14d599310f5e6119383005734 check_state/all_1_1_0/checksums.txt
dd4ed1c4fcbb915e511f0b3f72d1ac3f check_state/all_1_1_0/columns.txt
a9b7ba70783b617e9998dc4dd82eb3c5 check_state/all_1_1_0/count.txt
29702fd20d1d59113cc6d6c916acefcb check_state/all_1_1_0/data.bin <<<<<<<<<<<<<<<<<<<<<<<
efd800e25d72f58c5c72147672a74cc4 check_state/all_1_1_0/data.mrk3
c0904274faa8f3f06f35666cc9c5bd2f check_state/all_1_1_0/default_compression_codec.txt
09d8059f463130063433b13bd61e7b97 check_state/all_1_1_0/primary.idx
Hashes of the part when created with 22.8.6:
f17f579b9ec38bada5e4bb2a5e5a57d4 check_state2/all_1_1_0/checksums.txt
dd4ed1c4fcbb915e511f0b3f72d1ac3f check_state2/all_1_1_0/columns.txt
a9b7ba70783b617e9998dc4dd82eb3c5 check_state2/all_1_1_0/count.txt
8e24d55e2919b0b9ecbe40101c6c7fbc check_state2/all_1_1_0/data.bin <<<<<<<<<<<<<<<<<<<<<<<
efd800e25d72f58c5c72147672a74cc4 check_state2/all_1_1_0/data.mrk3
c0904274faa8f3f06f35666cc9c5bd2f check_state2/all_1_1_0/default_compression_codec.txt
09d8059f463130063433b13bd61e7b97 check_state2/all_1_1_0/primary.idx
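For reference, these look like MD5 checksums of the part files; something along these lines reproduces them (the data path is an assumption based on a default installation):
$ cd /var/lib/clickhouse/data/default
$ md5sum check_state/all_1_1_0/* check_state2/all_1_1_0/*
Only data.bin (and, as a consequence, checksums.txt) differs between the two parts, which points at the serialized argMax state itself rather than the marks, metadata or primary index.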
Either I’m missing something or this is a major breaking bug in a minor release.
A short summary of what I’ve seen and what I think could be done:
I’m going to refer to 22.3 as the old behaviour, 22.9 as the version with the incidental changes to disk storage, and 22.next as the proposal. But keep in mind that within 22.3 and 22.8 there are minor versions with the prior behaviour and minor versions with the buggy one.
This produces the following bugs:
- When 22.9 reads a state written by 22.3, in insertResultInto the string produced is stored as size 11 (read from disk) plus an additional NULL, with the final result of seeing an extra character. AFAICS, this is not a security vulnerability, as it never reads random memory.
- When 22.3 reads a state written by 22.9, in insertResultInto the string produced is inserted with one less character than it should be. For an empty string this can lead to a crash.
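To tell which of the two situations a given table is in, a per-row length check works; this is a sketch against the reproduction table above, using finalizeAggregation to materialize each stored state:
$ echo "Select length(finalizeAggregation(s)) as len, count() from check_state group by len format TSV;" | curl 'http://localhost:8123/' --data-binary @-
For the 23-character literal used above, correctly read states should report 23, states written by the old format and read by the changed one should report 24 (the extra NULL), and states written by the changed format and read by the old one should report 22 (the missing character).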
To me the most important problem right now is how to make it possible to move forward in a way that updating is safe and it is possible to run, for a certain amount of time, CH 22.3 alongside 22.next.
In order to do that, this is the approach I’ve taken in CH 22.next:
Then we’d need to recommend not running 22.3 alongside 22.9 and updating directly to 22.next (there would need to be a 22.8.next too, obviously).
@alexey-milovidov Does an approach like this make sense to you? My plan is to re-test (and automate tests as much as possible), review the solution, and propose a patch that can be applied to all affected versions.
All minor versions with the fix are released: v22.3.15.33-lts, v22.8.10.29-lts, v22.9.6.20-stable, v22.10.4.23-stable, v22.11.2.30-stable.
A compatibility issue is always a major problem. How do you propose to fix it? If we add code in the new version to read the old format, it will still not allow downgrading without getting some data corrupted. That should never happen between minor releases. And additionally, that should not happen via ‘not for changelog’ pull requests.