ClickHouse: Broken compatibility between 22.8.5.29 and 22.8.6.71

Steps to reproduce:

  • Set up a DB with 22.8.5.29.
  • Create an argMax state with a string of 23 chars (any size seems to be affected).
  • Update to 22.8.6.71.
  • Select argMaxMerge of that state.
  • An extra NULL character will appear in the HTTP/JSON output.

Initial setup

$ clickhouse client
ClickHouse client version 22.11.1.1.
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 22.8.5 revision 54460.

ClickHouse server version is older than ClickHouse client. It may indicate that the server is out of date and can be upgraded.

Warnings:
 * Linux transparent hugepages are set to "always". Check /sys/kernel/mm/transparent_hugepage/enabled
 * Effective user of the process (raul) does not match the owner of the data (root).

Mordor :) Select version();

SELECT version()

Query id: 57c35415-15b2-4929-b775-ca7e63a40cc2

┌─version()─┐
│ 22.8.5.29 │
└───────────┘

1 row in set. Elapsed: 0.001 sec. 

Mordor :) create table check_state Engine=MergeTree() order by number AS Select number, argMaxState('01234567890123456789012', number) as s from numbers(1000) group by number;

CREATE TABLE check_state
ENGINE = MergeTree
ORDER BY number AS
SELECT
    number,
    argMaxState('01234567890123456789012', number) AS s
FROM numbers(1000)
GROUP BY number

Query id: e9359e80-8a32-42bb-8774-3801604f20db

Ok.

0 rows in set. Elapsed: 0.003 sec. Processed 1.00 thousand rows, 8.00 KB (337.05 thousand rows/s., 2.70 MB/s.)

Query via the HTTP endpoint before restart:

$ echo "Select version(), argMaxMerge(s) from check_state format JSON;" | curl 'http://localhost:8123/' --data-binary @-
{
        "meta":
        [
                {
                        "name": "version()",
                        "type": "String"
                },
                {
                        "name": "argMaxMerge(s)",
                        "type": "String"
                }
        ],

        "data":
        [
                {
                        "version()": "22.8.5.29",
                        "argMaxMerge(s)": "01234567890123456789012"
                }
        ],

        "rows": 1,

        "statistics":
        {
                "elapsed": 0.000524577,
                "rows_read": 1000,
                "bytes_read": 134976
        }
}

Restart with 22.8.6.71

Repeat the query:

$ echo "Select version(), argMaxMerge(s) from check_state format JSON;" | curl 'http://localhost:8123/' --data-binary @-
{
        "meta":
        [
                {
                        "name": "version()",
                        "type": "String"
                },
                {
                        "name": "argMaxMerge(s)",
                        "type": "String"
                }
        ],

        "data":
        [
                {
                        "version()": "22.8.6.71",
                        "argMaxMerge(s)": "01234567890123456789012\u0000"
                }
        ],

        "rows": 1,

        "statistics":
        {
                "elapsed": 0.000488996,
                "rows_read": 1000,
                "bytes_read": 134976
        }
}

Notice the extra NULL character (\u0000) at the end of argMaxMerge(s).

Other tests

  • This does not happen if the table / part is created with 22.8.6.71 directly.
  • If you create the part in 22.8.6.71 and go back to 22.8.5.29, then one character will be missing:
/tmp $ echo "Select version(), argMaxMerge(s) from check_state2 format JSON;" | curl 'http://localhost:8123/' --data-binary @-
{
        "meta":
        [
                {
                        "name": "version()",
                        "type": "String"
                },
                {
                        "name": "argMaxMerge(s)",
                        "type": "String"
                }
        ],

        "data":
        [
                {
                        "version()": "22.8.6.71",
                        "argMaxMerge(s)": "01234567890123456789012" <<<<<<<<<<<<<<<<<<<<<<
                }
        ],

        "rows": 1,

        "statistics":
        {
                "elapsed": 0.000458037,
                "rows_read": 1000,
                "bytes_read": 134976
        }
}
/tmp $ echo "Select version(), argMaxMerge(s) from check_state2 format JSON;" | curl 'http://localhost:8123/' --data-binary @-
{
        "meta":
        [
                {
                        "name": "version()",
                        "type": "String"
                },
                {
                        "name": "argMaxMerge(s)",
                        "type": "String"
                }
        ],

        "data":
        [
                {
                        "version()": "22.8.5.29",
                        "argMaxMerge(s)": "0123456789012345678901" <<<<<<<<<<<<<<<<<<<<<<<<
                }
        ],

        "rows": 1,

        "statistics":
        {
                "elapsed": 0.000484087,
                "rows_read": 1000,
                "bytes_read": 134976
        }
}

Hashes of the part when created with 22.8.5:

893125c14d599310f5e6119383005734  check_state/all_1_1_0/checksums.txt
dd4ed1c4fcbb915e511f0b3f72d1ac3f  check_state/all_1_1_0/columns.txt
a9b7ba70783b617e9998dc4dd82eb3c5  check_state/all_1_1_0/count.txt
29702fd20d1d59113cc6d6c916acefcb  check_state/all_1_1_0/data.bin <<<<<<<<<<<<<<<<<<<<<<<
efd800e25d72f58c5c72147672a74cc4  check_state/all_1_1_0/data.mrk3
c0904274faa8f3f06f35666cc9c5bd2f  check_state/all_1_1_0/default_compression_codec.txt
09d8059f463130063433b13bd61e7b97  check_state/all_1_1_0/primary.idx

Hashes when created with 22.8.6.71:

f17f579b9ec38bada5e4bb2a5e5a57d4  check_state2/all_1_1_0/checksums.txt
dd4ed1c4fcbb915e511f0b3f72d1ac3f  check_state2/all_1_1_0/columns.txt
a9b7ba70783b617e9998dc4dd82eb3c5  check_state2/all_1_1_0/count.txt
8e24d55e2919b0b9ecbe40101c6c7fbc  check_state2/all_1_1_0/data.bin <<<<<<<<<<<<<<<<<<<<<<<
efd800e25d72f58c5c72147672a74cc4  check_state2/all_1_1_0/data.mrk3
c0904274faa8f3f06f35666cc9c5bd2f  check_state2/all_1_1_0/default_compression_codec.txt
09d8059f463130063433b13bd61e7b97  check_state2/all_1_1_0/primary.idx

Either I’m missing something or this is a major breaking bug in a minor release.

Most upvoted comments

A short summary of what I’ve seen and what I think could be done:

I’m going to refer to 22.3 as the old behaviour, 22.9 as the version with the accidental changes to disk storage, and 22.next as the proposal. But keep in mind that both 22.3 and 22.8 have minor versions with the prior behaviour and minor versions with the buggy one.

  • In 22.3, strings stored in max/min/any/argMax/argMin and family were stored on disk with a trailing NULL character. So the string “0123456789” would be stored as size 11 with “0123456789\0” as the value.
  • In 22.9, this was changed by mistake so that the size is saved as 10, but the stored value still contained the NULL character: “0123456789\0”.

This produces the following bugs:

  • If 22.9 reads a value stored by 22.3, it sees it as string size + 1 (instead of string size). This means that when insertResultInto is called, the string produced is stored as size 11 (read from disk) plus an additional NULL, with the final result of an extra visible character. AFAICS, this is not a security vulnerability, as it never reads random memory.
  • If 22.3 reads a value stored by 22.9, it sees it as string size (instead of string size + 1). This means that when insertResultInto is called, the string produced is inserted with one character fewer than it should have. For an empty string this can lead to a crash. (Both cases are shown in the sketch below.)
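
To make both failure modes concrete, here is a small self-contained sketch. This is not the actual ClickHouse serialization code: the StoredState layout and the writeOldStyle/writeNewStyle/readOldStyle/readNewStyle names are made up purely for illustration.

#include <cassert>
#include <cstdint>
#include <iostream>
#include <string>

// Simplified stored form used only for this illustration: a counted size
// plus a byte buffer; readers only look at the first `size` bytes.
struct StoredState
{
    uint32_t size;
    std::string bytes;
};

// 22.3-style write: the trailing '\0' is both written and counted.
StoredState writeOldStyle(const std::string & value)
{
    return {static_cast<uint32_t>(value.size() + 1), value + '\0'};
}

// 22.9-style write: the buffer still ends with '\0', but `size` no longer counts it.
StoredState writeNewStyle(const std::string & value)
{
    return {static_cast<uint32_t>(value.size()), value + '\0'};
}

// 22.3-style read: the last counted byte is assumed to be a terminator and is dropped.
std::string readOldStyle(const StoredState & s)
{
    std::string counted = s.bytes.substr(0, s.size);
    return counted.substr(0, counted.size() - 1);
}

// 22.9-style read: every counted byte is kept as part of the value.
std::string readNewStyle(const StoredState & s)
{
    return s.bytes.substr(0, s.size);
}

int main()
{
    const std::string value = "0123456789";

    // 22.9 reading a part written by 22.3: the terminator leaks into the result.
    std::string extra = readNewStyle(writeOldStyle(value));
    assert(extra.size() == value.size() + 1); // 11 chars, the last one is '\0'

    // 22.3 reading a part written by 22.9: one real character is lost.
    std::string truncated = readOldStyle(writeNewStyle(value));
    assert(truncated == "012345678"); // 9 chars instead of 10

    std::cout << extra.size() << ' ' << truncated << '\n';
}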

To me the most important problem right now is how to make it possible to move forward in a way that updating is safe and it’s possible to run, for a certain amount of time, CH 22.3 alongside 22.next.

In order to do that, this is the approach I’ve taken in CH 22.next (a simplified sketch follows the list):

  • When data is read, if the last counted character is NULL, keep it. This makes it compatible with 22.3.
  • When data is read, if the last counted character is NOT NULL, add an extra NULL character. This makes it compatible with 22.9.
  • When data is written, include and count the NULL character.
    • This reverts the behaviour to that of 22.3, making 22.next store the same bytes as 22.3, so running them together is possible.
    • When running 22.9, you might see the extra NULL character both with data inserted with 22.3 and with data inserted in 22.next. Updating to a version with the fix will address this.
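
A minimal sketch of that read/write rule, again with made-up names and a simplified size-plus-bytes layout rather than the real ClickHouse serialization:

#include <cassert>
#include <cstdint>
#include <string>

// Simplified stored form: a counted size plus a byte buffer;
// readers only look at the first `size` bytes.
struct StoredState
{
    uint32_t size;
    std::string bytes;
};

// Proposed read path: normalise both historical layouts to "value + '\0'",
// with the '\0' included in the count (i.e. the 22.3 convention).
std::string readCompat(const StoredState & s)
{
    std::string data = s.bytes.substr(0, s.size);
    if (data.empty() || data.back() != '\0')
        data.push_back('\0');   // data written by the 22.9-style code
    return data;                // data written by 22.3-style code is kept as-is
}

// Proposed write path: always write and count the trailing '\0',
// i.e. the same bytes 22.3 would have written.
StoredState writeCompat(const std::string & normalised)
{
    return {static_cast<uint32_t>(normalised.size()), normalised};
}

// The user-visible value is everything before the counted terminator.
std::string finalValue(const std::string & normalised)
{
    return normalised.substr(0, normalised.size() - 1);
}

int main()
{
    const std::string value = "0123456789";

    StoredState old_style{static_cast<uint32_t>(value.size() + 1), value + '\0'}; // written by 22.3
    StoredState new_style{static_cast<uint32_t>(value.size()), value + '\0'};     // written by 22.9

    // Both layouts decode to the same logical value after normalisation...
    assert(finalValue(readCompat(old_style)) == value);
    assert(finalValue(readCompat(new_style)) == value);

    // ...and re-writing always produces the 22.3-compatible bytes.
    assert(writeCompat(readCompat(new_style)).size == value.size() + 1);
}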

Then we’d need to recommend not running 22.3 alongside 22.9 and updating directly to 22.next (there would need to be a 22.8.next too, obviously).

@alexey-milovidov Does an approach like this make sense to you? My plan is to re-test (and automate tests as much as possible), review the solution, and propose a patch that can be applied to all affected versions.

> argMaxMerge which is a much smaller problem.

A compatibility issue is always a major problem. How do you propose to fix it? If we add code in the new version to read the old format, it will still not allow downgrading the version without getting some data corrupted. That should never happen between minor releases. And additionally, that should not happen via ‘not for changelog’ pull requests.