ClickHouse: Distributed queries stopped working in a cluster using TLS connections.
After we upgraded our ClickHouse cluster from version 19.16.2.2 to version 19.17.6.36, distributed SELECT queries stopped working, and the following error appeared in the server log:
2020.01.02 14:49:53.874989 [ 99 ] {...} <Error> executeQuery: Code: 32, e.displayText() = DB::Exception: Attempt to read after eof (version 19.17.6.36 (official build)) (from [::ffff:192.168.0.14]:56306) (in query: SELECT offset, update FROM default.test WHERE (topic = 'test1') AND (partition = 1) ORDER BY offset DESC LIMIT 1), Stack trace:
0. 0x55e0ed21f0e0 StackTrace::StackTrace() /usr/bin/clickhouse
1. 0x55e0ed21eeb5 DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int) /usr/bin/clickhouse
2. 0x55e0ecd5fec9 DB::throwReadAfterEOF() /usr/bin/clickhouse
3. 0x55e0ed2aa1cb DB::readVarUInt(unsigned long&, DB::ReadBuffer&) /usr/bin/clickhouse
4. 0x55e0ed2a323e DB::TCPHandler::isQueryCancelled() /usr/bin/clickhouse
5. 0x55e0ed2a5bfb DB::TCPHandler::processOrdinaryQuery() /usr/bin/clickhouse
6. 0x55e0ed2a8f56 DB::TCPHandler::runImpl() /usr/bin/clickhouse
7. 0x55e0ed2a914b DB::TCPHandler::run() /usr/bin/clickhouse
8. 0x55e0f11c2fe0 Poco::Net::TCPServerConnection::start() /usr/bin/clickhouse
9. 0x55e0f11c36fd Poco::Net::TCPServerDispatcher::run() /usr/bin/clickhouse
10. 0x55e0f2899871 Poco::PooledThread::run() /usr/bin/clickhouse
11. 0x55e0f289761c Poco::ThreadImpl::runnableEntry(void*) /usr/bin/clickhouse
12. 0x55e0f300d780 ? /usr/bin/clickhouse
13. 0x7f14ef11efa3 start_thread /lib/x86_64-linux-gnu/libpthread-2.28.so
14. 0x7f14ef0404cf clone /lib/x86_64-linux-gnu/libc-2.28.so
2020.01.02 14:50:06.953139 [ 99 ] {...} <Error> executeQuery: Code: 32, e.displayText() = DB::Exception: Attempt to read after eof (version 19.17.6.36 (official build)) (from [::ffff:192.168.0.14]:56316) (in query: SELECT offset, update FROM default.test WHERE (topic = 'test1') AND (partition = 1) ORDER BY offset DESC LIMIT 1), Stack trace:
0. 0x55e0ed21f0e0 StackTrace::StackTrace() /usr/bin/clickhouse
1. 0x55e0ed21eeb5 DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int) /usr/bin/clickhouse
2. 0x55e0ecd5fec9 DB::throwReadAfterEOF() /usr/bin/clickhouse
3. 0x55e0ed2aa1cb DB::readVarUInt(unsigned long&, DB::ReadBuffer&) /usr/bin/clickhouse
4. 0x55e0ed2a323e DB::TCPHandler::isQueryCancelled() /usr/bin/clickhouse
5. 0x55e0ed2a5bfb DB::TCPHandler::processOrdinaryQuery() /usr/bin/clickhouse
6. 0x55e0ed2a8f56 DB::TCPHandler::runImpl() /usr/bin/clickhouse
7. 0x55e0ed2a914b DB::TCPHandler::run() /usr/bin/clickhouse
8. 0x55e0f11c2fe0 Poco::Net::TCPServerConnection::start() /usr/bin/clickhouse
9. 0x55e0f11c36fd Poco::Net::TCPServerDispatcher::run() /usr/bin/clickhouse
10. 0x55e0f2899871 Poco::PooledThread::run() /usr/bin/clickhouse
11. 0x55e0f289761c Poco::ThreadImpl::runnableEntry(void*) /usr/bin/clickhouse
12. 0x55e0f300d780 ? /usr/bin/clickhouse
13. 0x7f14ef11efa3 start_thread /lib/x86_64-linux-gnu/libpthread-2.28.so
14. 0x7f14ef0404cf clone /lib/x86_64-linux-gnu/libc-2.28.so
Switching back to version 19.17.5.18 resolved the issue.
Example cluster config:
<cluster1>
<shard>
<replica>
<host>ch01</host>
<port>9440</port>
<secure>1</secure>
</replica>
</shard>
<shard>
<replica>
<host>ch02</host>
<port>9440</port>
<secure>1</secure>
</replica>
</shard>
...
</cluster1>
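A rough way to rule out basic connectivity problems is to confirm that each replica listed in the cluster config accepts a TLS handshake on its secure port. The sketch below is only an illustration: the helper names `list_replicas` and `tls_handshake_ok` are hypothetical, the sample config repeats the two replicas shown above, and a successful handshake does not exercise the ClickHouse native protocol, so it cannot by itself reproduce or exclude the "Attempt to read after eof" error.

```python
import socket
import ssl
import xml.etree.ElementTree as ET

# Sample config mirroring the two replicas shown above (elided entries omitted).
SAMPLE_CONFIG = """
<cluster1>
    <shard>
        <replica>
            <host>ch01</host>
            <port>9440</port>
            <secure>1</secure>
        </replica>
    </shard>
    <shard>
        <replica>
            <host>ch02</host>
            <port>9440</port>
            <secure>1</secure>
        </replica>
    </shard>
</cluster1>
"""


def list_replicas(cluster_xml):
    """Return (host, port, secure) for every <replica> element in the config."""
    root = ET.fromstring(cluster_xml)
    replicas = []
    for replica in root.iter("replica"):
        host = replica.findtext("host").strip()
        port = int(replica.findtext("port"))
        # A missing <secure> element is treated as a plain (non-TLS) connection.
        secure = replica.findtext("secure", default="0").strip() == "1"
        replicas.append((host, port, secure))
    return replicas


def tls_handshake_ok(host, port, timeout=5.0, verify=True):
    """Return True if a TLS handshake with host:port completes, else False."""
    context = ssl.create_default_context()
    if not verify:
        # Assumption: self-signed certificates inside the cluster.
        # check_hostname must be disabled before verify_mode is relaxed.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return tls.version() is not None
    except OSError:
        # Covers refused connections, timeouts, and ssl.SSLError.
        return False


if __name__ == "__main__":
    for host, port, secure in list_replicas(SAMPLE_CONFIG):
        if secure:
            status = "TLS ok" if tls_handshake_ok(host, port) else "TLS failed"
            print(host, port, status)
```

Run from each node in turn, this at least shows whether every replica's secure port is reachable and presents a certificate the local trust store accepts; pass `verify=False` only if the cluster is known to use self-signed certificates.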
Unfortunately, we could not reproduce this issue in our test environment.
About this issue
- State: closed
- Created 4 years ago
- Comments: 29 (15 by maintainers)
Feedback: the latest stable version, 20.1.6.30, runs fine with both intra-cluster TLS connections (secure=1) and client-to-server TLS connections.