ClickHouse: [Apache Iceberg] Failing to create table
Creating a table fails when using Apache Iceberg:
CREATE TABLE iceberg Engine=Iceberg(...)
2023.05.11 10:21:36.140819 [ 19080 ] {bc194292-7197-4ab9-9a30-b9461ab43ecd} <Error> TCPHandler: Code: 499. DB::Exception: Failed to get object info: No response body.. HTTP response code: 404. (S3_ERROR), Stack trace (when copying this message, always include the lines below):
0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0xe3b83d5 in /usr/bin/clickhouse
1. ? @ 0x9801d4d in /usr/bin/clickhouse
2. DB::S3::getObjectInfo(DB::S3::Client const&, String const&, String const&, String const&, DB::S3Settings::RequestSettings const&, bool, bool, bool) @ 0x126aa4fe in /usr/bin/clickhouse
3. DB::StorageS3Source::KeysIterator::KeysIterator(DB::S3::Client const&, String const&, std::vector<String, std::allocator<String>> const&, String const&, DB::S3Settings::RequestSettings const&, std::shared_ptr<DB::IAST>, DB::Block const&, std::shared_ptr<DB::Context const>, std::vector<DB::StorageS3Source::KeyWithInfo, std::allocator<DB::StorageS3Source::KeyWithInfo>>*) @ 0x1447e152 in /usr/bin/clickhouse
4. DB::StorageS3::createFileIterator(DB::StorageS3::Configuration const&, bool, std::shared_ptr<DB::Context const>, std::shared_ptr<DB::IAST>, DB::Block const&, std::vector<DB::StorageS3Source::KeyWithInfo, std::allocator<DB::StorageS3Source::KeyWithInfo>>*) @ 0x14486c4b in /usr/bin/clickhouse
5. DB::StorageS3::getTableStructureFromDataImpl(DB::StorageS3::Configuration const&, std::optional<DB::FormatSettings> const&, std::shared_ptr<DB::Context const>) @ 0x144861c6 in /usr/bin/clickhouse
We have looked at the codebase, and I have the feeling that you are calling HeadObject on a folder to get the last modification time, but a folder on S3 never has a last modification time. We ran the following command on the same bucket and keys set on the Iceberg engine and got the exact same error: `aws s3api head-object bucket --key mykey` returns a 404.
Version used: 23.4.1.1943
About this issue
- State: closed
- Created a year ago
- Comments: 20 (12 by maintainers)
Sure, schema inference is used only if the user didn't specify the structure manually. You can always specify the structure in the CREATE statement as usual:
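The original snippet was stripped from this archive; a minimal sketch of such a statement, where the table name, columns, endpoint, and credentials are all hypothetical:

```sql
-- Hypothetical example: structure given explicitly,
-- so the failing schema inference is skipped entirely
CREATE TABLE iceberg_table (id UInt64, name String)
ENGINE = Iceberg('https://my-bucket.s3.amazonaws.com/warehouse/db/table/', 'KEY', 'SECRET');
```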
As well as for the table function (same as for the `s3` table function):
It's a very good idea. It just wasn't implemented; the current schema inference was simply derived from the S3 table engine.
Actually, @kssenii, it's your code 😃 https://github.com/ClickHouse/ClickHouse/pull/43454
It was needed for calculating `total_size` for the progress bar. We use the same iterator for reading and for schema inference. For reading it's fine, since we will do all these HEAD requests anyway, but for schema inference we should not: we read only the first few files, and we don't need to calculate `total_size` because we don't report progress during schema inference. We can add a flag to `KeysIterator` so that, during schema inference, the HEAD request is issued only when a new key is actually requested. I will create a PR for it.
UPD: https://github.com/ClickHouse/ClickHouse/pull/50203
Hey @kssenii
Let me give you a simple and easy way to reproduce.
How did I generate the data?
Run spark-sql with
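The exact invocation (including the Iceberg catalog configuration flags) was stripped from this archive; a representative Spark SQL session producing an Iceberg table on S3, with a hypothetical catalog, table, and bucket:

```sql
-- Hypothetical Spark SQL, run inside spark-sql with an Iceberg catalog configured
CREATE TABLE my_catalog.db.events (id BIGINT, name STRING)
USING iceberg
LOCATION 's3://my-bucket/warehouse/db/events';

INSERT INTO my_catalog.db.events VALUES (1, 'first'), (2, 'second');
SELECT * FROM my_catalog.db.events;
```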
As you can see, I'm able to read the data with `spark-sql`.
What do we have in AWS?
Data
Metadata
ClickHouse
I'm using the Docker image `clickhouse-server:23.4-alpine`.
The issue happens when executing this simple request:
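The original statement was stripped; given the report above it was presumably of roughly this form, with a hypothetical bucket and credentials:

```sql
-- Hypothetical reconstruction of the failing statement
CREATE TABLE events_iceberg
ENGINE = Iceberg('https://my-bucket.s3.amazonaws.com/warehouse/db/events/', 'KEY', 'SECRET');
```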
In the trace you see two data files mentioned:
and we can see that this is consistent with what we have in AWS.
Let’s check the head-object request for both files:
If we try to read the data without Iceberg:
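The query itself was stripped; a sketch of reading the data files directly with the `s3` table function, with a hypothetical path and credentials:

```sql
-- Hypothetical: read the Parquet data files directly, bypassing Iceberg
SELECT *
FROM s3('https://my-bucket.s3.amazonaws.com/warehouse/db/events/data/*.parquet',
        'KEY', 'SECRET', 'Parquet');
```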
I've put in as much detail as possible to help determine whether the issue is in the way we generated the data or a real bug in the 23.4 release.