presto: Presto with clustered Hive tables fails if all buckets not created

We’re running Hive and Presto on AWS EMR 5.11.1. This is Hive 2.3.2 and Presto 0.187.

We have a number of tables with the following sorts of DDL:

CREATE TABLE IF NOT EXISTS measurements (
    id STRING,
    session STRING,
    tt_ts INT,
    r_crs_id INT,
    user_agent STRING
)
PARTITIONED BY(daydate STRING, epoch BIGINT)
CLUSTERED BY(r_crs_id) INTO 64 BUCKETS
STORED AS ORC
LOCATION's3://warehouse/'
tblproperties ("orc.compress"="ZLIB");

Our data is always inserted an hour at a time (the ‘epoch’ partition) and always with INSERT OVERWRITE in Hive.

When we insert with the ‘mr’ engine, Hive will create all 64 buckets, whether there is data in them or not. But Hive now defaults to the Tez engine, and mr is deprecated. The Tez engine will only create buckets when there is data to be inserted in a bucket. If all the buckets are not present then when querying with Presto an error such as the following will be returned:

Query XXXX failed: Hive table is corrupt. It is declared as being bucketed, but the files do not match the bucketing declaration. The number of files in the directory (14) does not match the declared bucket count (64) for partition: XXXXX

The error is easy to understand; Presto fails if not all buckets are created. This case is currently showing up with sparse test data. With production data we expect that in most cases the buckets will be filled and created. However we can’t chance that a timestamp with sparse or missing data can cause all queries across a range to fail.

Therefore we are looking for clarification so we can ensure that when we move to production we understand whether we can use clustering and under what circumstances.

It appears that in Presto release 0.55 this issue was addressed. See https://prestodb.io/docs/current/release/release-0.55.html under “Hive Bucketed Table” fixes.

Is this a regression?

About this issue

  • Original URL
  • State: open
  • Created 6 years ago
  • Comments: 40 (15 by maintainers)

Most upvoted comments

It was added in 0.202. In 0.208, we removed the size limitation. However, that change introduced a data loss bug, so wait for 0.209 (should be in the next few days).

I tend to agree with you guys. I was actually thinking of 2 workarounds, but I agree that filing an issue with Tez that has an option to create all the bucket files is probably best.

That said, I still think that Presto should have an option that would warn but scan all files in the partition. Someone pointed out that there was a similar “fix” 4 years ago.

I’ll file the Tez bug.

Can you file an issue for Tez? Not creating all the bucket files seems to be a bug with no good workaround. Given how bucketing works, I don’t understand how Tez’s behavior is useful.

This is how Hive bucketing works: Rather than naming each bucket file with a specific name, such as bucket5, the file names are sorted and a bucket is simply the Nth file in the sorted list. Thus, if files are missing, you have no way of knowing which bucket number corresponds to a given file.