hudi: Querying an ARRAY type column throws an exception: Not a record: "int"

This error occurs when I select from the Hive _rt table:

  • I used Flink 1.16 to create a Hudi 0.13 MOR table and used 'hms' mode to sync the Hive table.

  • select * from hudi_hive_test.t1_20230530_type_mor_sink_rt limit 10;

Describe the problem you faced

org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hudi.org.apache.avro.AvroRuntimeException: Not a record: "int"

To Reproduce

Steps to reproduce the behavior:

1. Source table DDL

CREATE TABLE hudi_test.datagen_source_20230530 (
                                                     user_id BIGINT,
                                                     age INT,
                                                     sex STRING,
                                                     score DOUBLE,
                                                     amount DECIMAL(10, 2),
                                                     numbers ARRAY<INT>,
                                                     person ROW<id INT, name STRING>,
                                                     grade MAP<STRING, INT>,
                                                     my_date DATE,
                                                     my_float FLOAT
) WITH (
      'connector' = 'datagen',
      'rows-per-second' = '1',
      'fields.user_id.kind' = 'random',
      'fields.user_id.min' = '1',
      'fields.user_id.max' = '1000',
      'fields.age.kind' = 'random',
      'fields.age.min' = '18',
      'fields.age.max' = '60',
      'fields.sex.kind' = 'random',
      'fields.score.kind' = 'random',
      'fields.score.min' = '0.0',
      'fields.score.max' = '100.0',
      'fields.amount.kind' = 'random',
      'fields.amount.min' = '0.0',
      'fields.amount.max' = '1000.0',
      'fields.numbers.kind' = 'random',
      'fields.numbers.element.min' = '1',
      'fields.numbers.element.max' = '100',
      'fields.person.kind' = 'random',
      'fields.person.id.min' = '1',
      'fields.person.id.max' = '1000',
      'fields.person.name.length' = '5',
      'fields.grade.kind' = 'random',
      'fields.grade.key.length' = '5',
      'fields.grade.value.min' = '1',
      'fields.grade.value.max' = '100',
      'fields.my_date.kind' = 'random',
      'fields.my_float.kind' = 'random',
      'fields.my_float.min' = '0.0',
      'fields.my_float.max' = '100.0'
      );

2. Hudi table DDL

create table hudi_test.t1_20230530_type_mor_sink(
                                                      user_id BIGINT,
                                                      age INT,
                                                      sex STRING,
                                                      score DOUBLE,
                                                      amount DECIMAL(10, 2),
                                                      numbers ARRAY<INT>,
                                                      person ROW<id INT, name STRING>,
                                                      grade MAP<STRING, INT>,
                                                      my_date DATE,
                                                      my_float FLOAT
)
    with(
        'connector'='hudi',
        'path' = 'hdfs://user/hive/warehouse/hudi_test/t1_20230530_type_mor_sink',
        'table.type'='MERGE_ON_READ',
        'hoodie.datasource.write.recordkey.field' = 'user_id',
        'hoodie.datasource.write.precombine.field' = 'age',
        'write.bucket_assign.tasks'='1',
        'write.tasks' = '1',
        'compaction.tasks' = '1',
        'compaction.async.enabled' = 'true',
        'compaction.schedule.enabled' = 'true',
        'compaction.trigger.strategy' = 'num_commits',
        'compaction.delta_commits' = '2',
        'read.streaming.enabled' = 'true',
        'changelog.enabled' = 'true',
        'read.streaming.skip_compaction' = 'true',
        'hive_sync.enable'='true',
        'hive_sync.mode' = 'hms',
        'hive_sync.metastore.uris' = 'thrift://0.0.0.0:0000',
        'hive_sync.db'='hudi_hive_test',
        'hive_sync.table'='t1_20230530_type_mor_sink',
        'hadoop.dfs.namenode.acls.enabled' = 'false'
        );
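
The reproduction presumably also included a streaming job writing the datagen source into the Hudi table (not shown in the steps above); a minimal sketch:

INSERT INTO hudi_test.t1_20230530_type_mor_sink
SELECT user_id, age, sex, score, amount, numbers, person, grade, my_date, my_float
FROM hudi_test.datagen_source_20230530;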

3. select * from hudi_hive_test.t1_20230530_type_mor_sink_rt limit 10; produces the error above (screenshot omitted).

4. Selecting the other fields (all of them except numbers) produces no error:

select
    user_id,
    age,
    sex,
    score,
    amount,
    person,
    grade,
    my_date,
    my_float
from hudi_hive_test.t1_20230530_type_mor_sink_rt limit 10;
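
If the numbers column alone is what triggers the failure, the following reduced query should presumably reproduce it (a hedged check, not shown in the original report):

select user_id, numbers
from hudi_hive_test.t1_20230530_type_mor_sink_rt limit 10;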


Overall process: (screenshot omitted)

Expected behavior

I think this is a bug, but I am not sure.

Environment Description

  • Hudi version : 0.13

  • Spark version :

  • Hive version :3.1.2

  • Hadoop version : 3.3.1

  • Storage (HDFS/S3/GCS…) : HDFS

  • Running on Docker? (yes/no) : no

Additional context

Whether I use ARRAY<INT> or ARRAY<LONG>, I always get this error.

Stacktrace

org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hudi.org.apache.avro.AvroRuntimeException: Not a record: "int"

If you know anything about this, could you help me? Thanks.

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 21 (8 by maintainers)

Most upvoted comments

@danny0405 @xicm I referred to #7173 and #8867 and modified the code. Now the Hudi table's timestamp(3) column is synced to the Hive _rt and _ro tables as a timestamp type, no longer as long. When I select from the _rt or _ro table, it works. Thanks so much.
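
For illustration, a hedged sketch of the sync behavior described above, using a hypothetical table hudi_hive_test.ts_demo_rt whose Flink-side column ts is declared as TIMESTAMP(3):

-- Hypothetical table and column, used only to illustrate the mapping.
DESCRIBE hudi_hive_test.ts_demo_rt;
-- Before applying #7173 / #8867, the timestamp(3) column is synced to Hive as a long:
--   ts    bigint
-- After applying the fixes, it is synced as a timestamp:
--   ts    timestamp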

I don’t know why we don’t convert TIMESTAMP_MILLIS to TIMESTAMP.

I think it is for historical reasons: engines like Spark have a default precision of 6 for the timestamp type, and Hive also assumes the INT64 stored in Parquet uses micros as the timestamp unit: https://cwiki.apache.org/confluence/display/hive/languagemanual+types#LanguageManualTypes-TimestampstimestampTimestamps. But I agree we should sync the timestamp type to Hive to get rid of this confusion; people have been asking me for 3 years why timestamp(3) syncs to Hive as long. Let's fix it in release 0.14.0.

@xicm Can you file a fix for that?

0.13.1 is released already; you can cherry-pick the fix separately.

Now Hive supports timestamp, but there is another issue: https://github.com/apache/hudi/blob/9c7d856656f3f3a01c073a2aed444d90c740c913/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L235-L244

I don’t know why we don’t convert TIMESTAMP_MILLIS to TIMESTAMP.

@danny0405 do you know the reason?

Yes, you should cherry pick #7173 if you use Hive3.

@danny0405 Thank you for helping me. Now, when I query the ARRAY<INT> column written by Flink, I get this error in the Hive client:

org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hudi.org.apache.avro.AvroRuntimeException: Not a record: "int"