hudi: Querying an ARRAY type column throws an exception: Not a record: "int"

This error occurs when I select from the Hive _rt table:

  • I used Flink 1.16 to create a Hudi 0.13 MOR table and used 'hms' mode to sync the Hive table.

  • select * from hudi_hive_test.t1_20230530_type_mor_sink_rt limit 10;

Describe the problem you faced

org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hudi.org.apache.avro.AvroRuntimeException: Not a record: "int"

To Reproduce

Steps to reproduce the behavior:

1. Source table DDL

CREATE TABLE hudi_test.datagen_source_20230530 (
                                                     user_id BIGINT,
                                                     age INT,
                                                     sex STRING,
                                                     score DOUBLE,
                                                     amount DECIMAL(10, 2),
                                                     numbers ARRAY<INT>,
                                                     person ROW<id INT, name STRING>,
                                                     grade MAP<STRING, INT>,
                                                     my_date DATE,
                                                     my_float FLOAT
) WITH (
      'connector' = 'datagen',
      'rows-per-second' = '1',
      'fields.user_id.kind' = 'random',
      'fields.user_id.min' = '1',
      'fields.user_id.max' = '1000',
      'fields.age.kind' = 'random',
      'fields.age.min' = '18',
      'fields.age.max' = '60',
      'fields.sex.kind' = 'random',
      'fields.score.kind' = 'random',
      'fields.score.min' = '0.0',
      'fields.score.max' = '100.0',
      'fields.amount.kind' = 'random',
      'fields.amount.min' = '0.0',
      'fields.amount.max' = '1000.0',
      'fields.numbers.kind' = 'random',
      'fields.numbers.element.min' = '1',
      'fields.numbers.element.max' = '100',
      'fields.person.kind' = 'random',
      'fields.person.id.min' = '1',
      'fields.person.id.max' = '1000',
      'fields.person.name.length' = '5',
      'fields.grade.kind' = 'random',
      'fields.grade.key.length' = '5',
      'fields.grade.value.min' = '1',
      'fields.grade.value.max' = '100',
      'fields.my_date.kind' = 'random',
      'fields.my_float.kind' = 'random',
      'fields.my_float.min' = '0.0',
      'fields.my_float.max' = '100.0'
      );

2. Hudi table DDL

create table hudi_test.t1_20230530_type_mor_sink(
                                                      user_id BIGINT,
                                                      age INT,
                                                      sex STRING,
                                                      score DOUBLE,
                                                      amount DECIMAL(10, 2),
                                                      numbers ARRAY<INT>,
                                                      person ROW<id INT, name STRING>,
                                                      grade MAP<STRING, INT>,
                                                      my_date DATE,
                                                      my_float FLOAT
)
    with(
        'connector'='hudi',
        'path' = 'hdfs://user/hive/warehouse/hudi_test/t1_20230530_type_mor_sink',
        'table.type'='MERGE_ON_READ',
        'hoodie.datasource.write.recordkey.field' = 'user_id',
        'hoodie.datasource.write.precombine.field' = 'age',
        'write.bucket_assign.tasks'='1',
        'write.tasks' = '1',
        'compaction.tasks' = '1',
        'compaction.async.enabled' = 'true',
        'compaction.schedule.enabled' = 'true',
        'compaction.trigger.strategy' = 'num_commits',
        'compaction.delta_commits' = '2',
        'read.streaming.enabled' = 'true',
        'changelog.enabled' = 'true',
        'read.streaming.skip_compaction' = 'true',
        'hive_sync.enable'='true',
        'hive_sync.mode' = 'hms',
        'hive_sync.metastore.uris' = 'thrift://0.0.0.0:0000',
        'hive_sync.db'='hudi_hive_test',
        'hive_sync.table'='t1_20230530_type_mor_sink',
        'hadoop.dfs.namenode.acls.enabled' = 'false'
        );
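
The reproduction presumably also included a streaming job writing the datagen source into the Hudi table (not shown in the steps above); a minimal sketch:

INSERT INTO hudi_test.t1_20230530_type_mor_sink
SELECT user_id, age, sex, score, amount, numbers, person, grade, my_date, my_float
FROM hudi_test.datagen_source_20230530;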

3. select * from hudi_hive_test.t1_20230530_type_mor_sink_rt limit 10; produces the error above (screenshot omitted).

4. Selecting the other fields (all of them except numbers) produces no error:

select
    user_id,
    age,
    sex,
    score,
    amount,
    person,
    grade,
    my_date,
    my_float
from hudi_hive_test.t1_20230530_type_mor_sink_rt limit 10;
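
If the numbers column alone is what triggers the failure, the following reduced query should presumably reproduce it (a hedged check, not shown in the original report):

select user_id, numbers
from hudi_hive_test.t1_20230530_type_mor_sink_rt limit 10;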


Overall process: (screenshot omitted)

Expected behavior

I think this is a bug, but I am not sure.

Environment Description

  • Hudi version : 0.13

  • Spark version :

  • Hive version :3.1.2

  • Hadoop version : 3.3.1

  • Storage (HDFS/S3/GCS…) : HDFS

  • Running on Docker? (yes/no) : no

Additional context

Whether I use ARRAY<INT> or ARRAY<LONG>, I always get this error.

Stacktrace

org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hudi.org.apache.avro.AvroRuntimeException: Not a record: "int"

If you know anything about this, could you help me? Thanks.

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 21 (8 by maintainers)

Most upvoted comments

@danny0405 @xicm I referred to #7173 and #8867 and modified the code. Now the Hudi table's timestamp(3) column is synced to the Hive _rt and _ro tables as a timestamp type, no longer as long. When I select from the _rt or _ro table, it works. Thanks so much.
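
For illustration, a hedged sketch of the sync behavior described above, using a hypothetical table hudi_hive_test.ts_demo_rt whose Flink-side column ts is declared as TIMESTAMP(3):

-- Hypothetical table and column, used only to illustrate the mapping.
DESCRIBE hudi_hive_test.ts_demo_rt;
-- Before applying #7173 / #8867, the timestamp(3) column is synced to Hive as a long:
--   ts    bigint
-- After applying the fixes, it is synced as a timestamp:
--   ts    timestamp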

I don’t know why we don’t convert TIMESTAMP_MILLIS to TIMESTAMP.

I think it is for historical reasons: engines like Spark have a default precision of 6 for the timestamp type, and Hive also assumes the INT64 stored in Parquet uses micros as the timestamp unit: https://cwiki.apache.org/confluence/display/hive/languagemanual+types#LanguageManualTypes-TimestampstimestampTimestamps. But I agree we should sync the timestamp type to Hive to get rid of this confusion; people have been asking me for 3 years why timestamp(3) syncs to Hive as long. Let's fix it in release 0.14.0.

@xicm Can you file a fix for that?

0.13.1 is released already; you can cherry-pick the fix separately.

Now Hive supports timestamp, but there is another issue: https://github.com/apache/hudi/blob/9c7d856656f3f3a01c073a2aed444d90c740c913/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L235-L244

I don’t know why we don’t convert TIMESTAMP_MILLIS to TIMESTAMP.

@danny0405 do you know the reason?

Yes, you should cherry pick #7173 if you use Hive3.

@danny0405 Thank you for helping me. Now, when I query the ARRAY<INT> column written by Flink, I get this error in the Hive client:

org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hudi.org.apache.avro.AvroRuntimeException: Not a record: "int"