hudi: Querying a Hudi ARRAY column from Hive throws an exception: Not a record: "int"
Describe the problem you faced

I use Flink 1.16 to create a Hudi 0.13 MOR table and sync it to Hive with 'hms' mode. When I query the Hive _rt table:

select * from hudi_hive_test.t1_20230530_type_mor_sink_rt limit 10;

I get this error:

org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hudi.org.apache.avro.AvroRuntimeException: Not a record: "int"
To Reproduce
Steps to reproduce the behavior:
1. Source table DDL
CREATE TABLE hudi_test.datagen_source_20230530 (
user_id BIGINT,
age INT,
sex STRING,
score DOUBLE,
amount DECIMAL(10, 2),
numbers ARRAY<INT>,
person ROW<id INT, name STRING>,
grade MAP<STRING, INT>,
my_date DATE,
my_float FLOAT
) WITH (
'connector' = 'datagen',
'rows-per-second' = '1',
'fields.user_id.kind' = 'random',
'fields.user_id.min' = '1',
'fields.user_id.max' = '1000',
'fields.age.kind' = 'random',
'fields.age.min' = '18',
'fields.age.max' = '60',
'fields.sex.kind' = 'random',
'fields.score.kind' = 'random',
'fields.score.min' = '0.0',
'fields.score.max' = '100.0',
'fields.amount.kind' = 'random',
'fields.amount.min' = '0.0',
'fields.amount.max' = '1000.0',
'fields.numbers.kind' = 'random',
'fields.numbers.element.min' = '1',
'fields.numbers.element.max' = '100',
'fields.person.kind' = 'random',
'fields.person.id.min' = '1',
'fields.person.id.max' = '1000',
'fields.person.name.length' = '5',
'fields.grade.kind' = 'random',
'fields.grade.key.length' = '5',
'fields.grade.value.min' = '1',
'fields.grade.value.max' = '100',
'fields.my_date.kind' = 'random',
'fields.my_float.kind' = 'random',
'fields.my_float.min' = '0.0',
'fields.my_float.max' = '100.0'
);
2. Hudi table DDL
create table hudi_test.t1_20230530_type_mor_sink(
user_id BIGINT,
age INT,
sex STRING,
score DOUBLE,
amount DECIMAL(10, 2),
numbers ARRAY<INT>,
person ROW<id INT, name STRING>,
grade MAP<STRING, INT>,
my_date DATE,
my_float FLOAT
)
with(
'connector'='hudi',
'path' = 'hdfs://user/hive/warehouse/hudi_test/t1_20230530_type_mor_sink',
'table.type'='MERGE_ON_READ',
'hoodie.datasource.write.recordkey.field' = 'user_id',
'hoodie.datasource.write.precombine.field' = 'age',
'write.bucket_assign.tasks'='1',
'write.tasks' = '1',
'compaction.tasks' = '1',
'compaction.async.enabled' = 'true',
'compaction.schedule.enabled' = 'true',
'compaction.trigger.strategy' = 'num_commits',
'compaction.delta_commits' = '2',
'read.streaming.enabled' = 'true',
'changelog.enabled' = 'true',
'read.streaming.skip_compaction' = 'true',
'hive_sync.enable'='true',
'hive_sync.mode' = 'hms',
'hive_sync.metastore.uris' = 'thrift://0.0.0.0:0000',
'hive_sync.db'='hudi_hive_test',
'hive_sync.table'='t1_20230530_type_mor_sink',
'hadoop.dfs.namenode.acls.enabled' = 'false'
);
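The reproduce steps do not show the statement that feeds the sink; presumably the job wires the datagen source into the Hudi table with something like the following (a sketch, not taken from the report; table names come from the DDLs above):

```sql
-- Continuous insert from the datagen source into the Hudi MOR sink.
-- The statement itself is assumed; only the table names appear in the report.
INSERT INTO hudi_test.t1_20230530_type_mor_sink
SELECT * FROM hudi_test.datagen_source_20230530;
```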
3. select * from hudi_hive_test.t1_20230530_type_mor_sink_rt limit 10; throws the error above.
4. Selecting all other fields (everything except the numbers array) works without error:
select
user_id,
age,
sex,
score,
amount,
person,
grade,
my_date,
my_float
from hudi_hive_test.t1_20230530_type_mor_sink_rt limit 10;
Overall process: (screenshot of the pipeline omitted)
Expected behavior
I think this is a bug, but I am not sure.
Environment Description
- Hudi version: 0.13
- Spark version:
- Hive version: 3.1.2
- Hadoop version: 3.3.1
- Storage (HDFS/S3/GCS...): HDFS
- Running on Docker? (yes/no): no
Additional context
I have tried both ARRAY<INT> and ARRAY<LONG>; both always produce this error.
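Presumably a minimal query that isolates the array column reproduces the failure (a sketch using the table from this report; I have not confirmed this exact statement was run):

```sql
-- Selecting the ARRAY<INT> column alone from the _rt table; based on the
-- report above, this is expected to hit the same AvroRuntimeException.
SELECT user_id, numbers
FROM hudi_hive_test.t1_20230530_type_mor_sink_rt
LIMIT 10;
```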
Stacktrace
org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hudi.org.apache.avro.AvroRuntimeException: Not a record: "int"
If you know something about this, could you help me? Thanks.
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 1
- Comments: 21 (8 by maintainers)
@danny0405 @xicm I referred to #7173 and #8867 and modified the code. Now the Hudi table's timestamp(3) columns sync to the Hive _rt and _ro tables as timestamp type, no longer as long type. When I select the _rt or _ro table, it is OK. Thanks so much.

I think it is a historical reason: engines like Spark have a default precision of 6 for the timestamp type, and Hive also assumes that an INT64 stored in Parquet has a timestamp unit of micros: https://cwiki.apache.org/confluence/display/hive/languagemanual+types#LanguageManualTypes-TimestampstimestampTimestamps. But I agree we should sync the timestamp type to Hive to get rid of these confusions; people have been asking me why timestamp(3) syncs to Hive as long for 3 years. Let's fix it in release 0.14.0. @xicm Can you file a fix for that?
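The confusion with syncing timestamps as long can be illustrated on the Hive side (a hypothetical sketch; the table and column names are invented for illustration):

```sql
-- Before the fix: a Flink TIMESTAMP(3) column lands in the synced Hive table
-- as a BIGINT holding epoch milliseconds, so users must convert by hand:
SELECT from_unixtime(CAST(ts_col / 1000 AS BIGINT)) AS event_time
FROM hudi_hive_test.some_table_rt;

-- After syncing the logical type, Hive sees ts_col as TIMESTAMP directly:
SELECT ts_col AS event_time
FROM hudi_hive_test.some_table_rt;
```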
0.13.1 is released already; you can cherry-pick the fix separately.
Now Hive supports timestamp, but there is another issue: https://github.com/apache/hudi/blob/9c7d856656f3f3a01c073a2aed444d90c740c913/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L235-L244
I don't know why we don't convert TIMESTAMP_MILLIS to TIMESTAMP. @danny0405, do you know the reason?
Yes, you should cherry-pick #7173 if you use Hive 3.
@danny0405 Thank you for helping me. Now, when I query the array<INT> type written by Flink, I still get this error in the Hive client.