hudi: [SUPPORT] Flink Table planner not loading problem
Describe the problem you faced

Version information: Hudi 0.13.0, Flink 1.16

Flink SQL (Hudi CREATE DDL):
CREATE TABLE `ods_action_log_huya_hudi_nopro_test` (
  `stime` VARCHAR PRIMARY KEY NOT ENFORCED,
  `product` VARCHAR,
  `eid` VARCHAR,
  `curpage` VARCHAR,
  `curlocation` VARCHAR,
  `mid` VARCHAR,
  `yyuid` BIGINT,
  `prop` VARCHAR,
  `dt` VARCHAR,
  `hour` VARCHAR
) PARTITIONED BY (`dt`, `hour`)
WITH (
  'connector' = 'hudi',
  'write.tasks' = '64',
  'write.operation' = 'insert', -- the write operation to perform (insert or upsert is supported)
  'path' = 'hdfs://huyaopclusternew/user/hive/warehouse/dw_rt_ods.db/ods_action_log_huya_hudi_nopro_test',
  'table.type' = 'COPY_ON_WRITE', -- with MERGE_ON_READ, Hive queries have no output until the parquet files are generated
  'hoodie.bucket.index.num.buckets' = '1',
  'hoodie.bucket.index.hash.field' = 'stime',
  'hoodie.clean.async' = 'true',
  'hoodie.cleaner.commits.retained' = '5',
  'hoodie.datasource.write.hive_style_partitioning' = 'true',
  'clustering.async.enabled' = 'true'
);
When I turn on 'clustering.async.enabled' = 'true', the job fails at startup with:
Caused by: java.lang.ClassNotFoundException: org.apache.flink.table.planner.codegen.sort.SortCodeGenerator
Reasons and suggestions

Since Flink 1.15, flink-table-planner is no longer on the JVM classpath by default; the planner is loaded by flink-table-planner-loader through a separate, isolated classloader. Hudi uses SortCodeGenerator for part of its clustering logic, so this exception occurs.

See: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/configuration/advanced/

So I think Hudi either needs to adapt its code to Flink 1.15+, or document that users must move the flink-table-planner jar from the opt directory to the lib directory of the Flink release so that Hudi can run.
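The jar-swap workaround described above can be sketched as follows. This is a sketch under assumptions: $FLINK_HOME is assumed to point at the Flink 1.16 release directory, and the exact jar file names (version and Scala suffix) depend on your distribution:

```shell
# Replace the planner loader with the full planner in the Flink distribution.
# Jar names are illustrative; adjust them to match your release.
cd "$FLINK_HOME"
mv lib/flink-table-planner-loader-1.16.0.jar opt/
mv opt/flink-table-planner_2.12-1.16.0.jar lib/
```

After the swap, restart the cluster (or resubmit the job) so the new classpath takes effect.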
About this issue
- Original URL
- State: open
- Created a year ago
- Reactions: 2
- Comments: 19 (11 by maintainers)
@ertanden I also hit this problem. It might be solved by:
(1) copying the codegen module from Flink into Hudi, but it is written in Scala;
(2) trying to use PlannerCodeLoader in Flink to load these classes;
(3) avoiding the SortCodeGenerator API and looking for other APIs.

Currently I'm trying the second way.

@danny0405 do you have any other suggestions? This should be a blocker problem; it is 100% reproducible on Flink 1.15+.
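Whichever option is chosen, the "give a reminder" idea from the suggestions above could be implemented as a cheap pre-flight classpath check before clustering starts, so users get an actionable message instead of a late ClassNotFoundException. This is only a sketch, not actual Hudi code; the class name PlannerCheck and the message text are illustrative:

```java
// Hypothetical pre-flight check (illustrative, not actual Hudi code):
// verify the planner classes are visible before async clustering starts.
public class PlannerCheck {
    static final String PLANNER_CLASS =
        "org.apache.flink.table.planner.codegen.sort.SortCodeGenerator";

    // Returns true if the full table planner is visible on this classloader.
    static boolean plannerOnClasspath() {
        try {
            Class.forName(PLANNER_CLASS);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (!plannerOnClasspath()) {
            System.out.println(
                "flink-table-planner not found on the classpath: move "
                + "opt/flink-table-planner*.jar to lib/ and remove "
                + "lib/flink-table-planner-loader*.jar, then restart.");
        }
    }
}
```

Hudi could run such a check when clustering.async.enabled is set and fail fast with the jar-swap hint instead of surfacing the ClassNotFoundException deep inside job execution.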
Any update on this?
Currently, it is impossible to run a Hudi job in append mode (COW and insert) with clustering enabled in AWS Kinesis Analytics. The job runs fine with clustering.async.enabled=false, but then we get many, many small files.

The class org.apache.flink.table.planner.codegen.sort.SortCodeGenerator is actually in flink-table-planner_2.12, not in flink-table-planner-loader. That's why this issue happens. However, as documented by Flink, depending on flink-table-planner_2.12 is deprecated since 1.15, and projects should refactor away from it.

Is there an idea how to remove the dependency on org.apache.flink.table.planner.codegen.sort.SortCodeGenerator? I could give it a shot, but right now the code is not that familiar to me, so I would need a clue where to start.

The more serious problem is that once you shade a jar, you need to shade all the jars it depends on, or there could be conflicts because of the indirectly introduced classes.