amoro: [Bug]: Optimize status hangs in pending and minor

What happened?

Hi, optimizing Iceberg tables has run into two problems, and I haven't been able to find the cause for a long time:

optimize group 1: The optimizer is still alive, but the OptimizeQueue no longer logs any plans. Does that mean the queue has stopped working? (It stops working after successfully merging 2-3 tables.)
optimize group 2: After a minor optimization runs, optimizing does not respond for a long time, even though the files are very small. (Some tables succeed; others hang or time out and retry.)

I'm also not sure whether the bug was caused by my own change, in which I dropped all stats in BaseFileScan to reduce JVM heap memory. (BTW, it does save a lot of memory. If this has no impact on optimizing, I will open a new PR for it.)

Affects Versions

master/0.4.x

What engines are you seeing the problem on?

AMS

How to reproduce

Optimize a certain number of Iceberg tables.

Relevant log output

No response

Anything else

Code of Conduct

I agree to follow this project’s Code of Conduct

About this issue

State: closed
Created 2 years ago
Comments: 21 (15 by maintainers)

Commits related to this issue

fix #955 1.cache table spec for table entries scan 2.only include colum stats when necessary — committed to wangtaohz/arctic by wangtaohz 2 years ago
fix #955 1.cache table spec for table entries scan 2.only include colum stats when necessary — committed to XBaith/amoro by wangtaohz 2 years ago
[ARCTIC-955] Improve performance of Table entries scan (#962) * fix #955 1.cache table spec for table entries scan 2.only include colum stats when necessary * fix checkStyle * remove doAs from Arct... — committed to XBaith/amoro by wangtaohz a year ago
[ARCTIC-955] Improve performance of Table entries scan (#962) * fix #955 1.cache table spec for table entries scan 2.only include colum stats when necessary * fix checkStyle * remove doAs fro... — committed to NetEase/amoro by wangtaohz a year ago

Most upvoted comments

In one case, if an optimizing task fails and is retried more than 5 times, the table's optimizing state is suspended. To re-optimize the table, we have to clean up the tasks, files, and tables in the database and then restart AMS.

In this case, the only way out is to clean up these tasks; however, task clean-up is currently very cumbersome. We plan to clean up these tasks automatically and leave an optimize-history entry with the error messages, so that optimizing can keep going.

AMS is being refactored, and the auto-clean-up feature will be included once that is done.

wangtaohz on Jan 10, 2023
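The retry limit and the proposed automatic clean-up described in the comment above can be pictured as a small state machine. The following is a minimal Java sketch under stated assumptions, not AMS code: `TaskStatus`, `RetryingTask`, `TableOptimizingState`, `HistoryEntry`, and `cleanUpSuspended` are hypothetical names, and reading "retried more than 5 times" as one initial attempt plus five retries is an assumption.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical types for illustration only; none of these exist in AMS.
enum TaskStatus { PENDING, SUSPENDED }

// One hypothetical optimize-history entry that keeps the failure visible.
record HistoryEntry(String taskId, String errorMessage) {}

final class RetryingTask {
    // Assumption: after this many retries, the next failure suspends the task.
    static final int MAX_RETRIES = 5;

    final String id;
    TaskStatus status = TaskStatus.PENDING;
    int retries = 0;
    String lastError;

    RetryingTask(String id) { this.id = id; }

    // Records a failure: re-queue while under the limit, otherwise suspend.
    void onFailure(String error) {
        lastError = error;
        if (retries < MAX_RETRIES) {
            retries++;
            status = TaskStatus.PENDING; // queued for another attempt
        } else {
            status = TaskStatus.SUSPENDED; // blocks the table until cleaned up
        }
    }
}

final class TableOptimizingState {
    final List<RetryingTask> tasks = new ArrayList<>();
    final List<HistoryEntry> history = new ArrayList<>();

    // A single suspended task is enough to stall optimizing for the table.
    boolean isBlocked() {
        return tasks.stream().anyMatch(t -> t.status == TaskStatus.SUSPENDED);
    }

    // Automatic clean-up: drop suspended tasks, but record their last error
    // in history instead of requiring manual DB edits and an AMS restart.
    int cleanUpSuspended() {
        int removed = 0;
        Iterator<RetryingTask> it = tasks.iterator();
        while (it.hasNext()) {
            RetryingTask task = it.next();
            if (task.status == TaskStatus.SUSPENDED) {
                history.add(new HistoryEntry(task.id, task.lastError));
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

Keeping the error message in a history entry, rather than deleting the task outright, matches the intent in the comment: the table becomes schedulable again while the reason for the failure is still recorded.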

Good answer. I did reset the LOG level to avoid printing too many Iceberg logs during the check in OptimizeQueueService. Could the issue be related to that?

Probably not. I can hardly find any relation between them.

wangtaohz on Dec 22, 2022