amoro: [Bug]: Optimize status hangs in pending and minor

What happened?

Hi, I have run into two problems when optimizing Iceberg tables, and I have not been able to find the cause for a long time:

  1. Optimize group 1: the optimizer is still alive, but the OptimizeQueue writes no plan log. Does that mean the queue has stopped working? (It stops working after 2-3 tables have been merged successfully.)

  2. Optimize group 2: after a minor optimize is executed, optimizing does not respond for a long time, even though the files are very small. (Some tables succeed; others hang or time out and retry.)

I'm also not sure whether the bug was caused by my own change, in which I drop all status in BaseFileScan to reduce JVM heap memory. (BTW, it does save a lot of memory. If it has no impact on optimization, I will open a new PR for it.)

Affects Versions

master/0.4.x

What engines are you seeing the problem on?

AMS

How to reproduce

Just optimize a certain number of Iceberg tables.

Relevant log output

No response

Anything else

[Two images were attached here.]

Code of Conduct

  • I agree to follow this project’s Code of Conduct

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 21 (15 by maintainers)

Most upvoted comments

In one case, if an optimization task fails and is retried more than 5 times, the table's optimization state is suspended. To re-optimize the table, we have to clean up the tasks, files and tables in the database and then restart AMS.

In this case, the only way out is to clean up these tasks; however, task clean-up is very cumbersome right now. We are going to clean up these tasks automatically and leave an optimize-history entry with some error messages, so that optimizing can keep going.
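
To make the behaviour described above concrete, here is a minimal sketch of a retry limit that suspends a table, plus the kind of automatic clean-up being proposed. It is only an illustration: every class, field and method name is hypothetical and not taken from the AMS code base, and it assumes "retried more than 5 times" means the table is suspended once the failure count exceeds 5.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names exist in AMS.
class RetryLimitSketch {
    // Assumed limit, taken from the "more than 5 times" wording above.
    static final int MAX_RETRIES = 5;

    enum TableState { PENDING, SUSPENDED }

    static class TableOptimizeState {
        TableState state = TableState.PENDING;
        int failureCount = 0;
        String lastError = null;
        final List<String> history = new ArrayList<>();

        // Record one failed attempt; suspend once failures exceed the limit.
        void onTaskFailed(String error) {
            failureCount++;
            lastError = error;
            if (failureCount > MAX_RETRIES) {
                state = TableState.SUSPENDED;
            }
        }

        // Proposed behaviour: keep an error record, reset the counter, resume.
        void autoCleanUp() {
            if (state != TableState.SUSPENDED) {
                return; // nothing to clean up
            }
            history.add("FAILED after " + failureCount + " attempts: " + lastError);
            failureCount = 0;
            lastError = null;
            state = TableState.PENDING;
        }
    }

    public static void main(String[] args) {
        TableOptimizeState table = new TableOptimizeState();
        for (int i = 0; i < MAX_RETRIES + 1; i++) {
            table.onTaskFailed("task timeout");
        }
        System.out.println("state after failures: " + table.state);
        table.autoCleanUp();
        System.out.println("state after clean-up: " + table.state);
        System.out.println("history: " + table.history);
    }
}
```

The point of the sketch is that the suspended table returns to a schedulable state without manual database edits or an AMS restart, while the history entry preserves the error for later diagnosis.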

AMS is currently being refactored; the automatic clean-up feature will be included after that.

Good answer. I did reset the log level to avoid printing too many Iceberg logs during the checks in OptimizeQueueService. Could the issue be related to that?

Probably not. I can hardly see any relation between the two.