alluxio: Formatted master incorrectly accepts blocks created before the format due to blockID(containerID) clash
Alluxio Version: What version of Alluxio are you using?
Describe the bug The Alluxio master generates block container IDs starting from 0. After the master is formatted and restarted without the workers being formatted, an old worker will report its blocks to the new master, and the master may accept those blocks by accident.
To Reproduce Introduced in https://github.com/Alluxio/alluxio/pull/14006#issuecomment-913334686
The new master has generated some blocks with container IDs starting from 0. Then the worker from the previous cluster registers with old blocks (created with the old master, with container IDs starting from 0). If you are unlucky, a block on this old worker may have old blockID 0 and the new blockID 0 has been allocated to a totally irrelevant file. The new master, in this case, will mistakenly think this blockID is recognized and accept this copy from the old worker.
Then, if the block lengths do not match (the new 0 vs. the old 0), the master will throw the error referred to in the link.
What’s worse, if the block lengths DO match, the master will think this copy belongs to this totally irrelevant file. Then the Alluxio data is messed up without you noticing!
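The scenario above can be sketched with a toy model. This is only an illustration under stated assumptions, not Alluxio's actual code: `BlockIdClashSketch`, `ToyMaster`, `createBlock`, `acceptReport`, the file paths, and the 24-bit sequence-number layout are all hypothetical names and values chosen for the example. The only behavior taken from the report is that a freshly formatted master hands out container IDs from 0 again and recognizes a reported block purely by its ID.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the ID clash described above; hypothetical, not Alluxio's classes. */
final class BlockIdClashSketch {
  // Assumed layout for illustration: container ID in the high bits,
  // sequence number in the low 24 bits.
  static final int SEQUENCE_BITS = 24;

  static long blockId(long containerId, long sequenceNumber) {
    return (containerId << SEQUENCE_BITS) | sequenceNumber;
  }

  /** Hypothetical master that tracks blocks only by ID and length. */
  static final class ToyMaster {
    // A freshly formatted master starts counting from 0 again.
    private long nextContainerId = 0;
    private final Map<Long, String> blockOwner = new HashMap<>();
    private final Map<Long, Long> blockLength = new HashMap<>();

    /** Allocates the first block of a new file and returns its block ID. */
    long createBlock(String file, long length) {
      long id = blockId(nextContainerId++, 0);
      blockOwner.put(id, file);
      blockLength.put(id, length);
      return id;
    }

    /** Outcome of a worker reporting that it holds a block. */
    String acceptReport(long id, long length) {
      if (!blockOwner.containsKey(id)) {
        return "UNKNOWN_BLOCK";
      }
      if (blockLength.get(id) != length) {
        return "LENGTH_MISMATCH";
      }
      // The ID and length match, so the stale copy is silently attached
      // to whatever file now owns this ID.
      return "ACCEPTED_FOR:" + blockOwner.get(id);
    }
  }

  public static void main(String[] args) {
    // Old cluster: the worker caches a 100-byte block of /old/a.
    ToyMaster oldMaster = new ToyMaster();
    long oldId = oldMaster.createBlock("/old/a", 100);

    // Master is formatted; the worker is not. IDs restart from 0.
    ToyMaster newMaster = new ToyMaster();
    long newId = newMaster.createBlock("/new/b", 100);

    System.out.println(oldId == newId);                      // true
    System.out.println(newMaster.acceptReport(oldId, 100));  // ACCEPTED_FOR:/new/b
    System.out.println(newMaster.acceptReport(oldId, 64));   // LENGTH_MISMATCH
  }
}
```

In this sketch the length check is the only thing standing between the stale block and the unrelated file, which is why a matching length is the dangerous case: nothing else ties the reported block to the cluster that created it.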
Expected behavior A clear and concise description of what you expected to happen.
Urgency HIGH
Additional context Add any other context about the problem here.
About this issue
- Original URL
- State: open
- Created 3 years ago
- Comments: 17 (17 by maintainers)
@HelloHorizon @ZhuTopher The design doc has passed review and #14258 is in progress.
@HelloHorizon I re-reviewed the design doc on Dec. 14, 2021. Not sure if the doc has been changed since then or not. I believe Jiacheng wanted to take another look at it still?
I haven’t re-reviewed the corresponding PR #14258 since my initial pass, but Jiacheng has been making progress with change requests there.
Let’s not start block-container IDs from System.currentTimeMillis(), is what I mean.