iree: Dispatch workgroups along X only to prevent exceeding max number of blocks on GPU
What happened?
Both CUDA and Vulkan may have a low maximum number (65535) of workgroups along the Y and Z dimensions. Although it is not a complete solution, dispatching workgroups only along X should fix the problem for a while. This is what XLA currently does, and it seems to have been working for a while.
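To make the limit concrete, here is a minimal sketch of folding a 3-D grid into X and recovering the original IDs. It assumes the CUDA limits of 2^31 - 1 blocks along X and 65535 along Y and Z. The helper names (`fits`, `linearize_grid`, `delinearize_id`) are hypothetical and are not IREE or XLA APIs. Vulkan only guarantees 65535 workgroups along every dimension, X included, which is one reason this is not a complete solution.

```python
from math import prod

# Assumed CUDA grid limits: (X, Y, Z).
CUDA_MAX_GRID = (2**31 - 1, 65535, 65535)


def fits(grid, limits=CUDA_MAX_GRID):
    """Return True if every grid dimension is within its limit."""
    return all(g <= m for g, m in zip(grid, limits))


def linearize_grid(grid):
    """Fold an (x, y, z) workgroup grid into the X dimension only."""
    return (prod(grid), 1, 1)


def delinearize_id(linear_id, grid):
    """Recover (x, y, z) IDs from an X-only ID, with x varying fastest."""
    x_count, y_count, _ = grid
    x = linear_id % x_count
    y = (linear_id // x_count) % y_count
    z = linear_id // (x_count * y_count)
    return (x, y, z)


grid = (4, 100_000, 2)       # Y exceeds the 65535 limit
flat = linearize_grid(grid)  # same number of workgroups, all along X
```

The extra integer division and modulo in `delinearize_id` are the price of dispatching along a single dimension.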
To do that, we can change the TileAndDistributeToWorkgroup pass to decide how many dimensions we can dispatch along. Right now this is hardcoded to 3 (kNumMaxParallelDims); we should make kNumMaxParallelDims a pass option and use 1 for both the LLVMGPU and Vulkan backends.
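The effect of such an option can be sketched as follows. This is an illustration under stated assumptions, not the pass's actual implementation: `distribute_workgroup_counts` and its `max_parallel_dims` parameter are hypothetical names, and folding the excess loops into the last used dimension is one plausible policy.

```python
from math import prod


def distribute_workgroup_counts(loop_counts, max_parallel_dims=3):
    """Map per-loop tile counts (innermost first) to an (x, y, z) grid.

    Hypothetical policy: the first `max_parallel_dims - 1` loops each get
    their own grid dimension, and all remaining loops are folded into the
    last dimension that is still allowed.
    """
    if not 1 <= max_parallel_dims <= 3:
        raise ValueError("max_parallel_dims must be 1, 2, or 3")
    counts = list(loop_counts)
    if len(counts) > max_parallel_dims:
        head = counts[:max_parallel_dims - 1]
        tail = counts[max_parallel_dims - 1:]
        counts = head + [prod(tail)]
    # Pad unused grid dimensions with 1.
    counts += [1] * (3 - len(counts))
    return tuple(counts)


loops = [4, 100_000, 2]
grid_3d = distribute_workgroup_counts(loops, max_parallel_dims=3)
grid_1d = distribute_workgroup_counts(loops, max_parallel_dims=1)
```

With a limit of 3 the grid keeps all three dimensions, so the Y count of 100000 would exceed 65535. With a limit of 1 every workgroup lands on X.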
About this issue
- State: closed
- Created a year ago
- Comments: 15 (9 by maintainers)
Commits related to this issue
- Add maxWorkgroupParallelDims option to TileAndDistributeToWorkgroupsPass (#12691) `maxWorkgroupParallelDims` option default to 3 which will distribute workgroups along `X`, `Y`, and `Z` dimensions (c... — committed to iree-org/iree by KoolJBlack a year ago
- Add maxWorkgroupParallelDims option to TileAndDistributeToWorkgroupsPass (#12691) `maxWorkgroupParallelDims` option default to 3 which will distribute workgroups along `X`, `Y`, and `Z` dimensions (c... — committed to qedawkins/iree by KoolJBlack a year ago
- Dispatch CUDA workgroups along single dimension (#12726) Setting the maxWorkgroupParallelDims option to 1 will dispatch all workgroups along the X dimension only. Addresses #12642. — committed to iree-org/iree by KoolJBlack a year ago
- Add maxWorkgroupParallelDims option to TileAndDistributeToWorkgroupsPass (#12691) `maxWorkgroupParallelDims` option default to 3 which will distribute workgroups along `X`, `Y`, and `Z` dimensions (c... — committed to NatashaKnk/iree by KoolJBlack a year ago
- Dispatch CUDA workgroups along single dimension (#12726) Setting the maxWorkgroupParallelDims option to 1 will dispatch all workgroups along the X dimension only. Addresses #12642. — committed to NatashaKnk/iree by KoolJBlack a year ago
IDs are meant to represent the order in which data is processed. This is already what we do right now: it happens when we tile and distribute, and hopefully we keep it separate from the IDs.
Right, the difference is that it will have a performance cost, so we have to decide when we want to use it.
Ideally we come up with one solution that solves both. Vulkan might be more restricted, in which case we should solve for Vulkan and that would also solve CUDA.