tvm: TVM for ROCm 2.x is currently not working

Environment: Ubuntu 18.04 + ROCm 2.2 + TVM (built from current master with ROCM = ON)

I confirmed that the built TVM library detects and links against ROCm, and the tuning procedure runs successfully. However, executing tvm.build(s, arg_bufs, 'rocm', name='matmul') fails with the following error:

WARNING:autotvm:Too many errors happen in the tuning. Now is in debug mode
Finish loading 500 records
DEBUG:autotvm:Finish loading 500 records
Cannot find config for target=rocm, workload=('tvm_matmul_tune_op', 4, 256, 256). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=rocm, workload=('tvm_matmul_tune_op', 4, 256, 256). A fallback configuration is used, which may bring great performance regression.

Best config:
,None,None
[14:47:54] /host/docker/matmul_tvm/tvm/src/pass/vectorize_loop.cc:362: Detect vector condition in Vectorized Loop, scalarizing...
[14:47:54] /host/docker/matmul_tvm/tvm/src/pass/vectorize_loop.cc:362: Detect vector condition in Vectorized Loop, scalarizing...
Traceback (most recent call last):
  File "matmul_autotvm.py", line 260, in <module>
    search_matmul_config(4, 256, 256, 500) # m, k, n, num_trials
  File "matmul_autotvm.py", line 165, in search_matmul_config
    func = tvm.build(s, arg_bufs, 'rocm', name='matmul')
  File "/host/docker/matmul_tvm/tvm/python/tvm/build_module.py", line 617, in build
    fhost, mdev = _build_for_device(flist, tar, target_host)
  File "/host/docker/matmul_tvm/tvm/python/tvm/build_module.py", line 484, in _build_for_device
    mdev = codegen.build_module(fdevice, str(target)) if fdevice else None
  File "/host/docker/matmul_tvm/tvm/python/tvm/codegen.py", line 36, in build_module
    return _Build(lowered_func, target)
  File "/host/docker/matmul_tvm/tvm/python/tvm/_ffi/_ctypes/function.py", line 206, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (2) /host/docker/matmul_tvm/tvm/build_rocm/libtvm.so(TVMFuncCall+0x61) [0x7f9598de3f01]
  [bt] (1) /host/docker/matmul_tvm/tvm/build_rocm/libtvm.so(+0x14b2e9) [0x7f95986992e9]
  [bt] (0) /host/docker/matmul_tvm/tvm/build_rocm/libtvm.so(+0x231aaa) [0x7f959877faaa]
  File "/host/docker/matmul_tvm/tvm/src/codegen/codegen.cc", line 46
TVMError: Check failed: bf != nullptr: Target rocm is not enabled

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 48 (45 by maintainers)

Most upvoted comments

This is not clean, but you can modify this block to handle the rocm target.

    if 'cuda' in self.task.target.keys or 'opencl' in self.task.target.keys:
        remote = request_remote(self.key, self.host, self.port)
        ctx = remote.context(str(self.task.target), 0)
        max_dims = ctx.max_thread_dimensions
        kwargs['check_gpu'] = {
            'max_shared_memory_per_block': ctx.max_shared_memory_per_block,
            'max_threads_per_block': ctx.max_threads_per_block,
            'max_thread_x': max_dims[0],
            'max_thread_y': max_dims[1],
            'max_thread_z': max_dims[2],
        }

For rocm, max_shared_memory_per_block should be 48 KB and max_threads_per_block should be 256. Don't forget to add the condition "if 'rocm' in self.task.target.keys".
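A minimal sketch of what that extra branch could look like, written as a standalone helper so it can be tried without a TVM install. The helper name make_check_gpu is hypothetical and not part of TVM. The 48 KB and 256 values come from the comment above. The per-dimension thread limits for rocm are not given in this thread, so capping each at the threads-per-block limit is an assumption.

```python
# Limits quoted in the comment above for ROCm; hard-coded, not queried from the device.
ROCM_MAX_SHARED_MEMORY_PER_BLOCK = 48 * 1024  # 48 KB, in bytes
ROCM_MAX_THREADS_PER_BLOCK = 256


def make_check_gpu(target_keys, ctx=None):
    """Return the 'check_gpu' limits for a target, or None if no check applies.

    target_keys: iterable of target key strings (like self.task.target.keys).
    ctx: device context exposing the attributes read below; only used on
         the cuda/opencl path.
    """
    if 'cuda' in target_keys or 'opencl' in target_keys:
        # Same fields as the original block: query the limits from the device.
        max_dims = ctx.max_thread_dimensions
        return {
            'max_shared_memory_per_block': ctx.max_shared_memory_per_block,
            'max_threads_per_block': ctx.max_threads_per_block,
            'max_thread_x': max_dims[0],
            'max_thread_y': max_dims[1],
            'max_thread_z': max_dims[2],
        }
    if 'rocm' in target_keys:
        # ASSUMPTION: per-dimension limits are not stated in the thread, so
        # each dimension is capped at the threads-per-block limit.
        return {
            'max_shared_memory_per_block': ROCM_MAX_SHARED_MEMORY_PER_BLOCK,
            'max_threads_per_block': ROCM_MAX_THREADS_PER_BLOCK,
            'max_thread_x': ROCM_MAX_THREADS_PER_BLOCK,
            'max_thread_y': ROCM_MAX_THREADS_PER_BLOCK,
            'max_thread_z': ROCM_MAX_THREADS_PER_BLOCK,
        }
    return None
```

Inside the measurement code, the result would be assigned to kwargs['check_gpu'] only when it is not None.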

@masahi

You can find more information here: https://llvm.org/docs/AMDGPUUsage.html#code-object-metadata

I simply added -mattr=-code-object-v3 in BuildAMDGPU at codegen_amdgpu.cc:182:

config << "-mtriple=amdgcn-amd-amdhsa-hcc -mcpu=gfx"
       << DetectROCMComputeVersion(target) << " -mattr=-code-object-v3 "
       << target.substr(4, target.length() - 4);

hi, @masahi @ghostplant

I had the same problem and found a solution that works for me, so I would like to share my experience. The hipErrorNotFound error occurs when your environment is ROCm 2.3 (which is not compatible with LLVM 6 or 7) or when you use LLVM 8+.

The reason is that LLVM uses -mattr=+code-object-v3 by default for HSACO generation, but the ROCm hipModuleLoad function only accepts an HSACO with code object v2.

This means you need to pass -mattr=-code-object-v3 explicitly. Note the value given to -mattr: the plus sign + is replaced by a minus sign -.
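To make the flag change concrete, here is a small Python sketch that assembles the same option string as the C++ snippet quoted earlier in the thread. The function name amdgpu_llvm_options and its parameters are hypothetical and not a TVM API; gfx900 is used only as an example compute version.

```python
def amdgpu_llvm_options(gfx_version, extra_options=""):
    """Build an LLVM option string that disables code object v3.

    gfx_version: compute version number placed after "gfx" (e.g. 900).
    extra_options: any further options to append, as a single string.
    """
    parts = [
        "-mtriple=amdgcn-amd-amdhsa-hcc",
        "-mcpu=gfx{}".format(gfx_version),
        # Leading minus on the attribute turns code object v3 off,
        # so a v2 code object is requested instead.
        "-mattr=-code-object-v3",
    ]
    if extra_options.strip():
        parts.append(extra_options.strip())
    return " ".join(parts)


print(amdgpu_llvm_options(900))
```

The only difference from the default behavior described above is the sign in front of code-object-v3.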