AMDMIGraphX: Resnext50 failing to run on MIGraphX Driver

While trying https://zenodo.org/record/6617879/files/resnext50_32x4d_fpn.onnx with

migraphx-driver perf ./resnext50_32x4d_fpn.onnx

I hit an error indicating that an operator is not supported by MIGraphX:

what(): /workspace/AMDMIGraphX/src/onnx/onnx_parser.cpp:318: parse_graph: Unknown operator: Mod

Investigate and add support for the Mod operator.
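
For reference, the ONNX Mod operator has two modes selected by its fmod attribute. With fmod=0 (the default, integer inputs) the result takes the sign of the divisor, and with fmod=1 it behaves like C's fmod, where the result takes the sign of the dividend. The scalar sketch below only illustrates those semantics; onnx_mod is a hypothetical helper name and this is not MIGraphX code.

```python
import math

def onnx_mod(a, b, fmod=0):
    """Scalar illustration of ONNX Mod semantics (hypothetical helper)."""
    if fmod:
        # fmod=1: C-style fmod, result has the sign of the dividend `a`.
        return math.fmod(a, b)
    # fmod=0: integer modulus, result has the sign of the divisor `b`.
    # Python's % operator already follows this convention for ints.
    return a % b

print(onnx_mod(-7, 3))          # integer mode
print(onnx_mod(-7, 3, fmod=1))  # fmod mode
```

The two modes only differ when the operands have opposite signs, which is worth covering in any test for a new operator implementation.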

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 38

Most upvoted comments

We’re finally getting runs on resnext50 thanks to @pfultz2 finding and fixing an issue with concat vectorization (#1653). In the meantime, I was trying to make our NMS multithreaded for a single huge batch, thinking that was still the issue, as we worked in parallel.

Adding both of these fixes on top of each other gives us the following result:

Summary:
gpu::topk: 25017.3ms / 5 = 5003.47ms, 85%
nonmaxsuppression: 3860.93ms / 1 = 3860.93ms, 14%
hip::copy_from_gpu: 183.658ms / 4 = 45.9146ms, 1%
gpu::code_object::reduce_kernel: 143.436ms / 1 = 143.436ms, 1%
gpu::nonzero: 127.553ms / 5 = 25.5106ms, 1%
hip::copy_to_gpu: 70.1181ms / 1 = 70.1181ms, 1%
gpu::miopen_fusion: 23.6265ms / 49 = 0.482174ms, 1%
gpu::convolution: 20.3926ms / 53 = 0.384766ms, 1%
gpu::code_object::gather_kernel: 15.7473ms / 50 = 0.314946ms, 1%
gpu::code_object::concat_kernel: 9.74119ms / 19 = 0.512694ms, 1%
gpu::code_object::mul_add_kernel: 6.44609ms / 21 = 0.306956ms, 1%
gpu::code_object::convert_kernel: 4.20801ms / 12 = 0.350667ms, 1%
gpu::code_object::max_min_kernel: 3.53736ms / 10 = 0.353737ms, 1%
gpu::code_object::sub_kernel: 3.49076ms / 20 = 0.174538ms, 1%
gpu::code_object::min_exp_mul_mul_kernel: 3.35976ms / 10 = 0.335976ms, 1%
gpu::code_object::add_relu_kernel: 3.00536ms / 37 = 0.0812258ms, 1%
gpu::code_object::concat_add_kernel: 2.82924ms / 1 = 2.82924ms, 1%
gpu::code_object::add_kernel: 2.11693ms / 17 = 0.124525ms, 1%
gpu::code_object::contiguous_kernel: 1.44307ms / 10 = 0.144307ms, 1%
gpu::code_object::add_add_relu_kernel: 1.2778ms / 12 = 0.106483ms, 1%
gpu::code_object::concat_mod_kernel: 0.97079ms / 1 = 0.97079ms, 1%
gpu::code_object::less_convert_convert_logical_xor_mod_equal_convert_convert_not_logical_and_mul_add_where_kernel: 0.739805ms / 5 = 0.147961ms, 1%
gpu::code_object::gathernd_kernel: 0.723255ms / 5 = 0.144651ms, 1%
gpu::code_object::sigmoid_kernel: 0.655794ms / 5 = 0.131159ms, 1%
gpu::code_object::greater_convert_kernel: 0.479331ms / 5 = 0.0958662ms, 1%
gpu::code_object::mul_kernel: 0.47863ms / 1 = 0.47863ms, 1%
load: 0.365675ms / 407 = 0.000898464ms, 1%
gpu::pooling: 0.250205ms / 1 = 0.250205ms, 1%
hip::hip_copy_literal: 0.146673ms / 147 = 0.000997776ms, 1%
slice: 0.076922ms / 46 = 0.00167222ms, 1%
unsqueeze: 0.06204ms / 60 = 0.001034ms, 1%
broadcast: 0.060922ms / 53 = 0.00114947ms, 1%
multibroadcast: 0.055462ms / 56 = 0.000990393ms, 1%
step: 0.0426ms / 33 = 0.00129091ms, 1%
reshape: 0.035711ms / 36 = 0.000991972ms, 1%
gpu::code_object::relu_kernel: 0.0212ms / 1 = 0.0212ms, 1%
get_tuple_elem: 0.02079ms / 10 = 0.002079ms, 1%
squeeze: 0.019261ms / 12 = 0.00160508ms, 1%
transpose: 0.01602ms / 15 = 0.001068ms, 1%
@param: 0.006191ms / 4 = 0.00154775ms, 1%
flatten: 0.00549ms / 5 = 0.001098ms, 1%
hip::sync_stream: 0.00529ms / 1 = 0.00529ms, 1%
check_context::migraphx::version_2_6_0::gpu::context: 0.00369ms / 1 = 0.00369ms, 1%
hip::hip_allocate_memory: 0.00231ms / 1 = 0.00231ms, 1%

Batch size: 1
Rate: 0.0331548/sec
Total time: 30161.5ms
Total instructions time: 29509.5ms
Overhead time: 0.636814ms, 652.028ms
Overhead: 0%, 2%
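
The derived figures in this summary can be rechecked from the raw numbers. The sketch below parses lines in the `name: total / count = average, percent` shape shown above and recomputes the per-call average, each operator's share of the total instructions time, the overhead, and the rate. The regular expression and the helper name parse_summary are assumptions made for illustration and are not part of migraphx-driver.

```python
import re

# Matches lines such as "gpu::topk: 25017.3ms / 5 = 5003.47ms, 85%".
LINE = re.compile(r"^(?P<name>\S+): (?P<total>[\d.]+)ms / (?P<count>\d+) = ")

def parse_summary(text):
    """Return (name, total_ms, call_count) for each summary line found."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            rows.append((m["name"], float(m["total"]), int(m["count"])))
    return rows

# Two lines copied from the summary above.
sample = """gpu::topk: 25017.3ms / 5 = 5003.47ms, 85%
nonmaxsuppression: 3860.93ms / 1 = 3860.93ms, 14%"""

total_time_ms = 30161.5    # "Total time"
instructions_ms = 29509.5  # "Total instructions time"

rows = parse_summary(sample)
for name, total, count in rows:
    share = 100.0 * total / instructions_ms
    print(f"{name}: avg {total / count:.2f} ms, {share:.1f}% of instructions time")

overhead_ms = total_time_ms - instructions_ms  # compare with the second "Overhead time" value
rate_per_sec = 1000.0 / total_time_ms          # batch size 1
print(f"overhead {overhead_ms:.1f} ms, rate {rate_per_sec:.7f}/sec")
```

Read this way, gpu::topk and nonmaxsuppression together account for nearly all of the instruction time, so they are the operators to look at first.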

We still have an accuracy issue when using the accuracy checker, and we’re currently seeing an HSA_FAULT when running with MIGRAPHX_GPU_DEBUG=1:

./migraphx/kernels/gather.hpp:59: operator(): error: Out of bounds access at offset: 23760000
:0:rocdevice.cpp            :2647: 4812524667207 us: 3344692: [tid:0x7f5677d9a700] Device::callbackQueue aborting with error : HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation

which is something I’m looking into between reviews.
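
If the fault turns out to come from the index values rather than from a shape mismatch, one way to narrow it down is to validate the gather indices on the host before they reach the kernel. A minimal sketch, assuming ONNX Gather semantics, where indices along an axis of size n must lie in [-n, n-1]. The name out_of_range_indices is hypothetical and not a MIGraphX function.

```python
def out_of_range_indices(indices, axis_size):
    """Return (position, value) pairs for indices outside [-axis_size, axis_size - 1]."""
    return [
        (pos, idx)
        for pos, idx in enumerate(indices)
        if not -axis_size <= idx < axis_size
    ]

# Example with an axis of size 5: the values 5 and -6 are out of range.
bad = out_of_range_indices([0, 4, -5, 5, -6], 5)
print(bad)
```

An empty result rules out bad index values and points instead at the shapes or strides handed to the kernel.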