ROCm: OpenCL "slow" performance on ethminer (ethereum)

The good news is that cpp-ethminer works with the ROCm driver, whereas it fails on amdgpu-pro 17.10 (at least on an RX 480). I am posting this here since I couldn’t find anyone else using ROCm for mining.

I do not know what the differences are between the amdgpu-pro and ROCm versions of OpenCL (AMD-APP); it would be nice if someone could explain them.
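
For reference, a quick way to see which OpenCL stack is active is clinfo (a minimal sketch, assuming clinfo is installed; the exact platform and driver version strings differ between the amdgpu-pro and ROCm builds):

# print the OpenCL platform and driver identification strings
clinfo | grep -iE 'platform name|platform version|driver version'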

The “problem” is performance. On an RX 480 with amdgpu-pro it should be ~22 MH/s [*], but the maximum I can get is ~19 MH/s. More interestingly, if I manually underclock to 900 MHz [**] (level 2 in pp_dpm_sclk) the speed stays the same, while noise, heat, and power consumption drop considerably.

Is there any known reason for the slower mining speed, and for the hashrate not scaling with higher frequencies?

[*] http://www.phoronix.com/scan.php?page=article&item=ethminer-linux-gpus&num=2

[**] The commands used to underclock:

sudo su
# allow manual control of the DPM performance level
echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level
# select core-clock state 2 (~900 MHz on this card)
echo 2 > /sys/class/drm/card0/device/pp_dpm_sclk
# verify: the active state is marked with an asterisk
cat /sys/class/drm/card0/device/pp_dpm_sclk
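
For completeness, a minimal sketch (the card0 path is an assumption and may differ on multi-GPU systems): the memory clock can be inspected the same way, and automatic management restored afterwards.

# memory-clock DPM states; the active one is marked with an asterisk
cat /sys/class/drm/card0/device/pp_dpm_mclk
# hand clock control back to the driver when done
echo auto > /sys/class/drm/card0/device/power_dpm_force_performance_level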

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 1
  • Comments: 24

Most upvoted comments

Now the AMDGPU-Pro driver for Vega10 supports the new Lightning Compiler and the ROCm stack as well.

When we started the ROCm project, we made a decision to build a fully open-source solution, which meant we needed to move away from the traditional shader compiler used in our graphics stack, since it was staying proprietary. The traditional flow was a two-stage compiler: we would compile the code to an intermediate language, HSAIL, which was then picked up, finalized, and compiled by our shader compiler, the same backend used by graphics shaders.

This journey started in earnest a little over a year ago, when we looked for the best way forward to a fully open-source compiler. We began with the LLVM R600 codebase, which needed a fair amount of work to become a production-class compiler, but it was the right foundation to meet our goal of a fully open stack.

With this transition, we know we will have performance gaps, which we are working to close. What we need help with from the community is testing a broader set of applications, reporting the gaps, and doing some analysis of why they occur. One thing we have seen as well is that sometimes you need to code differently for the LLVM compiler than for the SC-based compiler to get the best performance out of it.

We are now active in the LLVM community, pushing upgrades to the code base to better enable GPU computing, and these changes are up-streamed into the LLVM repository.

Note one significant change: the compiler now generates a GCN ISA binary object directly. This makes it easier for the compiler to support inline ASM in all of our languages (OpenCL, HCC, HIP), along with a native assembler and disassembler. It is also a critical foundation for our math libraries and the MIOpen project.
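
A hedged sketch of what that looks like in practice (the exact clang flags vary between ROCm releases, and gfx803 is an assumption for a Polaris card such as the RX 480):

# compile an OpenCL C kernel straight to a GCN code object
clang -x cl -Xclang -finclude-default-header -target amdgcn-amd-amdhsa -mcpu=gfx803 -c kernel.cl -o kernel.o
# disassemble it back to GCN ISA with the LLVM disassembler
llvm-objdump -d -mcpu=gfx803 kernel.o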

For the last year, we have spent more time focusing on Fiji and Vega10 with the deep-learning frameworks, MIOpen, and GEMM solvers. We have also been filling in the gaps in LLVM for the optimizations we need for GPU computing, improving the scheduler, register allocator, loop optimizer, and a lot more. It is a fair amount of work, as you can imagine, but we have already seen the effort pay off, since we are faster on a number of codes.

We test things like the following on the compiler:

  • Benchmarks: BabelStream, SHOC, Mixbench, Lattice, ViennaCL, CoMD, LULESH, XSBench, Rodinia, DeepBench
  • Libraries: clFFT, rocBLAS, rocFFT, MIOpen
  • Applications:
    • OpenCL: Torch-CL, Gromacs
    • HIP: Caffe, Torch, TensorFlow
    • HCC: NAMD
  • Internal tests we built up for OpenCL performance
  • Conformance tests:
    • OpenCL 1.2 and 2.0 conformance tests
    • HCC conformance tests

The above is a small sample of what we run on the compiler. We do A/B compares.

New tests recently added: Radeon Rays, the SideFX Houdini test, Blender, and Radeon ProRender. We are in the process of adding a number of currency-mining apps.

On ray tracing we are just starting the performance analysis and optimization that is more specific to this class of work. What you will see over the summer is us focusing on compiler optimizations for currency mining and ray tracing; I just have to stage this work in with the team. I saw you referenced a Phoronix article: for ROCm 1.5 the new compiler was faster than LLVM/HSAIL/SC on Fiji for Blender, but for LuxMark we were slower. http://www.phoronix.com/scan.php?page=article&item=rocm-15-opencl&num=2

One thing I will leave you with: we built a standardized loader, linker, and object format. This allows us to do something you never could do with the AMDGPU-Pro driver: upgrade the compiler before we release a new driver. So we can now address issues in OpenCL, HCC, and HIP, and in the base LLVM compiler foundation, independently of the base driver.
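
As a small illustration (assuming a code object such as the kernel.o from the earlier sketch; the exact strings printed depend on your binutils version), the object format is standard ELF, so ordinary tools can inspect what the common loader consumes:

# standard ELF header; the machine field identifies the AMD GPU target
readelf -h kernel.o
# standard section table
readelf -S kernel.o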

Hope this helps

So I was looking at the data and put the integer performance into a roofline plot to understand when and where each stack is faster.

What you see is that the current miners use very low IOPS/byte. Right now the crossover point between the two stacks is at 8.25 IOPS/byte, and they converge again at about 2.25 IOPS/byte.

[Screenshot: integer-op roofline plot, 2017-07-07]

On SGEMM the crossover is at 24.25 FLOPS per byte.

[Screenshot: SGEMM roofline plot, 2017-07-07]

This shows why FFT was slower on ROCm, while GEMM is doing well on ROCm.
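
For readers unfamiliar with the model, a roofline bounds attainable throughput by the smaller of peak compute and arithmetic intensity times memory bandwidth. A minimal sketch with assumed RX 480 figures (roughly 256 GB/s memory bandwidth and ~5.8 TFLOPS FP32 peak; these numbers are assumptions, not read from the plots above):

% assumed RX 480 figures: P_peak ~ 5800 GFLOPS, BW ~ 256 GB/s
\[
P(I) = \min\bigl(P_{\text{peak}},\; I \cdot BW\bigr),
\qquad
I_{\text{ridge}} = \frac{P_{\text{peak}}}{BW} \approx \frac{5800\ \text{GFLOPS}}{256\ \text{GB/s}} \approx 22.7\ \text{FLOPS/byte}
\]

Any kernel sitting far below the ridge point is bandwidth-bound, which is also consistent with the original report that the hashrate does not change when the core clock is lowered.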

We will dig into this more and get you guys an updated patch.