Tensile: Tensile won't produce backend libraries for archs without optimized logic files when using --separate-architectures

Issue

Tensile won’t produce backend libraries for archs without optimized logic files when using --separate-architectures.

Description

According to https://github.com/ROCmSoftwarePlatform/Tensile/issues/1165#issuecomment-1094556880, “gfx1010 has been enabled by default in rocBLAS builds since ROCm 4.3.0.” However, since rocBLAS does not have optimized logic files for navi10, no library is produced for gfx1010, as the listing from the official image shows:

$ drun --rm rocm/dev-ubuntu-22.04:5.6-complete
root@ftl:/# ls -1 /opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx*
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx803.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx900.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx906.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat
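
The same check can be scripted. The following is a minimal sketch, not part of rocBLAS or Tensile: the function name `missing_lazy_libraries` is hypothetical, and it only assumes the `TensileLibrary_lazy_<arch>.dat` naming visible in the listing above.

```python
import re
from pathlib import Path

# File naming taken from the listing above; anything else is ignored.
LAZY_LIB_PATTERN = re.compile(r"TensileLibrary_lazy_(gfx[0-9a-f]+)\.dat$")


def missing_lazy_libraries(library_dir, requested_archs):
    """Return the requested archs that have no lazy library file in library_dir.

    Hypothetical helper for illustration; it only inspects file names.
    """
    present = set()
    for path in Path(library_dir).glob("TensileLibrary_lazy_gfx*.dat"):
        match = LAZY_LIB_PATTERN.match(path.name)
        if match:
            present.add(match.group(1))
    # Preserve the order in which the archs were requested.
    return [arch for arch in requested_archs if arch not in present]


if __name__ == "__main__":
    # Path as seen in the rocm/dev-ubuntu-22.04:5.6-complete image above.
    print(missing_lazy_libraries("/opt/rocm/lib/rocblas/library",
                                 ["gfx1010", "gfx1030"]))
```

Against the directory listed above, gfx1010 would be reported as missing while gfx1030 would not.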

Expected

Tensile should produce libraries for all requested architectures, using the fallback logic files for archs missing optimized logic files.
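
To make the expected behavior concrete, here is a small sketch of the selection rule described above. It is an illustration only, not Tensile's actual code: `plan_logic_sources` and both of its parameters are hypothetical names.

```python
def plan_logic_sources(requested_archs, archs_with_optimized_logic):
    """Map every requested arch to the kind of logic files it should use.

    Hypothetical illustration of the expected behavior: an arch without
    optimized logic files is assigned the fallback logic instead of being
    dropped from the output.
    """
    return {
        arch: "optimized" if arch in archs_with_optimized_logic else "fallback"
        for arch in requested_archs
    }


if __name__ == "__main__":
    # Assumed inputs for illustration: gfx1030 has optimized logic, gfx1010 does not.
    print(plan_logic_sources(["gfx1010", "gfx1030"], {"gfx1030"}))
```

The point is that the result has one entry per requested arch, so gfx1010 would still get a library, built from the fallback logic.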

Workaround

Building rocBLAS with --merge-architectures --no-lazy-library-loading seems to avoid the issue.

Patch

https://github.com/ulyssesrr/docker-rocm-xtra/blob/3be41a9d79ff4f4324f3f34383b2282529c0c4b7/rocm-xtra-builder-rocblas/patches/Tensile-fix-fallback-arch-build.patch

About this issue

  • State: closed
  • Created 10 months ago
  • Reactions: 8
  • Comments: 33 (28 by maintainers)

Most upvoted comments

Although this is probably not the right place, I really need to say thank you! I’ve struggled with this basically since my card was released, and I was finally able to fix it because of you.

Doing compute stuff is just a nightmare with AMD, really.

@nakajee I think at the current stage we don’t have to test on gfx1010 yet. The first step is to confirm that when compiling any already supported arch with gfx1010 (such as AMDGPU_TARGETS="gfx1010;gfx1030"), all tests pass, as per the directions specified in #1897. Currently I cannot build rocBLAS at head (5937a87d) with ROCm 6.0 because I get the following error message:

# Tensile Create Library
Tensile::WARNING: Did not detect SupportedISA: [(8, 0, 3), (9, 0, 0), (9, 0, 6), (9, 0, 8), (9, 0, 10), (9, 4, 0), (9, 4, 1), (9, 4, 2), (10, 1, 0), (10, 1, 1), (10, 1, 2), (10, 3, 0), (10, 3, 1), (11, 0, 0), (11, 0, 1), (11, 0, 2)]; cannot benchmark assembly kernels.
# Found  hipcc version 6.0.0-0
Tensile::FATAL: Cached asm caps differ from derived asm caps for (9, 0, 10)
CMake Error at build/virtualenv/cmake/TensileConfig.cmake:277 (message):
  Error creating Tensile library: 255
Call Stack (most recent call first):
  library/src/CMakeLists.txt:74 (TensileCreateLibraryFiles)

I will implement a workaround for this failure. To make it work in the meantime:

  • check out Tensile at the same commit ID as the one in tensile_tag.txt in rocBLAS
  • in Tensile/Common.py, change globalParameters["IgnoreAsmCapCache"] = False to True
  • build rocBLAS with -t [path to Tensile]
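
The Common.py edit in the steps above can also be applied with a small script. This is a rough sketch under assumptions: the helper name `enable_ignore_asm_cap_cache` is hypothetical, and it assumes the assignment appears in Tensile/Common.py in the form quoted above (with straight quotes), which may differ between Tensile versions.

```python
import re

# Matches the assignment quoted in the steps above and captures everything
# up to the value, so that trailing comments on the line are left untouched.
FLAG_PATTERN = re.compile(
    r'^(\s*globalParameters\["IgnoreAsmCapCache"\]\s*=\s*)False\b',
    re.MULTILINE,
)


def enable_ignore_asm_cap_cache(source):
    """Return (patched_source, replacements) with the flag switched to True.

    Hypothetical helper: it works on the text of Common.py only and does not
    touch the file system.
    """
    return FLAG_PATTERN.subn(r"\g<1>True", source)


if __name__ == "__main__":
    sample = 'globalParameters["IgnoreAsmCapCache"] = False\n'
    patched, count = enable_ignore_asm_cap_cache(sample)
    print(count, patched, end="")
```

To use it on a local checkout, read Tensile/Common.py, pass its contents through the function, and write the result back only if the replacement count is 1; a count of 0 means the line did not match the assumed form and should be edited by hand.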