tensorflow-upstream: Multiple build problems with 1.12 on ROCm 2.0

I am trying to package rocm 2.0 for ArchLinux, and as a test, run tensorflow 1.12 to confirm that it’s working. Here’s a list of problems I encountered:

  1. pypi package for tensorflow-rocm=1.12 runs fine in a docker container with rocm, but fails during initialization natively:

    Message: Process 108423 (ipython) of user 1000 dumped core.

             Stack trace of thread 108423:
             #0  0x00007f8269550d7f raise (libc.so.6)
             #1  0x00007f826953b672 abort (libc.so.6)
             #2  0x00007f81c75db58e _ZN9__gnu_cxx27__verbose_terminate_handlerEv (libstdc++.so.6)
             #3  0x00007f81c75e1dfa _ZN10__cxxabiv111__terminateEPFvvE (libstdc++.so.6)
             #4  0x00007f81c75e1e57 _ZSt9terminatev (libstdc++.so.6)
             #5  0x00007f81c75e20ac __cxa_throw (libstdc++.so.6)
             #6  0x00007f81c75e2647 _Znwm (libstdc++.so.6)
             #7  0x00007f81cd66878f _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_assignERKS4_ (libhip_hcc.so)
             #8  0x00007f81cbe8f87d n/a (/home/kuba/.local/share/virtualenvs/tf-playground-Gy5OT1mH/lib/python3.6/site-packages/tensorflow/libtensorflow_framework.so)
             #9  0x0000000000000000 n/a (n/a)
             #10 0x0000000000000004 n/a (n/a)
    

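    This is a std::bad_alloc error (frame #6, _Znwm, is operator new throwing, called from a std::string assignment attributed to libhip_hcc.so), and I’m not sure how to debug it.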

  2. I then tried to build it locally, on r1.12-rocm. After downgrading to bazel 0.19 (0.20 had other problems), this is the diff of fixes I had to apply:

     diff --git a/WORKSPACE b/WORKSPACE
     index 17961829a6..340fa1b662 100644
     --- a/WORKSPACE
     +++ b/WORKSPACE
     @@ -1,5 +1,6 @@
     workspace(name = "org_tensorflow")
     
     +load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
     http_archive(
         name = "io_bazel_rules_closure",
         sha256 = "a38539c5b5c358548e75b44141b4ab637bba7c4dc02b46b1f62a96d6433f56ae",
    
     diff --git a/tensorflow/contrib/lite/experimental/micro/tools/make/targets/bluepill_makefile.inc b/tensorflow/contrib/lite/experimental/micro/tools/make/targets/bluepill_makefile.inc
     index 022a8422dc..fdc5bbe201 100644
     --- a/tensorflow/contrib/lite/experimental/micro/tools/make/targets/bluepill_makefile.inc
     +++ b/tensorflow/contrib/lite/experimental/micro/tools/make/targets/bluepill_makefile.inc
     @@ -28,7 +28,6 @@ ifeq ($(TARGET), bluepill)
         -Wno-sign-compare \
         -fno-delete-null-pointer-checks \
         -fomit-frame-pointer \
     -    -fpermissive \
         -nostdlib \
         -g \
         -Os
     diff --git a/third_party/gpus/rocm_configure.bzl b/third_party/gpus/rocm_configure.bzl
     index 18987b886e..36494c6e41 100644
     --- a/third_party/gpus/rocm_configure.bzl
     +++ b/third_party/gpus/rocm_configure.bzl
     @@ -366,7 +366,7 @@ def _find_libs(repository_ctx, rocm_config):
                 "hip_hcc",
                 repository_ctx,
                 cpu_value,
     -            rocm_config.rocm_toolkit_path,
     +            rocm_config.rocm_toolkit_path + "/hip/lib",
             ),
             "rocblas": _find_rocm_lib(
                 "rocblas",
     @@ -731,7 +731,7 @@ def _create_local_rocm_repository(repository_ctx):
             "crosstool:clang/bin/crosstool_wrapper_driver_rocm",
             {
                 "%{cpu_compiler}": str(cc),
     -            "%{hipcc_path}": "/opt/rocm/bin/hipcc",
     +            "%{hipcc_path}": "/opt/rocm/hip/bin/hipcc",
                 "%{hipcc_env}": _hipcc_env(repository_ctx),
                 "%{crosstool_verbose}": _crosstool_verbose(repository_ctx),
                 "%{gcc_host_compiler_path}": str(cc),
    
  3. Finally, I got this problem with include files that I don’t know how to solve: absolute paths are not allowed in hdrs = ...:

     ERROR: /home/kuba/.cache/bazel/_bazel_kuba/e27c88bdf3f11017a571c03bf1082c66/external/rccl_archive/BUILD.bazel:10:1: undeclared inclusion(s) in rule '@rccl_archive//:rccl':
     this rule is missing dependency declarations for the following files included by 'external/rccl_archive/src/rccl.cpp':
       '/opt/rocm/hip/include/hip/hip_runtime_api.h'
       '/opt/rocm/hip/include/hip/hip_common.h'
       '/opt/rocm/hip/include/hip/hcc_detail/hip_runtime_api.h'
       '/opt/rocm/hip/include/hip/hcc_detail/host_defines.h'
       '/opt/rocm/hip/include/hip/hcc_detail/driver_types.h'
       '/opt/rocm/hip/include/hip/hcc_detail/hip_texture_types.h'
       '/opt/rocm/hip/include/hip/hcc_detail/channel_descriptor.h'
       '/opt/rocm/hip/include/hip/hcc_detail/hip_vector_types.h'
       '/opt/rocm/hip/include/hip/hcc_detail/texture_types.h'
       '/opt/rocm/hip/include/hip/hcc_detail/hip_surface_types.h'
       '/opt/rocm/hip/include/hip/hcc_detail/hip_prof_api.h'
       '/opt/rocm/hip/include/hip/hcc_detail/hip_prof_str.h'
       '/opt/rocm/hip/include/hip/hip_runtime.h'
       '/opt/rocm/hip/include/hip/hcc_detail/hip_runtime.h'
       '/opt/rocm/hip/include/hip/hcc_detail/grid_launch.h'
       '/opt/rocm/hip/include/hip/hcc_detail/grid_launch_GGL.hpp'
       '/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp'
       '/opt/rocm/hip/include/hip/hcc_detail/code_object_bundle.hpp'
       '/opt/rocm/hip/include/hip/hcc_detail/concepts.hpp'
       '/opt/rocm/hip/include/hip/hcc_detail/helpers.hpp'
       '/opt/rocm/hip/include/hip/hcc_detail/program_state.hpp'
       '/opt/rocm/hip/include/hip/hip_hcc.h'
       '/opt/rocm/hip/include/hip/hcc_detail/hip_ldg.h'
       '/opt/rocm/hip/include/hip/hcc_detail/hip_atomic.h'
       '/opt/rocm/hip/include/hip/hcc_detail/device_functions.h'
       '/opt/rocm/hip/include/hip/hcc_detail/math_fwd.h'
       '/opt/rocm/hip/include/hip/hip_vector_types.h'
       '/opt/rocm/hip/include/hip/hcc_detail/device_library_decls.h'
       '/opt/rocm/hip/include/hip/hcc_detail/llvm_intrinsics.h'
       '/opt/rocm/hip/include/hip/hcc_detail/surface_functions.h'
       '/opt/rocm/hip/include/hip/hcc_detail/texture_functions.h'
       '/opt/rocm/hip/include/hip/hcc_detail/math_functions.h'
       '/opt/rocm/hip/include/hip/hcc_detail/hip_fp16_math_fwd.h'
       '/opt/rocm/hip/include/hip/hcc_detail/hip_memory.h'
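
A small sketch for summarizing an error like the one above: it pulls the quoted absolute header paths out of the message and reports the directory they have in common. The function name `missing_include_root` and the shortened sample are made up for illustration. Knowing the common directory only narrows down which include path the ROCm toolchain configuration is presumably not declaring; it does not fix the build by itself.

```python
import posixpath
import re


def missing_include_root(bazel_error: str) -> str:
    """Return the common directory of the absolute header paths quoted in the error."""
    # Each undeclared header appears in single quotes; relative paths such as
    # 'external/rccl_archive/src/rccl.cpp' do not start with '/' and are skipped.
    headers = re.findall(r"'(/[^']+\.(?:h|hpp))'", bazel_error)
    if not headers:
        raise ValueError("no absolute header paths found")
    return posixpath.commonpath(headers)


# Shortened excerpt of the error message, for illustration only.
sample = """\
this rule is missing dependency declarations for the following files included by 'external/rccl_archive/src/rccl.cpp':
  '/opt/rocm/hip/include/hip/hip_runtime_api.h'
  '/opt/rocm/hip/include/hip/hcc_detail/host_defines.h'
  '/opt/rocm/hip/include/hip/hcc_detail/grid_launch_GGL.hpp'
"""
print(missing_include_root(sample))
```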
    

This is the packaging script for ArchLinux that I have so far:

# Maintainer: Jakub Okoński <jakub@okonski.org>
pkgname=rocm-bin
pkgver=2.0
pkgrel=1
pkgdesc="ROCm libraries and tools"
arch=(x86_64)
_rpm_repo="http://repo.radeon.com/rocm/yum/rpm/"
url="https://github.com/RadeonOpenCompute/ROCm"
license=('unknown')
rpm_names=(
  cxlactivitylogger-5.6.7219-gf50cd35.x86_64.rpm
  hcc-1.3.18482-Linux.rpm
  hip_base-1.5.18494.rpm
  hip_hcc-1.5.18494.rpm
  hsa-ext-rocr-dev-1.1.9-45-ge88639f6-Linux.rpm
  hsa-rocr-dev-1.1.9-45-ge88639f6-Linux.rpm
  hsakmt-roct-1.0.9-99-g3ba20ce-Linux.rpm
  hsakmt-roct-dev-1.0.9-99-g3ba20ce-Linux.rpm
  MIOpen-HIP-1.7.0-49c4891-Linux.rpm
  miopengemm-1.1.5-9547fb9-Linux.rpm
  rocblas-2.0.0.0-Linux.rpm
  rocfft-0.8.8.0-Linux.rpm 
  rocm-dev-2.0.89-Linux.rpm
  rocm-device-libs-0.0.1-Linux.rpm
  rocm-libs-2.0.89-Linux.rpm
  rocm-opencl-1.2.0-2018121346.x86_64.rpm
  rocm-opencl-devel-1.2.0-2018121346.x86_64.rpm
  rocm-profiler-5.6.7219-g35b67c4.x86_64.rpm
  rocm_smi_lib64-1.0.0.rpm
  rocm-utils-2.0.89-Linux.rpm
  rocminfo-1.0.0-Linux.rpm    
  rocrand-1.8.1-Linux.rpm
)
source=("${rpm_names[@]/#/$_rpm_repo}")
sha256sums=(
  5a67eefbb13f0cfac8a5fee90a17f5eccf9a7f60d657ee5dff4a0479686ebdb4
  c1b8320ac03158eb3e1282068c33904f51b70787ba62a0c91f732d9d18547c98
  d294361155f9f29d1021221b27570f22215a8d78f4757f5c21cce6edca83da5a
  70b7c146b8d142bc21c4ec51464049d1d60ab5977e195ac7d4bf26d4c832621a
  f12fba94fc95d9c495bfec5c9c3cf9d1c3150c40d4d6ca4b444286ac0ee52876
  c60c4feb1c99b62733a8ef9a471ee715229630f0aab127635f57cced5a0a11e5
  33bd90ca0e5e254019bdb27a0fdf376e696eaa322f93fe77ed6ba893d89eaaed 
  7848e972ca151c473f01252cb83a32eaa71512ccb5b68ec6aeda7a82c84a06b3 
  45508fb4980236ac92551fd48245276ac5aeef0112d64ddc116b6639393b1f0d
  bf504dca842cfb1968e1476018c97f0b05e2121407eb3a9d8205a93e672116e0
  d76238f3c984d862646ce2b47bc19f91ac476dd9a59b41e968fb6b2a405df1e9
  c5ddf247d620f624c5c41848714d6ee6fb039765ae2a7976877729c317824b48
  f3b7e1232f9b87ea965f95d2398f3c01bea0e4d42b9db1d570b4124085bdbf5c
  63c8b01aa803febed5d25cbcfe9a3667cd631946388fa9675d2eab66b34d43c1
  b4c9707790c95c9920da261b135cc23c1779b43e29249ca99cae0722875e4758
  17a99a3c772ba9c586b477b153ec4ebbda43eec1ba06200adf0deb9a557649ba
  4fc5ab9a08c4c4970a5c58374d65bbfd9a8d04d9ee818df672a8363025d68197
  4ab76027996cde10a71ba11f47f75ae0d94322073aeee93fc879b0411aba9b2f
  7ff217fdf93d4fb18a19fd0f0ff39a68db07e66833f15ebb9bbc25e31b131513
  708ae27fefd6663bb2f0cf22f3f046c6ccac1d6a3ef54c980d9ce7e5df5e56b0
  537010c63014966be3db3b382b6fd9c8970e19b3dc25161ff7de70836ab404f3
  920a5f5a05856ac627ecdf633c4167d8b886a366126b26ff87d86adbbd353cdc
)

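package() {
  # Register the ROCm library directories with the dynamic linker.
  mkdir -p "$pkgdir/etc/ld.so.conf.d"
  # The heredoc body and terminator start at column 0 so that the
  # terminator is recognized regardless of tab/space indentation.
  cat <<EOF > "$pkgdir/etc/ld.so.conf.d/rocm-bin.conf"
/opt/rocm/lib
/opt/rocm/lib64
/opt/rocm/hip/lib
/opt/rocm/hiprand/lib
/opt/rocm/hsa/lib
/opt/rocm/opencl/lib/x86_64
/opt/rocm/rocrand/lib
EOF
  # Copy the tree extracted from the RPMs into the package root.
  cp -r opt "$pkgdir"
}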

It just extracts the RPMs, adds an ld.so.conf.d config so the libraries get discovered, and repackages everything into something Arch understands.
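
As a quick sanity check on that config, here is a minimal sketch that looks for a library in the directories listed in the rocm-bin.conf heredoc above. The helper name `locate_library` is hypothetical, and the two libraries queried at the end are just examples taken from this thread.

```python
from pathlib import Path

# Directories copied from the rocm-bin.conf heredoc in the PKGBUILD.
ROCM_LIB_DIRS = [
    "/opt/rocm/lib",
    "/opt/rocm/lib64",
    "/opt/rocm/hip/lib",
    "/opt/rocm/hiprand/lib",
    "/opt/rocm/hsa/lib",
    "/opt/rocm/opencl/lib/x86_64",
    "/opt/rocm/rocrand/lib",
]


def locate_library(name, dirs=ROCM_LIB_DIRS):
    """Return every file under `dirs` whose name starts with `name`."""
    hits = []
    for d in dirs:
        directory = Path(d)
        if directory.is_dir():  # silently skip directories that do not exist
            hits.extend(sorted(str(p) for p in directory.glob(name + "*")))
    return hits


for lib in ("libhip_hcc.so", "libhsa-runtime64.so"):
    print(lib, "->", locate_library(lib) or "not found")
```

If a library shows up in more than one directory, or only under a path that differs from the working Docker image, that is a hint about where the native install diverges.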

By the way, congrats on the release - the Docker experience on upstream kernel is truly amazing.

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 16

Most upvoted comments

Hi @farnoy, the r1.12.0 PyPI package works well on my ROCm 2.1-based Docker image:

ldd ./usr/local/lib/python3.5/dist-packages/tensorflow/libtensorflow_framework.so
        linux-vdso.so.1 =>  (0x00007fffe6af6000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd276cd2000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd2769c9000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd2767ac000)
        libhc_am.so => /opt/rocm/hcc/bin/../lib/libhc_am.so (0x00007fd27654c000)
        libhip_hcc.so => /opt/rocm/lib/libhip_hcc.so (0x00007fd275c47000)
        libhsa-runtime64.so.1 => /opt/rocm/hsa/lib/libhsa-runtime64.so.1 (0x00007fd275992000)
        libCXLActivityLogger.so => /opt/rocm/lib/libCXLActivityLogger.so (0x00007fd275743000)
        libhiprand.so.1 => /opt/rocm/hiprand/lib/libhiprand.so.1 (0x00007fd275516000)
        librocfft.so.0 => /opt/rocm/lib/librocfft.so.0 (0x00007fd27525b000)
        librocblas.so.0 => /opt/rocm/lib/librocblas.so.0 (0x00007fd2721b0000)
        libMIOpen.so.1 => /opt/rocm/lib/libMIOpen.so.1 (0x00007fd27181d000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fd27149b000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fd271285000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd270ebb000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fd278313000)
        libmcwamp.so => /opt/rocm/hcc/bin/../lib/libmcwamp.so (0x00007fd270ca1000)
        libhsakmt.so.1 => /opt/rocm/lib/libhsakmt.so.1 (0x00007fd270a7d000)
        libelf.so.1 => /usr/lib/x86_64-linux-gnu/libelf.so.1 (0x00007fd270865000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fd27065d000)
        librocrand.so.1 => /opt/rocm/rocrand/lib/librocrand.so.1 (0x00007fd270025000)
        librocfft-device.so.0 => /opt/rocm/bin/../lib/librocfft-device.so.0 (0x00007fd267fb1000)
        libmiopengemm.so => /opt/rocm/lib/libmiopengemm.so (0x00007fd266b13000)
        libamdocl64.so => /opt/rocm/opencl/lib/x86_64/libamdocl64.so (0x00007fd26284f000)
        libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007fd262644000)
        libpci.so.3 => /lib/x86_64-linux-gnu/libpci.so.3 (0x00007fd262437000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fd26221d000)
        libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007fd262002000)
        libudev.so.1 => /lib/x86_64-linux-gnu/libudev.so.1 (0x00007fd278502000)
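
One way to use a listing like this is to diff it against the same `ldd` run on the failing native install. The sketch below is only an illustration: `parse_ldd` and `diff_resolution` are hypothetical helper names, and the `native_output` sample is invented, not taken from the reporter's machine.

```python
def parse_ldd(output):
    """Map each shared-object name in `ldd` output to the path it resolved to."""
    resolved = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue  # e.g. the dynamic loader line, which has no arrow
        name, _, rest = line.partition("=>")
        # Drop the trailing load address such as "(0x00007fd2...)".
        path = rest.split("(")[0].strip()
        resolved[name.strip()] = path or None  # vdso has no path
    return resolved


def diff_resolution(a, b):
    """Return {name: (path_a, path_b)} for libraries that resolve differently."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


# Two lines from the Docker listing above.
docker_output = """\
        libhip_hcc.so => /opt/rocm/lib/libhip_hcc.so (0x00007fd275c47000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fd27149b000)
"""
# Hypothetical native output, for illustration only.
native_output = """\
        libhip_hcc.so => /opt/rocm/hip/lib/libhip_hcc.so (0x00007f0000000000)
        libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f0000100000)
"""
for name, (in_docker, native) in diff_resolution(
    parse_ldd(docker_output), parse_ldd(native_output)
).items():
    print(f"{name}: docker={in_docker} native={native}")
```

Differences in where libstdc++ or the ROCm libraries resolve are a reasonable first thing to look at, though they are not proof of the cause of the crash.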

The following Dockerfile should help guide you in setting up the dependencies: https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/r1.12-rocm/tensorflow/tools/ci_build/Dockerfile.rocm

To build TF from source on Ubuntu, the following Docker image is readily available: rocm/tensorflow:rocm2.1-tf1.12-python3-dev

Well, perhaps a better way to put it is “build-from-source ROCr is broken” and I’ll need to go back and add a temporary patch into the scripts I’m recommending people try to follow. 😃

But yes, it’s probably pulling its version numbers from the latest tag, and our internal repo (that builds the official .rpm and .deb files) isn’t having the same issue. Anyway, libhsa-runtime64.so.2 shouldn’t exist. It should still be .1.