tensorflow-upstream: Multiple build problems with 1.12 on ROCm 2.0
I am trying to package ROCm 2.0 for ArchLinux and, as a test, run TensorFlow 1.12 to confirm that it works. Here’s a list of the problems I encountered:
- The PyPI package `tensorflow-rocm==1.12` runs fine in a Docker container with ROCm, but fails during initialization when run natively:

  ```
  Message: Process 108423 (ipython) of user 1000 dumped core.

  Stack trace of thread 108423:
  #0  0x00007f8269550d7f raise (libc.so.6)
  #1  0x00007f826953b672 abort (libc.so.6)
  #2  0x00007f81c75db58e _ZN9__gnu_cxx27__verbose_terminate_handlerEv (libstdc++.so.6)
  #3  0x00007f81c75e1dfa _ZN10__cxxabiv111__terminateEPFvvE (libstdc++.so.6)
  #4  0x00007f81c75e1e57 _ZSt9terminatev (libstdc++.so.6)
  #5  0x00007f81c75e20ac __cxa_throw (libstdc++.so.6)
  #6  0x00007f81c75e2647 _Znwm (libstdc++.so.6)
  #7  0x00007f81cd66878f _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_assignERKS4_ (libhip_hcc.so)
  #8  0x00007f81cbe8f87d n/a (/home/kuba/.local/share/virtualenvs/tf-playground-Gy5OT1mH/lib/python3.6/site-packages/tensorflow/libtensorflow_framework.so)
  #9  0x0000000000000000 n/a (n/a)
  #10 0x0000000000000004 n/a (n/a)
  ```

  This is a `std::bad_alloc` error, and I’m not sure how to debug it.

- I then tried to build it locally on the `r1.12-rocm` branch. After downgrading to Bazel 0.19 (0.20 had other problems), this is the diff of fixes I had to apply:

  ```diff
  diff --git a/WORKSPACE b/WORKSPACE
  index 17961829a6..340fa1b662 100644
  --- a/WORKSPACE
  +++ b/WORKSPACE
  @@ -1,5 +1,6 @@
   workspace(name = "org_tensorflow")
   
  +load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
   http_archive(
       name = "io_bazel_rules_closure",
       sha256 = "a38539c5b5c358548e75b44141b4ab637bba7c4dc02b46b1f62a96d6433f56ae",
  diff --git a/tensorflow/contrib/lite/experimental/micro/tools/make/targets/bluepill_makefile.inc b/tensorflow/contrib/lite/experimental/micro/tools/make/targets/bluepill_makefile.inc
  index 022a8422dc..fdc5bbe201 100644
  --- a/tensorflow/contrib/lite/experimental/micro/tools/make/targets/bluepill_makefile.inc
  +++ b/tensorflow/contrib/lite/experimental/micro/tools/make/targets/bluepill_makefile.inc
  @@ -28,7 +28,6 @@ ifeq ($(TARGET), bluepill)
       -Wno-sign-compare \
       -fno-delete-null-pointer-checks \
       -fomit-frame-pointer \
  -    -fpermissive \
       -nostdlib \
       -g \
       -Os
  diff --git a/third_party/gpus/rocm_configure.bzl b/third_party/gpus/rocm_configure.bzl
  index 18987b886e..36494c6e41 100644
  --- a/third_party/gpus/rocm_configure.bzl
  +++ b/third_party/gpus/rocm_configure.bzl
  @@ -366,7 +366,7 @@ def _find_libs(repository_ctx, rocm_config):
               "hip_hcc",
               repository_ctx,
               cpu_value,
  -            rocm_config.rocm_toolkit_path,
  +            rocm_config.rocm_toolkit_path + "/hip/lib",
           ),
           "rocblas": _find_rocm_lib(
               "rocblas",
  @@ -731,7 +731,7 @@ def _create_local_rocm_repository(repository_ctx):
           "crosstool:clang/bin/crosstool_wrapper_driver_rocm",
           {
               "%{cpu_compiler}": str(cc),
  -            "%{hipcc_path}": "/opt/rocm/bin/hipcc",
  +            "%{hipcc_path}": "/opt/rocm/hip/bin/hipcc",
               "%{hipcc_env}": _hipcc_env(repository_ctx),
               "%{crosstool_verbose}": _crosstool_verbose(repository_ctx),
               "%{gcc_host_compiler_path}": str(cc),
  ```

- Finally, I hit this problem with include files that I don’t know how to solve, since absolute paths are not allowed in `hdrs = ...`:

  ```
  ERROR: /home/kuba/.cache/bazel/_bazel_kuba/e27c88bdf3f11017a571c03bf1082c66/external/rccl_archive/BUILD.bazel:10:1: undeclared inclusion(s) in rule '@rccl_archive//:rccl':
  this rule is missing dependency declarations for the following files included by 'external/rccl_archive/src/rccl.cpp':
    '/opt/rocm/hip/include/hip/hip_runtime_api.h'
    '/opt/rocm/hip/include/hip/hip_common.h'
    '/opt/rocm/hip/include/hip/hcc_detail/hip_runtime_api.h'
    '/opt/rocm/hip/include/hip/hcc_detail/host_defines.h'
    '/opt/rocm/hip/include/hip/hcc_detail/driver_types.h'
    '/opt/rocm/hip/include/hip/hcc_detail/hip_texture_types.h'
    '/opt/rocm/hip/include/hip/hcc_detail/channel_descriptor.h'
    '/opt/rocm/hip/include/hip/hcc_detail/hip_vector_types.h'
    '/opt/rocm/hip/include/hip/hcc_detail/texture_types.h'
    '/opt/rocm/hip/include/hip/hcc_detail/hip_surface_types.h'
    '/opt/rocm/hip/include/hip/hcc_detail/hip_prof_api.h'
    '/opt/rocm/hip/include/hip/hcc_detail/hip_prof_str.h'
    '/opt/rocm/hip/include/hip/hip_runtime.h'
    '/opt/rocm/hip/include/hip/hcc_detail/hip_runtime.h'
    '/opt/rocm/hip/include/hip/hcc_detail/grid_launch.h'
    '/opt/rocm/hip/include/hip/hcc_detail/grid_launch_GGL.hpp'
    '/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp'
    '/opt/rocm/hip/include/hip/hcc_detail/code_object_bundle.hpp'
    '/opt/rocm/hip/include/hip/hcc_detail/concepts.hpp'
    '/opt/rocm/hip/include/hip/hcc_detail/helpers.hpp'
    '/opt/rocm/hip/include/hip/hcc_detail/program_state.hpp'
    '/opt/rocm/hip/include/hip/hip_hcc.h'
    '/opt/rocm/hip/include/hip/hcc_detail/hip_ldg.h'
    '/opt/rocm/hip/include/hip/hcc_detail/hip_atomic.h'
    '/opt/rocm/hip/include/hip/hcc_detail/device_functions.h'
    '/opt/rocm/hip/include/hip/hcc_detail/math_fwd.h'
    '/opt/rocm/hip/include/hip/hip_vector_types.h'
    '/opt/rocm/hip/include/hip/hcc_detail/device_library_decls.h'
    '/opt/rocm/hip/include/hip/hcc_detail/llvm_intrinsics.h'
    '/opt/rocm/hip/include/hip/hcc_detail/surface_functions.h'
    '/opt/rocm/hip/include/hip/hcc_detail/texture_functions.h'
    '/opt/rocm/hip/include/hip/hcc_detail/math_functions.h'
    '/opt/rocm/hip/include/hip/hcc_detail/hip_fp16_math_fwd.h'
    '/opt/rocm/hip/include/hip/hcc_detail/hip_memory.h'
  ```
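Regarding the `std::bad_alloc` in the first item: the mangled frames can be decoded with `c++filt` from GNU binutils. Frame #6, `_Znwm`, is `operator new(unsigned long)`, and frame #7 is `std::__cxx11::basic_string<...>::_M_assign(...)`, resolved in `libhip_hcc.so` and called from code in `libtensorflow_framework.so`. A failing allocation inside a plain string assignment often means the requested size was garbage (for example a corrupted or layout-mismatched `std::string`) rather than real memory exhaustion; that is a hypothesis to check, not a diagnosis. Since the same wheel works in Docker, comparing which ROCm and libstdc++ libraries the loader picks natively versus in the container seems a reasonable first step. A minimal sketch; `demangle_frames` and `resolved_rocm_libs` are hypothetical helpers:

```shell
# Sketch: decode C++ frames from the coredump trace (needs c++filt, GNU binutils).
demangle_frames() {
  printf '%s\n' "$@" | c++filt
}

# Sketch: list the ROCm/HIP and libstdc++ libraries the dynamic loader resolves
# for a shared object. Run it natively and inside the working container, then diff.
resolved_rocm_libs() {
  ldd "$1" | grep -E 'hip|hsa|rocm|stdc\+\+' || true
}

# Example usage (the TensorFlow path is a placeholder):
#   demangle_frames _Znwm _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_assignERKS4_
#   resolved_rocm_libs /path/to/site-packages/tensorflow/libtensorflow_framework.so
#   coredumpctl info 108423 | c++filt   # demangle the whole stored trace
```

Running the process under gdb with `catch throw` set would stop at the throw itself, which gives a fuller picture than the coredump.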
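Regarding the undeclared-inclusion error in the last item: Bazel accepts an included header only if it is declared by the rule or its dependencies, or if it lies under one of the toolchain's builtin include directories (`cxx_builtin_include_directory` in the CROSSTOOL). Because absolute paths cannot go into `hdrs`, the likely fix is to get the reported directories into the ROCm crosstool's builtin include list, which in this branch is presumably generated by `third_party/gpus/rocm_configure.bzl` (not verified). To see exactly which directories are involved, a small filter over Bazel's output is enough; `undeclared_include_dirs` is a hypothetical helper:

```shell
# Sketch: reduce Bazel's "undeclared inclusion(s)" error to the distinct
# absolute include directories it mentions. Reads Bazel output on stdin.
undeclared_include_dirs() {
  # Keep only quoted absolute paths, strip the quotes, take each file's
  # directory, and de-duplicate. -r: don't run dirname on empty input (GNU xargs).
  grep -o "'/[^']*'" | tr -d "'" | xargs -r -n1 dirname | LC_ALL=C sort -u
}

# Example usage:
#   bazel build <your target> 2>&1 | undeclared_include_dirs
```

For the error above this yields two directories, `/opt/rocm/hip/include/hip` and `/opt/rocm/hip/include/hip/hcc_detail`.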
This is the packaging script for ArchLinux that I have so far:
```bash
# Maintainer: Jakub Okoński <jakub@okonski.org>
pkgname=rocm-bin
pkgver=2.0
pkgrel=1
pkgdesc="ROCm libraries and tools"
arch=(x86_64)
_rpm_repo="http://repo.radeon.com/rocm/yum/rpm/"
url="https://github.com/RadeonOpenCompute/ROCm"
license=('unknown')
rpm_names=(
  cxlactivitylogger-5.6.7219-gf50cd35.x86_64.rpm
  hcc-1.3.18482-Linux.rpm
  hip_base-1.5.18494.rpm
  hip_hcc-1.5.18494.rpm
  hsa-ext-rocr-dev-1.1.9-45-ge88639f6-Linux.rpm
  hsa-rocr-dev-1.1.9-45-ge88639f6-Linux.rpm
  hsakmt-roct-1.0.9-99-g3ba20ce-Linux.rpm
  hsakmt-roct-dev-1.0.9-99-g3ba20ce-Linux.rpm
  MIOpen-HIP-1.7.0-49c4891-Linux.rpm
  miopengemm-1.1.5-9547fb9-Linux.rpm
  rocblas-2.0.0.0-Linux.rpm
  rocfft-0.8.8.0-Linux.rpm
  rocm-dev-2.0.89-Linux.rpm
  rocm-device-libs-0.0.1-Linux.rpm
  rocm-libs-2.0.89-Linux.rpm
  rocm-opencl-1.2.0-2018121346.x86_64.rpm
  rocm-opencl-devel-1.2.0-2018121346.x86_64.rpm
  rocm-profiler-5.6.7219-g35b67c4.x86_64.rpm
  rocm_smi_lib64-1.0.0.rpm
  rocm-utils-2.0.89-Linux.rpm
  rocminfo-1.0.0-Linux.rpm
  rocrand-1.8.1-Linux.rpm
)
# Prefix every RPM file name with the repository URL.
source=("${rpm_names[@]/#/$_rpm_repo}")
sha256sums=(
  5a67eefbb13f0cfac8a5fee90a17f5eccf9a7f60d657ee5dff4a0479686ebdb4
  c1b8320ac03158eb3e1282068c33904f51b70787ba62a0c91f732d9d18547c98
  d294361155f9f29d1021221b27570f22215a8d78f4757f5c21cce6edca83da5a
  70b7c146b8d142bc21c4ec51464049d1d60ab5977e195ac7d4bf26d4c832621a
  f12fba94fc95d9c495bfec5c9c3cf9d1c3150c40d4d6ca4b444286ac0ee52876
  c60c4feb1c99b62733a8ef9a471ee715229630f0aab127635f57cced5a0a11e5
  33bd90ca0e5e254019bdb27a0fdf376e696eaa322f93fe77ed6ba893d89eaaed
  7848e972ca151c473f01252cb83a32eaa71512ccb5b68ec6aeda7a82c84a06b3
  45508fb4980236ac92551fd48245276ac5aeef0112d64ddc116b6639393b1f0d
  bf504dca842cfb1968e1476018c97f0b05e2121407eb3a9d8205a93e672116e0
  d76238f3c984d862646ce2b47bc19f91ac476dd9a59b41e968fb6b2a405df1e9
  c5ddf247d620f624c5c41848714d6ee6fb039765ae2a7976877729c317824b48
  f3b7e1232f9b87ea965f95d2398f3c01bea0e4d42b9db1d570b4124085bdbf5c
  63c8b01aa803febed5d25cbcfe9a3667cd631946388fa9675d2eab66b34d43c1
  b4c9707790c95c9920da261b135cc23c1779b43e29249ca99cae0722875e4758
  17a99a3c772ba9c586b477b153ec4ebbda43eec1ba06200adf0deb9a557649ba
  4fc5ab9a08c4c4970a5c58374d65bbfd9a8d04d9ee818df672a8363025d68197
  4ab76027996cde10a71ba11f47f75ae0d94322073aeee93fc879b0411aba9b2f
  7ff217fdf93d4fb18a19fd0f0ff39a68db07e66833f15ebb9bbc25e31b131513
  708ae27fefd6663bb2f0cf22f3f046c6ccac1d6a3ef54c980d9ce7e5df5e56b0
  537010c63014966be3db3b382b6fd9c8970e19b3dc25161ff7de70836ab404f3
  920a5f5a05856ac627ecdf633c4167d8b886a366126b26ff87d86adbbd353cdc
)

package() {
  # Register the ROCm library directories with the dynamic linker.
  mkdir -p "$pkgdir/etc/ld.so.conf.d"
  cat <<-EOF > "$pkgdir/etc/ld.so.conf.d/rocm-bin.conf"
/opt/rocm/lib
/opt/rocm/lib64
/opt/rocm/hip/lib
/opt/rocm/hiprand/lib
/opt/rocm/hsa/lib
/opt/rocm/opencl/lib/x86_64
/opt/rocm/rocrand/lib
EOF
  # Copy the tree extracted from the RPMs into the package.
  cp -r opt "$pkgdir"
}
```
It just extracts the RPMs, adds an `ld.so.conf.d` config so the dynamic linker can find the libraries, and repackages everything into something Arch understands.
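makepkg unpacks the `.rpm` sources with `bsdtar`, so the RPMs' own install scriptlets never run. If some of those scriptlets are what normally expose `hipcc` and `libhip_hcc.so` under `/opt/rocm/bin` and `/opt/rocm/lib` (our assumption, not verified against the ROCm 2.0 RPMs), recreating the links in `package()` might make the two `rocm_configure.bzl` hunks in the diff above unnecessary. A minimal sketch; `link_if_missing` and `link_hip_compat` are hypothetical helpers, and treating `/opt/rocm/lib` as where the unpatched build looks for `libhip_hcc.so` is also an assumption:

```shell
# Hypothetical helper: create symlink $2 -> $1 only if nothing exists at $2.
link_if_missing() {
  [ -e "$2" ] || [ -L "$2" ] || ln -s "$1" "$2"
}

# Hypothetical helper: expose the HIP compiler and runtime at the paths the
# unpatched rocm_configure.bzl expects. $1 is the package root, e.g. "$pkgdir".
# Relative targets keep the links valid both inside $pkgdir and after install.
link_hip_compat() {
  local rocm="$1/opt/rocm"
  mkdir -p "$rocm/bin" "$rocm/lib"
  link_if_missing ../hip/bin/hipcc "$rocm/bin/hipcc"
  link_if_missing ../hip/lib/libhip_hcc.so "$rocm/lib/libhip_hcc.so"
}
```

It would be called as `link_hip_compat "$pkgdir"` right after `cp -r opt "$pkgdir"`. Existing files are left alone, so the helper does nothing if another RPM already ships these paths.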
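Since library discovery depends on every relevant directory being listed in `rocm-bin.conf`, a quick check against the staged tree can catch directories the list misses. A sketch assuming GNU `find` (for `-printf`); `uncovered_lib_dirs` is a hypothetical helper, and a directory it prints is only a candidate, because not every `.so` needs to be on the loader path:

```shell
# Hypothetical check: print directories in the staged package that contain
# shared libraries but are not listed in the ld.so.conf.d snippet.
# $1: package root (e.g. "$pkgdir"), $2: path to rocm-bin.conf
uncovered_lib_dirs() {
  (
    cd "$1" || exit 1
    # %h is the file's directory relative to the starting point "opt" (GNU find),
    # so prefixing "/" gives the installed absolute path.
    find opt \( -type f -o -type l \) -name '*.so*' -printf '/%h\n'
  ) | LC_ALL=C sort -u | grep -vxF -f "$2" || true
}
```

Calling `uncovered_lib_dirs "$pkgdir" "$pkgdir/etc/ld.so.conf.d/rocm-bin.conf"` at the end of `package()` prints any such directories without failing the build.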
By the way, congrats on the release: the Docker experience on the upstream kernel is truly amazing.
About this issue
- State: closed
- Created 6 years ago
- Comments: 16
Hi @farnoy, the r1.12.0 PyPI package works well on my ROCm 2.1-based Docker image.
The following Dockerfile should help you set up the dependencies: https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/r1.12-rocm/tensorflow/tools/ci_build/Dockerfile.rocm
To build TF from source on Ubuntu, the following Docker image is readily available: `rocm/tensorflow:rocm2.1-tf1.12-python3-dev`
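For reference, a sketch of starting that image with the GPU passed through. The `--device=/dev/kfd --device=/dev/dri --group-add video` flags are the ones usually recommended for ROCm containers; `run_tf_rocm_dev` and its `DRY_RUN` switch are hypothetical conveniences, not part of any ROCm tooling:

```shell
# Hypothetical wrapper around `docker run` for the TF-ROCm dev image.
# /dev/kfd and /dev/dri are the ROCm compute and render device nodes;
# --group-add video typically grants the container permission to open them.
run_tf_rocm_dev() {
  local image="${1:-rocm/tensorflow:rocm2.1-tf1.12-python3-dev}"
  set -- docker run -it --rm --device=/dev/kfd --device=/dev/dri --group-add video "$image"
  if [ -n "${DRY_RUN:-}" ]; then
    # Print the command instead of running it.
    echo "$*"
  else
    "$@"
  fi
}
```

`DRY_RUN=1 run_tf_rocm_dev` prints the full command, which is handy for adding volume mounts before running it for real.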
Well, perhaps a better way to put it is “build-from-source ROCr is broken”, and I’ll need to go back and add a temporary patch to the scripts I’m recommending people follow. 😃
But yes, it’s probably pulling its version number from the latest tag; our internal repo (which builds the official .rpm and .deb files) isn’t having the same issue. Anyway, `libhsa-runtime64.so.2` shouldn’t exist; it should still be `.so.1`.
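To check which ROCr soname a given build produced, and which one TensorFlow links against, `readelf -d` (GNU binutils) prints both a library's `SONAME` and each object's `NEEDED` entries. A small sketch; the two filters are hypothetical helpers, and the example paths assume ROCr lives in `/opt/rocm/hsa/lib`, as in the PKGBUILD's `ld.so.conf.d` list:

```shell
# Hypothetical filter: print the SONAME from `readelf -d` output on stdin.
parse_soname() {
  sed -n 's/.*Library soname: \[\(.*\)\].*/\1/p'
}

# Hypothetical filter: print every NEEDED (dependency) entry from `readelf -d` output.
parse_needed() {
  sed -n 's/.*Shared library: \[\(.*\)\].*/\1/p'
}

# Example usage (paths are assumptions; adjust to your install):
#   readelf -d /opt/rocm/hsa/lib/libhsa-runtime64.so | parse_soname
#   readelf -d /path/to/tensorflow/libtensorflow_framework.so | parse_needed | grep hsa
```

According to the comment above, a correct build should report `libhsa-runtime64.so.1`. If a from-source ROCr reports `.so.2`, binaries that expect `.so.1` will fail to load it.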