tensorflow: Failed to build on Cuda-11.1
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): 18.04
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: related to servers
- TensorFlow installed from (source or binary): source
- TensorFlow version: commit: a0b68d1ecc46f9bbc8fad4f18c68f25c6bd5ae48
- Python version: 3.6
- Installed using virtualenv? pip? conda?: none
- Bazel version (if compiling from source): 3.1.0
- GCC/Compiler version (if compiling from source): 7.5.0-3ubuntu1
- CUDA/cuDNN version: CUDA-11.1, CuDNN 8.0.3 for CUDA-11.0
- GPU model and memory: RTX 3090, 24GB, RTX Titan, 24GB, RTX 2080Ti, 11GB (total three GPUs)
Describe the problem
Hello, TensorFlow failed to compile. Please check the log messages and give me some advice. Thanks!
Provide the exact sequence of commands / steps that you executed before running into the problem (including log messages)
bazel build --config=mkl //tensorflow/tools/pip_package:build_pip_package
Any other info / logs
tensorflow$ deactivate
sephiroce@bike:~/open_source/tensorflow$ bazel build --config=mkl //tensorflow/tools/pip_package:build_pip_package
WARNING: Ignoring JAVA_HOME, because it must point to a JDK, not a JRE.
INFO: Options provided by the client:
Inherited 'common' options: --isatty=1 --terminal_columns=228
INFO: Reading rc options for 'build' from /home/sephiroce/open_source/tensorflow/.bazelrc:
Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /home/sephiroce/open_source/tensorflow/.bazelrc:
'build' options: --apple_platform_type=macos --define framework_shared_object=true --define open_source_build=true --java_toolchain=//third_party/toolchains/java:tf_java_toolchain --host_java_toolchain=//third_party/toolchains/java:tf_java_toolchain --define=tensorflow_enable_mlir_generated_gpu_kernels=0 --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --noincompatible_prohibit_aapt1 --enable_platform_specific_config --config=short_logs --config=v2
INFO: Reading rc options for 'build' from /home/sephiroce/open_source/tensorflow/.tf_configure.bazelrc:
'build' options: --action_env PYTHON_BIN_PATH=/home/sephiroce/virtualenv/py3-tf2-gpu/bin/python3 --action_env PYTHON_LIB_PATH=/home/sephiroce/virtualenv/py3-tf2-gpu/lib/python3.6/site-packages --python_path=/home/sephiroce/virtualenv/py3-tf2-gpu/bin/python3 --config=xla --action_env CUDA_TOOLKIT_PATH=/usr/local/cuda-11.1 --action_env TF_CUDA_COMPUTE_CAPABILITIES=8.6 --action_env LD_LIBRARY_PATH=/home/sephiroce/open_source/rdkit/build/lib:/usr/local/cuda/lib64:/usr/local/lib:/usr/local/lib/openmpi:/home/sephiroce/local/lib:/usr/lib/x86_64-linux-gnu:/usr/lib: --action_env GCC_HOST_COMPILER_PATH=/usr/bin/x86_64-linux-gnu-gcc-7 --config=cuda --action_env TF_CONFIGURE_IOS=0
INFO: Found applicable config definition build:short_logs in file /home/sephiroce/open_source/tensorflow/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /home/sephiroce/open_source/tensorflow/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:xla in file /home/sephiroce/open_source/tensorflow/.bazelrc: --define=with_xla_support=true
INFO: Found applicable config definition build:cuda in file /home/sephiroce/open_source/tensorflow/.bazelrc: --config=using_cuda --define=using_cuda_nvcc=true
INFO: Found applicable config definition build:using_cuda in file /home/sephiroce/open_source/tensorflow/.bazelrc: --define=using_cuda=true --action_env TF_NEED_CUDA=1 --crosstool_top=@local_config_cuda//crosstool:toolchain --define=tensorflow_enable_mlir_generated_gpu_kernels=1
INFO: Found applicable config definition build:mkl in file /home/sephiroce/open_source/tensorflow/.bazelrc: --define=build_with_mkl=true --define=enable_mkl=true --define=tensorflow_mkldnn_contraction_kernel=0 -c opt
INFO: Found applicable config definition build:linux in file /home/sephiroce/open_source/tensorflow/.bazelrc: --copt=-w --host_copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --config=dynamic_kernels
INFO: Found applicable config definition build:dynamic_kernels in file /home/sephiroce/open_source/tensorflow/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
INFO: Repository local_config_cuda instantiated at:
no stack (--record_rule_instantiation_callstack not enabled)
Repository rule cuda_configure defined at:
/home/sephiroce/open_source/tensorflow/third_party/gpus/cuda_configure.bzl:1407:18: in <toplevel>
ERROR: An error occurred during the fetch of repository 'local_config_cuda':
Traceback (most recent call last):
File "/home/sephiroce/open_source/tensorflow/third_party/gpus/cuda_configure.bzl", line 1377
_create_local_cuda_repository(<1 more arguments>)
File "/home/sephiroce/open_source/tensorflow/third_party/gpus/cuda_configure.bzl", line 1054, in _create_local_cuda_repository
_find_libs(repository_ctx, <2 more arguments>)
File "/home/sephiroce/open_source/tensorflow/third_party/gpus/cuda_configure.bzl", line 599, in _find_libs
_check_cuda_libs(repository_ctx, <2 more arguments>)
File "/home/sephiroce/open_source/tensorflow/third_party/gpus/cuda_configure.bzl", line 501, in _check_cuda_libs
execute(repository_ctx, <1 more arguments>)
File "/home/sephiroce/open_source/tensorflow/third_party/remote_config/common.bzl", line 208, in execute
fail(<1 more arguments>)
Repository command failed
No library found under: /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudart.so.11.1
ERROR: Skipping '//tensorflow/tools/pip_package:build_pip_package': no such package '@local_config_cuda//cuda': Traceback (most recent call last):
File "/home/sephiroce/open_source/tensorflow/third_party/gpus/cuda_configure.bzl", line 1377
_create_local_cuda_repository(<1 more arguments>)
File "/home/sephiroce/open_source/tensorflow/third_party/gpus/cuda_configure.bzl", line 1054, in _create_local_cuda_repository
_find_libs(repository_ctx, <2 more arguments>)
File "/home/sephiroce/open_source/tensorflow/third_party/gpus/cuda_configure.bzl", line 599, in _find_libs
_check_cuda_libs(repository_ctx, <2 more arguments>)
File "/home/sephiroce/open_source/tensorflow/third_party/gpus/cuda_configure.bzl", line 501, in _check_cuda_libs
execute(repository_ctx, <1 more arguments>)
File "/home/sephiroce/open_source/tensorflow/third_party/remote_config/common.bzl", line 208, in execute
fail(<1 more arguments>)
Repository command failed
No library found under: /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudart.so.11.1
WARNING: Target pattern parsing failed.
ERROR: no such package '@local_config_cuda//cuda': Traceback (most recent call last):
File "/home/sephiroce/open_source/tensorflow/third_party/gpus/cuda_configure.bzl", line 1377
_create_local_cuda_repository(<1 more arguments>)
File "/home/sephiroce/open_source/tensorflow/third_party/gpus/cuda_configure.bzl", line 1054, in _create_local_cuda_repository
_find_libs(repository_ctx, <2 more arguments>)
File "/home/sephiroce/open_source/tensorflow/third_party/gpus/cuda_configure.bzl", line 599, in _find_libs
_check_cuda_libs(repository_ctx, <2 more arguments>)
File "/home/sephiroce/open_source/tensorflow/third_party/gpus/cuda_configure.bzl", line 501, in _check_cuda_libs
execute(repository_ctx, <1 more arguments>)
File "/home/sephiroce/open_source/tensorflow/third_party/remote_config/common.bzl", line 208, in execute
fail(<1 more arguments>)
Repository command failed
No library found under: /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudart.so.11.1
INFO: Elapsed time: 1.183s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
currently loading: tensorflow/tools/pip_package
About this issue
- State: closed
- Created 4 years ago
- Comments: 28 (1 by maintainers)
My setup: RTX 3080, CUDA 11.1, Ubuntu 20.04, cuDNN 8.0.3 (the most recent one), Python 3.8
I installed CUDA 11.1 from the .deb file, so that all the symlinks would be created automatically. I think the problem is that CUDA 11.1 installs a libcudart.so.11.0 symlink but does not create a libcudart.so.11.1 symlink, while TensorFlow looks for libcudart.so.11.1 because the CUDA version is 11.1.
First, I tried creating the symlink libcudart.so.11.1 -> libcudart.so.11.1.74, as suggested here: https://github.com/tensorflow/tensorflow/issues/26150#issuecomment-469058265 The error then changed to say that the SONAME did not match.
Next, I tried this method: https://github.com/tensorflow/tensorflow/issues/26289#issuecomment-477848947 I opened third_party/gpus/cuda_configure.bzl and searched for 'cudart'. In the check_cuda_libs_params dictionary, I changed cuda_config.cuda_version to "11.0" in the "cudart" and "cudart_static" entries. Changing only those two entries worked for me, and TensorFlow is now building.
My CPU is slow (I never expected to build TensorFlow myself), so it will take a long time to find out whether the build succeeds, but so far it is running without any errors. I will update if anything happens.
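To check which libcudart files and symlinks an installation actually provides, a small script along these lines may help. It is only a sketch: the function name list_cudart_entries is made up for illustration, and the directory in the example call is copied from the "No library found under" message in the log, so adjust it for your system.

```python
import os
from pathlib import Path


def list_cudart_entries(lib_dir):
    """Map each libcudart.so* name in lib_dir to its symlink target.

    Regular files (not symlinks) map to None. A missing directory
    yields an empty dict instead of raising.
    """
    directory = Path(lib_dir)
    if not directory.is_dir():
        return {}
    entries = {}
    for path in sorted(directory.glob("libcudart.so*")):
        entries[path.name] = os.readlink(path) if path.is_symlink() else None
    return entries


if __name__ == "__main__":
    # Assumed location, taken from the error message in the build log.
    cuda_lib_dir = "/usr/local/cuda-11.1/targets/x86_64-linux/lib"
    for name, target in list_cudart_entries(cuda_lib_dir).items():
        print(name, "->", target if target else "(regular file)")
```

If the output lists libcudart.so.11.0 and the fully versioned file but no libcudart.so.11.1, that matches the situation described in this comment.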
How do I change the value of cuda_config.cuda_version to 11.0?
    "cudart": _check_cuda_lib_params(
        "cudart",
        cpu_value,
        cuda_config.config["cuda_library_dir"],
        cuda_config.cuda_version,  # <-- here
        static = False,
    ),
    "cudart_static": _check_cuda_lib_params(
        "cudart_static",
        cpu_value,
        cuda_config.config["cuda_library_dir"],
        cuda_config.cuda_version,  # <-- and here
        static = True,
    ),
Please help, I am not very good at Python.
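Going by the previous comment, the change is to replace the expression cuda_config.cuda_version with the quoted string "11.0" in the "cudart" and "cudart_static" entries only, leaving every other library entry as it is. Below is a minimal Python sketch of that edit. It is a workaround reported in this thread, not an official fix. The helper name pin_cudart_version, the SAMPLE text, and the "some_other_lib" entry are made up for illustration, and the regular expression assumes the file is laid out like the snippet quoted in the question.

```python
import re

# Hypothetical sample that mirrors the layout quoted in this thread.
# "some_other_lib" is a made-up entry used to show that other
# libraries are left untouched.
SAMPLE = '''
"cudart": _check_cuda_lib_params(
    "cudart",
    cpu_value,
    cuda_config.config["cuda_library_dir"],
    cuda_config.cuda_version,
    static = False,
),
"cudart_static": _check_cuda_lib_params(
    "cudart_static",
    cpu_value,
    cuda_config.config["cuda_library_dir"],
    cuda_config.cuda_version,
    static = True,
),
"some_other_lib": _check_cuda_lib_params(
    "some_other_lib",
    cpu_value,
    cuda_config.config["cuda_library_dir"],
    cuda_config.cuda_version,
),
'''

# Match a _check_cuda_lib_params( call whose first argument is "cudart"
# or "cudart_static", up to the first cuda_config.cuda_version after it.
_CUDART_VERSION = re.compile(
    r'(_check_cuda_lib_params\(\s*"(?:cudart|cudart_static)",.*?)'
    r'cuda_config\.cuda_version',
    re.DOTALL,
)


def pin_cudart_version(bzl_text, version="11.0"):
    """Return bzl_text with the cudart entries pinned to a literal version."""
    return _CUDART_VERSION.sub(
        lambda match: match.group(1) + '"' + version + '"', bzl_text
    )


if __name__ == "__main__":
    print(pin_cudart_version(SAMPLE))
```

To use it on a real checkout, read third_party/gpus/cuda_configure.bzl into a string, pass it through the function, and write the result back after making a backup. Making the same two replacements by hand in a text editor is equivalent.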