tensorflow: TF 2.0 XLA JIT reports error: "./bin/ptxas not found"

System information

  • OS Platform and Distribution: Ubuntu 16.04.6 LTS
  • TensorFlow installed from (source or binary): pip3 install tensorflow-gpu
  • TensorFlow version (use command below): 2.0.0
  • Python version: 3.5.2
  • CUDA/cuDNN version: 10.0
  • GPU model and memory: TITAN Xp

Describe the current behavior

The test code fails with the following error:

2019-12-26 22:02:59.166382: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-12-26 22:02:59.166422: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-12-26 22:02:59.166453: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2019-12-26 22:02:59.166482: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2019-12-26 22:02:59.166512: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2019-12-26 22:02:59.166541: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2019-12-26 22:02:59.166573: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-12-26 22:02:59.171144: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-12-26 22:02:59.171311: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-12-26 22:02:59.174312: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-26 22:02:59.174418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2019-12-26 22:02:59.174508: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2019-12-26 22:02:59.179990: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11439 MB memory) -> physical GPU (device: 0, name: TITAN Xp, pci bus id: 0000:06:00.0, compute capability: 6.1)
sleep
2019-12-26 22:02:59.923393: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-12-26 22:03:00.348503: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Not found: ./bin/ptxas not found
Relying on driver to perform ptx compilation. This message will be only logged once.
2019-12-26 22:03:00.355159: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at xla_compile_on_demand_op.cc:218 : Not found: ./bin/ptxas not found
Traceback (most recent call last):
  File "tf.py", line 8, in <module>
    c = tf.linalg.matmul(a, b)
  File "/home/thincal/.local/lib/python3.5/site-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/home/thincal/.local/lib/python3.5/site-packages/tensorflow_core/python/ops/math_ops.py", line 2765, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/home/thincal/.local/lib/python3.5/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 6126, in mat_mul
    _six.raise_from(_core._status_to_exception(e.code, message), None)
  File "<string>", line 2, in raise_from
tensorflow.python.framework.errors_impl.NotFoundError: ./bin/ptxas not found [Op:MatMul] name: MatMul/

Describe the expected behavior

The test code runs successfully.

Code to reproduce the issue

import tensorflow as tf
try:
  with tf.device('device:XLA_GPU:0'):
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.linalg.matmul(a, b)
    print(c)
except RuntimeError as e:
  print(e)

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 22 (10 by maintainers)

Most upvoted comments

2019-12-30 12:52:15.338250: W tensorflow/core/framework/op_kernel.cc:1655] OP_REQUIRES failed at xla_compile_on_demand_op.cc:220 : Not found: ./bin/ptxas not found

Instead of looking for ./bin/ptxas, shouldn’t it check to see if ptxas is available first? In my case:

$ which ptxas
/home/username/.local/cuda-10.1/bin/ptxas
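To see the mismatch described in the comment above, a minimal diagnostic sketch can compare the relative ./bin/ptxas location named in the error with the ptxas found on $PATH. It assumes the behaviour reported in this thread (TF 2.0.0 resolving ptxas as ./bin/ptxas relative to the current working directory); `report_ptxas` is a hypothetical helper, not a TensorFlow API, and it does not reproduce TensorFlow's actual lookup logic.

```python
import os
import shutil


def report_ptxas(cwd=None):
    """Report where ptxas is visible from a given working directory.

    Hypothetical diagnostic helper (not part of TensorFlow). It checks the
    relative ./bin/ptxas location from the error message and the first
    ptxas on $PATH, so the two can be compared side by side.
    """
    cwd = cwd or os.getcwd()
    relative = os.path.join(cwd, "bin", "ptxas")
    return {
        "cwd": cwd,
        # True only if <cwd>/bin/ptxas exists and is executable.
        "relative_bin_ptxas": os.path.isfile(relative) and os.access(relative, os.X_OK),
        # Absolute path of the first ptxas on $PATH, or None if there is none.
        "path_ptxas": shutil.which("ptxas"),
    }


if __name__ == "__main__":
    for key, value in report_ptxas().items():
        print("{}: {}".format(key, value))
```

If `path_ptxas` is set but `relative_bin_ptxas` is False, that matches the situation in the comment: ptxas is installed, yet the relative path TF 2.0.0 reported does not exist.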

Can you show how to do that?

The real ptxas path may differ on your system (for reference, I have a Debian 10 installation with the NVIDIA packages from the buster-backports repository). As far as I understand, the TensorFlow library looks for ptxas in the ./bin directory. Note that this path is relative, i.e. the current working directory from which you start your Python script (and in which you run ln -s ...) matters. The commands are:

mkdir ./bin
ln -s /usr/bin/ptxas ./bin/ptxas
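The mkdir/ln -s workaround above can also be done from Python, which is handy if the launch directory varies. This is a minimal sketch of the same two commands, assuming (as the comment does) that TF 2.0.0 resolves ./bin/ptxas against the current working directory; `link_ptxas_into_cwd` is a hypothetical helper, and defaulting to the ptxas on $PATH is an assumption (pass an explicit path such as /usr/bin/ptxas otherwise).

```python
import os
import shutil


def link_ptxas_into_cwd(ptxas=None, cwd=None):
    """Python equivalent of `mkdir ./bin && ln -s <ptxas> ./bin/ptxas`.

    Hypothetical helper. `ptxas` defaults to the first ptxas on $PATH
    (an assumption); `cwd` defaults to the current working directory.
    """
    ptxas = ptxas or shutil.which("ptxas")
    if ptxas is None:
        raise FileNotFoundError("ptxas not found on $PATH; pass its path explicitly")
    cwd = cwd or os.getcwd()
    bin_dir = os.path.join(cwd, "bin")
    os.makedirs(bin_dir, exist_ok=True)  # mkdir ./bin (no error if it already exists)
    link = os.path.join(bin_dir, "ptxas")
    if os.path.lexists(link):
        # Leave an existing file or link in place rather than overwrite it.
        return link
    os.symlink(os.path.abspath(ptxas), link)  # ln -s <ptxas> ./bin/ptxas
    return link
```

Because the lookup is relative, the script still has to be started from the directory where the link was created.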

It shouldn’t matter in which order we look for ptxas; if it’s in your $PATH, it will be found.

@cheshire, actually it does matter: I have ptxas in my $PATH, but the error went away only after I created a symlink at ./bin/ptxas.

Do you want to try the tf-nightly-gpu package? I’m not sure whether the fix made it into 2.0.0.

@cheshire

Summary:

  • It works with tf-nightly-gpu, but only after working around missing libcuxxx library versions.

FYI:

First try: it reports some missing libraries, so the GPU can’t be used:

tf-nightly-gpu: 2.1.0.dev20191227
cudnn: v7.5.0
cuda: v10.0
$ python3 tf.py
WARNING: Logging before flag parsing goes to stderr.
W1228 14:30:27.537421 140555689355008 tpu_cluster_resolver.py:35] Falling back to tensorflow client, its recommended to install the cloud tpu client directly with pip install cloud-tpu-client .
sleep
sleep
main thread end...
sleep
sleep
sleep
2019-12-28 14:30:34.308062: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-12-28 14:30:34.318080: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 0 with properties:
pciBusID: 0000:83:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.582GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2019-12-28 14:30:34.320787: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /data/cuda/cuda-10.0/cuda/lib64:/data/cuda/cuda-10.0/cudnn/v7.5.0/lib64:/usr/local/nvidia/lib64
2019-12-28 14:30:34.321892: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /data/cuda/cuda-10.0/cuda/lib64:/data/cuda/cuda-10.0/cudnn/v7.5.0/lib64:/usr/local/nvidia/lib64
2019-12-28 14:30:34.323004: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /data/cuda/cuda-10.0/cuda/lib64:/data/cuda/cuda-10.0/cudnn/v7.5.0/lib64:/usr/local/nvidia/lib64
2019-12-28 14:30:34.324252: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /data/cuda/cuda-10.0/cuda/lib64:/data/cuda/cuda-10.0/cudnn/v7.5.0/lib64:/usr/local/nvidia/lib64
2019-12-28 14:30:34.325450: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /data/cuda/cuda-10.0/cuda/lib64:/data/cuda/cuda-10.0/cudnn/v7.5.0/lib64:/usr/local/nvidia/lib64
2019-12-28 14:30:34.326818: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10'; dlerror: libcusparse.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /data/cuda/cuda-10.0/cuda/lib64:/data/cuda/cuda-10.0/cudnn/v7.5.0/lib64:/usr/local/nvidia/lib64
2019-12-28 14:30:34.379859: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-12-28 14:30:34.379948: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1595] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2019-12-28 14:30:34.380555: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-12-28 14:30:34.413199: I tensorflow/core/platform/profile_utils/cpu_utils.cc:101] CPU Frequency: 2100020000 Hz
2019-12-28 14:30:34.413769: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x65f6b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2019-12-28 14:30:34.413829: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2019-12-28 14:30:34.417437: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1099] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-28 14:30:34.417491: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105]
/job:localhost/replica:0/task:0/device:XLA_GPU:0 unknown device.
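Before starting TensorFlow, the sonames it asked for in the warnings above can be checked with the dynamic loader directly. This is a minimal sketch using `ctypes.CDLL`, which calls dlopen and so searches roughly the same places (LD_LIBRARY_PATH as set at process start, the ld.so cache, default directories); the `EXPECTED` list is copied from this log and `check_sonames` is a hypothetical helper name.

```python
import ctypes

# Sonames copied from the "Could not load dynamic library" warnings above.
EXPECTED = [
    "libcudart.so.10.1",
    "libcublas.so.10",
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.10",
    "libcusparse.so.10",
    "libcudnn.so.7",
]


def check_sonames(names):
    """Return {soname: None if it loads, else the loader's error message}."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # dlopen() by soname, like TensorFlow's loader
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    for name, err in check_sonames(EXPECTED).items():
        print("{:20} {}".format(name, "ok" if err is None else "MISSING: " + err))
```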

Second try: I upgraded cuDNN to v7.6.1, but it reports the same error as above. The shared libraries do exist, but only with version 10.0:

/data/cuda/cuda-10.0/cuda/lib64$ ls -la libcudart.so* libcublas.so* libcufft.so* libcurand.so* libcusolver.so* libcusparse.so*
lrwxrwxrwx 1 thincal thincal        17 Oct 17  2018 libcublas.so -> libcublas.so.10.0
lrwxrwxrwx 1 thincal thincal        21 Oct 17  2018 libcublas.so.10.0 -> libcublas.so.10.0.130
-r-xr-xr-x 1 thincal thincal  70796360 Oct 17  2018 libcublas.so.10.0.130
lrwxrwxrwx 1 thincal thincal        17 Oct 17  2018 libcudart.so -> libcudart.so.10.0
lrwxrwxrwx 1 thincal thincal        21 Oct 17  2018 libcudart.so.10.0 -> libcudart.so.10.0.130
-r-xr-xr-x 1 thincal thincal    495736 Oct 17  2018 libcudart.so.10.0.130
lrwxrwxrwx 1 thincal thincal        16 Oct 17  2018 libcufft.so -> libcufft.so.10.0
lrwxrwxrwx 1 thincal thincal        20 Oct 17  2018 libcufft.so.10.0 -> libcufft.so.10.0.145
-r-xr-xr-x 1 thincal thincal 103177128 Oct 17  2018 libcufft.so.10.0.145
lrwxrwxrwx 1 thincal thincal        17 Oct 17  2018 libcurand.so -> libcurand.so.10.0
lrwxrwxrwx 1 thincal thincal        21 Oct 17  2018 libcurand.so.10.0 -> libcurand.so.10.0.130
-r-xr-xr-x 1 thincal thincal  60806128 Oct 17  2018 libcurand.so.10.0.130
lrwxrwxrwx 1 thincal thincal        19 Oct 17  2018 libcusolver.so -> libcusolver.so.10.0
lrwxrwxrwx 1 thincal thincal        23 Oct 17  2018 libcusolver.so.10.0 -> libcusolver.so.10.0.130
-r-xr-xr-x 1 thincal thincal 139257368 Oct 17  2018 libcusolver.so.10.0.130
lrwxrwxrwx 1 thincal thincal        19 Oct 17  2018 libcusparse.so -> libcusparse.so.10.0
lrwxrwxrwx 1 thincal thincal        23 Oct 17  2018 libcusparse.so.10.0 -> libcusparse.so.10.0.130
-r-xr-xr-x 1 thincal thincal  59078736 Oct 17  2018 libcusparse.so.10.0.130
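The symbolic links used in the third try below can be scripted. This sketch assumes the links map the sonames requested by the nightly build to the CUDA 10.0 files listed above, and it writes them into a separate directory so the CUDA install is left untouched; `make_aliases`, `ALIASES` and the alias directory are hypothetical. A 10.0 library is not guaranteed to be ABI-compatible with a 10.1 soname, so installing the CUDA release the build expects (10.1, judging by the sonames in the log) avoids the mismatch altogether.

```python
import os

# Soname requested by tf-nightly-gpu (key) -> CUDA 10.0 file listed above (value).
# ASSUMPTION: this mirrors the "symbolic link" fix in the thread; aliasing a
# 10.0 library under a 10.1 name is a workaround, not a supported setup.
ALIASES = {
    "libcudart.so.10.1": "libcudart.so.10.0",
    "libcublas.so.10": "libcublas.so.10.0",
    "libcufft.so.10": "libcufft.so.10.0",
    "libcurand.so.10": "libcurand.so.10.0",
    "libcusolver.so.10": "libcusolver.so.10.0",
    "libcusparse.so.10": "libcusparse.so.10.0",
}


def make_aliases(src_dir, alias_dir, aliases=ALIASES):
    """Create alias symlinks in alias_dir that point at files in src_dir.

    Raises FileNotFoundError if a target is missing; skips aliases that
    already exist. Returns the list of links it created.
    """
    os.makedirs(alias_dir, exist_ok=True)
    created = []
    for alias, target in aliases.items():
        target_path = os.path.join(src_dir, target)
        if not os.path.exists(target_path):
            raise FileNotFoundError(target_path)
        link = os.path.join(alias_dir, alias)
        if not os.path.lexists(link):
            os.symlink(target_path, link)
            created.append(link)
    return created
```

For example, `make_aliases("/data/cuda/cuda-10.0/cuda/lib64", os.path.expanduser("~/cuda-aliases"))`, then prepend that alias directory to LD_LIBRARY_PATH in the shell before launching Python; the loader reads LD_LIBRARY_PATH at process start, so changing `os.environ` inside an already running interpreter is not enough.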

Third try: after creating symbolic links for the missing libraries above, it runs fine now:

WARNING: Logging before flag parsing goes to stderr.
W1228 14:56:19.771065 140537504274176 tpu_cluster_resolver.py:35] Falling back to tensorflow client, its recommended to install the cloud tpu client directly with pip install cloud-tpu-client .
sleep
main thread end...
sleep
sleep
sleep
2019-12-28 14:56:25.386761: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-12-28 14:56:25.395346: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 0 with properties:
pciBusID: 0000:83:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.582GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2019-12-28 14:56:25.400400: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2019-12-28 14:56:25.426708: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2019-12-28 14:56:25.467068: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2019-12-28 14:56:25.533422: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2019-12-28 14:56:25.574739: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2019-12-28 14:56:25.771502: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2019-12-28 14:56:25.836608: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-12-28 14:56:25.839959: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1700] Adding visible gpu devices: 0
2019-12-28 14:56:25.840493: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-12-28 14:56:25.869541: I tensorflow/core/platform/profile_utils/cpu_utils.cc:101] CPU Frequency: 2100220000 Hz
2019-12-28 14:56:25.869935: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x69e96a0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2019-12-28 14:56:25.869985: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2019-12-28 14:56:25.991250: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x69ec0e0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2019-12-28 14:56:25.991316: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-12-28 14:56:25.992662: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 0 with properties:
pciBusID: 0000:83:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.582GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2019-12-28 14:56:25.992754: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2019-12-28 14:56:25.992781: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2019-12-28 14:56:25.992803: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2019-12-28 14:56:25.992826: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2019-12-28 14:56:25.992848: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2019-12-28 14:56:25.992870: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2019-12-28 14:56:25.992893: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-12-28 14:56:25.995265: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1700] Adding visible gpu devices: 0
2019-12-28 14:56:25.995316: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2019-12-28 14:56:25.997224: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1099] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-28 14:56:25.997267: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105]      0
2019-12-28 14:56:25.997321: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1118] 0:   N
2019-12-28 14:56:25.999786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1244] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10481 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:83:00.0, compute capability: 6.1)
2019-12-28 14:56:26.006701: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
sleep
2019-12-28 14:56:26.500355: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
2019-12-28 14:56:26.500427: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:75] Searched for CUDA in the following directories:
2019-12-28 14:56:26.500467: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:78]   ./cuda_sdk_lib
2019-12-28 14:56:26.500496: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:78]   /usr/local/cuda
2019-12-28 14:56:26.500509: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:78]   .
2019-12-28 14:56:26.500520: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:80] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2019-12-28 14:56:26.502503: I tensorflow/compiler/jit/xla_compilation_cache.cc:242] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)
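The run above still warns that it can't find ${CUDA_DIR}/nvvm/libdevice, and the log itself suggests setting XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda. A minimal sketch of doing that from Python follows; `set_xla_cuda_data_dir` is a hypothetical helper, and setting the variable before importing TensorFlow is a conservative assumption about when XLA reads its flags.

```python
import os


def set_xla_cuda_data_dir(cuda_dir):
    """Append --xla_gpu_cuda_data_dir=<cuda_dir> to XLA_FLAGS.

    Hypothetical helper. Checks that <cuda_dir>/nvvm/libdevice exists,
    since that is the directory the warning could not find. Call it
    before importing TensorFlow.
    """
    libdevice = os.path.join(cuda_dir, "nvvm", "libdevice")
    if not os.path.isdir(libdevice):
        raise FileNotFoundError("no libdevice directory at " + libdevice)
    flag = "--xla_gpu_cuda_data_dir=" + cuda_dir
    existing = os.environ.get("XLA_FLAGS", "")
    if flag not in existing.split():  # keep any flags that are already set
        os.environ["XLA_FLAGS"] = (existing + " " + flag).strip()
    return os.environ["XLA_FLAGS"]
```

On the machine in this thread the CUDA root would presumably be /data/cuda/cuda-10.0/cuda (taken from the LD_LIBRARY_PATH in the logs); call the helper with that path and then `import tensorflow as tf`.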