tensorflow: Not creating XLA devices, tf_xla_enable_xla_devices not set
Hi,
I have recently upgraded my system to the following configuration:
- OS: Ubuntu 18.04
- gcc: 7.5.0
- CUDA: 10.2
- cuDNN: 7.6.5
- TensorRT: 6.0.1.8
- TensorFlow: 2.5.0
- GPU: device 0, GeForce GTX 1060 6GB
Once the TensorFlow installation was complete, I checked it with the following code:
```python
import tensorflow as tf

with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

with tf.Session() as sess:
    print(sess.run(c))
```
When I execute it in a terminal, I find the following:
```
>>> import tensorflow as tf
2020-11-08 13:00:32.053030: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
>>> with tf.device('/gpu:0'):
...     a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
...     b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
...     c = tf.matmul(a, b)
...
2020-11-08 13:00:33.123388: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-11-08 13:00:33.123967: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2020-11-08 13:00:33.137540: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-08 13:00:33.137915: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1724] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1060 6GB computeCapability: 6.1
coreClock: 1.7085GHz coreCount: 10 deviceMemorySize: 5.93GiB deviceMemoryBandwidth: 178.99GiB/s
2020-11-08 13:00:33.137933: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2020-11-08 13:00:33.139254: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2020-11-08 13:00:33.139295: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2020-11-08 13:00:33.140475: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2020-11-08 13:00:33.140641: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2020-11-08 13:00:33.141883: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2020-11-08 13:00:33.142541: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2020-11-08 13:00:33.145144: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
2020-11-08 13:00:33.145247: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-08 13:00:33.145551: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-08 13:00:33.145778: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1866] Adding visible gpu devices: 0
2020-11-08 13:00:33.146034: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-11-08 13:00:33.146315: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-11-08 13:00:33.146377: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-08 13:00:33.146602: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1724] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1060 6GB computeCapability: 6.1
coreClock: 1.7085GHz coreCount: 10 deviceMemorySize: 5.93GiB deviceMemoryBandwidth: 178.99GiB/s
2020-11-08 13:00:33.146616: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2020-11-08 13:00:33.146645: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
```
I would like to know how to resolve the "Not creating XLA devices, tf_xla_enable_xla_devices not set" and "SysFS had negative value (-1)" issues.
Any suggestions?
Regards,
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 26 (7 by maintainers)
Okay, thanks. I would still appreciate some insight into these warnings, though; it would help me understand them. I have been looking into it as well.
Regards,
I ran the same model using TF 2.3 with both CUDA 10.1 and CUDA 11.1. With CUDA 10.1 (no XLA warning) it was much faster and used my GPU more efficiently than CUDA 11.1 (with the XLA warning). To be specific, CUDA 10.1 trained my large model (BERT) at 60% GPU usage in about 3 minutes per epoch, with a much larger batch size. However, CUDA 11.1 used only 8% of the GPU and took about 10 minutes per epoch (more than 3x slower).
@ydennisy, could you please submit a new issue from this link so that it can be tracked separately, with you as its owner? Thanks!
1. Go to Environment Variables from the search panel.
2. You will see user variables and system variables.
3. Click "New" under system variables.
4. Variable name: `TF_XLA_FLAGS`
5. Variable value: `--tf_xla_enable_xla_devices`
6. Save it and try your scripts (e.g. `python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"`).
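For anyone on Linux/macOS, or anyone who prefers setting the flag from code rather than through the Windows dialog, a minimal sketch of the same idea (set the variable before TensorFlow is imported, then run a quick check) would be:

```python
import os

# Set before importing TensorFlow, otherwise the flag may be ignored.
os.environ["TF_XLA_FLAGS"] = "--tf_xla_enable_xla_devices"

import tensorflow as tf

# Quick sanity checks: run a small op and list visible GPUs.
print(tf.reduce_sum(tf.random.normal([1000, 1000])))
print(tf.config.list_physical_devices("GPU"))
```

Exporting the variable in the shell (`export TF_XLA_FLAGS=--tf_xla_enable_xla_devices`) before launching Python has the same effect.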
Can this be re-opened and some documentation added on these flags? It is very trial-and-error at the moment.
To not leave anyone hanging: I had the same problem, with the same warning and surprisingly slow performance, on an RTX 3090 with CUDA 11.1 and TF 2.5 nightly. Adding the Windows environment variable `TF_XLA_FLAGS=--tf_xla_enable_xla_devices` seems to have solved the problem.
@Angit16, the `Not creating XLA devices, tf_xla_enable_xla_devices not set` message is an informational log which you can safely ignore. To verify that TensorFlow has detected the GPU on your machine, please run the code below and check the number of GPUs available.
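A minimal sketch of such a GPU-detection check, assuming the standard TF 2.x API (`tf.config.list_physical_devices`), is:

```python
import tensorflow as tf

# Lists the physical GPUs TensorFlow can see; an empty list means no GPU was detected.
gpus = tf.config.list_physical_devices("GPU")
print("Num GPUs Available:", len(gpus))
```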
Thanks!
I'd like to know more, as I am now seeing `TF_XLA_FLAGS=--tf_xla_cpu_global_jit` recommended.
I'm training a CycleGAN network on an RTX 3090 with CUDA 11.1, on a subset of a dataset (CelebA with certain attributes) of around 3K images. Before setting any flags, each epoch took ~2399 secs, which is quite a lot of time. Previously I used Google Colab; for some reason its P100 GPU took less time (~780-800 secs), while a T4 took ~1270-1300 secs. I've set the flags below.
`export XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/local/cuda`
`export TF_XLA_FLAGS="--tf_xla_auto_jit=2"`
Boom!! Now each epoch is taking ~410-415 secs. Crazy fast!!
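For completeness, XLA compilation can also be requested from inside a script rather than through environment variables. This is only a sketch using the public TF 2.x APIs (`tf.config.optimizer.set_jit` and the `jit_compile` argument of `tf.function`, the latter available from TF 2.5), not a drop-in replacement for the flags above:

```python
import tensorflow as tf

# Enable XLA auto-clustering globally (similar in spirit to --tf_xla_auto_jit=2).
tf.config.optimizer.set_jit(True)

# Or compile an individual function with XLA explicitly.
@tf.function(jit_compile=True)
def dense_step(x, w):
    return tf.nn.relu(tf.matmul(x, w))

x = tf.random.normal([64, 128])
w = tf.random.normal([128, 256])
print(dense_step(x, w).shape)
```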
RELEASE.md reads: