tensorflow: Not creating XLA devices, tf_xla_enable_xla_devices not set
Hi,
I have recently upgraded my system to the following configuration:
- OS: Ubuntu 18.04
- gcc: 7.5.0
- CUDA: 10.2
- cuDNN: 7.6.5
- TensorRT: 6.0.1.8
- TensorFlow: 2.5.0
- GPU: device 0, GeForce GTX 1060 6GB
Once the TensorFlow installation was complete, I checked it with the following code:
```python
import tensorflow as tf

with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

with tf.Session() as sess:
    print(sess.run(c))
```
When I execute it in a terminal, I find the following:
```
>>> import tensorflow as tf
2020-11-08 13:00:32.053030: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
>>> with tf.device('/gpu:0'):
...     a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
...     b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
...     c = tf.matmul(a, b)
...
2020-11-08 13:00:33.123388: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-11-08 13:00:33.123967: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2020-11-08 13:00:33.137540: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-08 13:00:33.137915: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1724] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1060 6GB computeCapability: 6.1
coreClock: 1.7085GHz coreCount: 10 deviceMemorySize: 5.93GiB deviceMemoryBandwidth: 178.99GiB/s
2020-11-08 13:00:33.137933: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2020-11-08 13:00:33.139254: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2020-11-08 13:00:33.139295: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2020-11-08 13:00:33.140475: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2020-11-08 13:00:33.140641: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2020-11-08 13:00:33.141883: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2020-11-08 13:00:33.142541: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2020-11-08 13:00:33.145144: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
2020-11-08 13:00:33.145247: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-08 13:00:33.145551: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-08 13:00:33.145778: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1866] Adding visible gpu devices: 0
2020-11-08 13:00:33.146034: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-11-08 13:00:33.146315: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-11-08 13:00:33.146377: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-08 13:00:33.146602: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1724] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1060 6GB computeCapability: 6.1
coreClock: 1.7085GHz coreCount: 10 deviceMemorySize: 5.93GiB deviceMemoryBandwidth: 178.99GiB/s
2020-11-08 13:00:33.146616: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2020-11-08 13:00:33.146645: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
```
I would like to know how to resolve the "Not creating XLA devices, tf_xla_enable_xla_devices not set" and "SysFS had negative value (-1)" issues.
Any suggestions?
Regards,
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 26 (7 by maintainers)
Okay, thanks. I would still appreciate some insight into these warnings, though; it would help me understand them. I have been looking into it as well.
Regards,
I ran the same model using TF 2.3 with both CUDA 10.1 and CUDA 11.1. With CUDA 10.1 (no XLA warning) it was much faster and used my GPU more efficiently than CUDA 11.1 (with the XLA warning). To be specific, CUDA 10.1 trained my large model (BERT) at 60% GPU usage in about 3 minutes per epoch, with a much larger batch size. However, CUDA 11.1 used only 8% of the GPU and took about 10 minutes per epoch (more than 3x slower).
@ydennisy, could you please submit a new issue from this link so that it can be tracked separately, with you as its owner? Thanks!
1. Go to Environment Variables from the search panel.
2. You will see user variables and system variables.
3. Click "New" under system variables.
4. Variable name: `TF_XLA_FLAGS`
5. Variable value: `--tf_xla_enable_xla_devices`
6. Save it and try your scripts (e.g. `python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"`).
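For anyone on Linux/macOS, or anyone who prefers setting the flag from code rather than through the Windows dialog, a minimal sketch of the same idea (set the variable before TensorFlow is imported, then run a quick check) would be:

```python
import os

# Set before importing TensorFlow, otherwise the flag may be ignored.
os.environ["TF_XLA_FLAGS"] = "--tf_xla_enable_xla_devices"

import tensorflow as tf

# Quick sanity checks: run a small op and list visible GPUs.
print(tf.reduce_sum(tf.random.normal([1000, 1000])))
print(tf.config.list_physical_devices("GPU"))
```

Exporting the variable in the shell (`export TF_XLA_FLAGS=--tf_xla_enable_xla_devices`) before launching Python has the same effect.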
Can this be re-opened and some documentation added on these flags? It is very trial-and-error at the moment.
To not leave anyone hanging: I had the same problem, with the same warning and surprisingly slow performance, on an RTX 3090 with CUDA 11.1 and TF 2.5 nightly. Adding the Windows environment variable `TF_XLA_FLAGS=--tf_xla_enable_xla_devices` seems to have solved the problem.
@Angit16, the `Not creating XLA devices, tf_xla_enable_xla_devices not set` message is an informational log which you can safely ignore. To verify that TensorFlow has detected the GPU on your machine, please run the code below and check the number of GPUs available.
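A minimal sketch of such a GPU-detection check, assuming the standard TF 2.x API (`tf.config.list_physical_devices`), is:

```python
import tensorflow as tf

# Lists the physical GPUs TensorFlow can see; an empty list means no GPU was detected.
gpus = tf.config.list_physical_devices("GPU")
print("Num GPUs Available:", len(gpus))
```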
Thanks!
I'd like to know more, as I am now seeing `TF_XLA_FLAGS=--tf_xla_cpu_global_jit` recommended.
I'm training a CycleGAN network on an RTX 3090 with CUDA 11.1, on a subset of a dataset (CelebA with certain attributes) of around 3K images. Before setting any flags, each epoch took ~2399 secs, which is quite a lot of time. Previously I used Google Colab; for some reason its P100 GPU took less time (~780-800 secs), while a T4 took ~1270-1300 secs. I've set the flags below.
`export XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/local/cuda`
`export TF_XLA_FLAGS="--tf_xla_auto_jit=2"`
Boom!! Now each epoch is taking ~410-415 secs. Crazy fast!!
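For completeness, XLA compilation can also be requested from inside a script rather than through environment variables. This is only a sketch using the public TF 2.x APIs (`tf.config.optimizer.set_jit` and the `jit_compile` argument of `tf.function`, the latter available from TF 2.5), not a drop-in replacement for the flags above:

```python
import tensorflow as tf

# Enable XLA auto-clustering globally (similar in spirit to --tf_xla_auto_jit=2).
tf.config.optimizer.set_jit(True)

# Or compile an individual function with XLA explicitly.
@tf.function(jit_compile=True)
def dense_step(x, w):
    return tf.nn.relu(tf.matmul(x, w))

x = tf.random.normal([64, 128])
w = tf.random.normal([128, 256])
print(dense_step(x, w).shape)
```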
RELEASE.md reads: