tensorflow: TensorFlow 1st Test: “could not open file to read NUMA node” - what's wrong?

I went to StackOverflow with this and was pointed back to GitHub. 😉 See [http://stackoverflow.com/questions/37067297]

Environment info

Operating System: Gentoo Linux on Lenovo P50

Installed version of CUDA and cuDNN: I installed dev-util/nvidia-cuda-toolkit package, version 7.5.18-r2

# ll /opt/cuda/lib/libcud*
-rw-r--r-- 1 root root 189082 May  6 10:42 /opt/cuda/lib/libcudadevrt.a
lrwxrwxrwx 1 root root     16 Sep 19  2015 /opt/cuda/lib/libcudart.so -> libcudart.so.7.5
lrwxrwxrwx 1 root root     19 Sep 19  2015 /opt/cuda/lib/libcudart.so.7.5 -> libcudart.so.7.5.18
-rwxr-xr-x 1 root root 311596 Sep 19  2015 /opt/cuda/lib/libcudart.so.7.5.18
-rw-r--r-- 1 root root 557240 May  6 10:42 /opt/cuda/lib/libcudart_static.a

Plus I installed cuDNN 5 downloaded from Nvidia

# ll libcud*
lrwxrwxrwx 1 rj rj       13 Mar 22 08:44 libcudnn.so -> libcudnn.so.5
lrwxrwxrwx 1 rj rj       17 Mar 22 08:44 libcudnn.so.5 -> libcudnn.so.5.0.4
-rwxrwxr-x 1 rj rj 59823168 Mar 22 02:37 libcudnn.so.5.0.4
-rw-rw-r-- 1 rj rj 58734618 Mar 22 02:37 libcudnn_static.a

If installed from binary pip package, provide:

Which pip package you installed.

# pip3 -V
pip 8.1.1 from /usr/lib64/python3.4/site-packages (python 3.4)

The output from python -c "import tensorflow; print(tensorflow.__version__)".

# python -c "import tensorflow; print(tensorflow.__version__)"
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
0.8.0

Steps to reproduce

  1. import tensorflow as tf
  2. hello = tf.constant('Hello, TensorFlow!')
  3. sess = tf.Session()

What have you tried?

  1. stackoverflow 😉

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 15 (7 by maintainers)

Most upvoted comments

Hi all, I am stuck on the same error while working with TensorFlow. If you found a solution, kindly share it. Thanks.

successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

Hm. I’m afraid the path template “/sys/bus/pci/devices/%s/numa_node” cannot be found on my system, as /sys/bus/pci/devices looks like this:

# ll /sys/bus/pci/devices/0000\:*
lrwxrwxrwx 1 root root 0 May 11 15:06 /sys/bus/pci/devices/0000:00:00.0 -> ../../../devices/pci0000:00/0000:00:00.0
lrwxrwxrwx 1 root root 0 May 11 15:06 /sys/bus/pci/devices/0000:00:01.0 -> ../../../devices/pci0000:00/0000:00:01.0
lrwxrwxrwx 1 root root 0 May 11 15:06 /sys/bus/pci/devices/0000:00:14.0 -> ../../../devices/pci0000:00/0000:00:14.0
lrwxrwxrwx 1 root root 0 May 11 15:06 /sys/bus/pci/devices/0000:00:14.2 -> ../../../devices/pci0000:00/0000:00:14.2
lrwxrwxrwx 1 root root 0 May 11 15:06 /sys/bus/pci/devices/0000:00:16.0 -> ../../../devices/pci0000:00/0000:00:16.0
lrwxrwxrwx 1 root root 0 May 11 15:06 /sys/bus/pci/devices/0000:00:16.3 -> ../../../devices/pci0000:00/0000:00:16.3
lrwxrwxrwx 1 root root 0 May 11 15:06 /sys/bus/pci/devices/0000:00:17.0 -> ../../../devices/pci0000:00/0000:00:17.0
lrwxrwxrwx 1 root root 0 May 11 15:06 /sys/bus/pci/devices/0000:00:1c.0 -> ../../../devices/pci0000:00/0000:00:1c.0
lrwxrwxrwx 1 root root 0 May 11 15:06 /sys/bus/pci/devices/0000:00:1c.2 -> ../../../devices/pci0000:00/0000:00:1c.2
lrwxrwxrwx 1 root root 0 May 11 15:06 /sys/bus/pci/devices/0000:00:1c.4 -> ../../../devices/pci0000:00/0000:00:1c.4
lrwxrwxrwx 1 root root 0 May 11 15:06 /sys/bus/pci/devices/0000:00:1d.0 -> ../../../devices/pci0000:00/0000:00:1d.0
lrwxrwxrwx 1 root root 0 May 11 15:06 /sys/bus/pci/devices/0000:00:1d.4 -> ../../../devices/pci0000:00/0000:00:1d.4
lrwxrwxrwx 1 root root 0 May 11 15:06 /sys/bus/pci/devices/0000:00:1f.0 -> ../../../devices/pci0000:00/0000:00:1f.0
lrwxrwxrwx 1 root root 0 May 11 15:06 /sys/bus/pci/devices/0000:00:1f.2 -> ../../../devices/pci0000:00/0000:00:1f.2
lrwxrwxrwx 1 root root 0 May 11 15:06 /sys/bus/pci/devices/0000:00:1f.3 -> ../../../devices/pci0000:00/0000:00:1f.3
lrwxrwxrwx 1 root root 0 May 11 15:06 /sys/bus/pci/devices/0000:00:1f.4 -> ../../../devices/pci0000:00/0000:00:1f.4
lrwxrwxrwx 1 root root 0 May 11 15:06 /sys/bus/pci/devices/0000:00:1f.6 -> ../../../devices/pci0000:00/0000:00:1f.6
lrwxrwxrwx 1 root root 0 May 11 15:06 /sys/bus/pci/devices/0000:01:00.0 -> ../../../devices/pci0000:00/0000:00:01.0/0000:01:00.0
lrwxrwxrwx 1 root root 0 May 11 15:06 /sys/bus/pci/devices/0000:01:00.1 -> ../../../devices/pci0000:00/0000:00:01.0/0000:01:00.1
lrwxrwxrwx 1 root root 0 May 11 15:06 /sys/bus/pci/devices/0000:04:00.0 -> ../../../devices/pci0000:00/0000:00:1c.2/0000:04:00.0
lrwxrwxrwx 1 root root 0 May 11 15:06 /sys/bus/pci/devices/0000:3e:00.0 -> ../../../devices/pci0000:00/0000:00:1d.0/0000:3e:00.0
lrwxrwxrwx 1 root root 0 May 11 15:06 /sys/bus/pci/devices/0000:3f:00.0 -> ../../../devices/pci0000:00/0000:00:1d.4/0000:3f:00.0

ls /sys/bus/pci/devices/*/numa_node

no such file or directory

Which it does, of course, because the kernel has no NUMA support. 😃 So I compiled a new 4.5.3 kernel with NUMA support and …

>>> sess = tf.Session()
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:900] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: Quadro M2000M
major: 5 minor: 0 memoryClockRate (GHz) 1.137
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.47GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Quadro M2000M, pci bus id: 0000:01:00.0)

So it seems my problem was simply that the kernel had no NUMA support. I will now switch back to Hybrid mode (activating the Intel GPU) and test whether that works with NUMA enabled.

Edit: it does. So the “fix” could be a warning somewhere saying “NUMA support is a must”.
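For anyone who wants to check this before rebuilding a kernel, here is a minimal Python sketch of the sysfs lookup discussed above. It reads the numa_node file for one PCI bus ID and clamps a negative value to zero, as the log message describes. The function name read_numa_node, the sysfs_root parameter, and returning None for a missing file are my own assumptions for illustration, not TensorFlow's actual code.

```python
import os


def read_numa_node(pci_bus_id, sysfs_root="/sys/bus/pci/devices"):
    """Return the NUMA node sysfs reports for a PCI device.

    Hypothetical helper for illustration only. Returns None when the
    numa_node file is missing or unreadable (for example, a kernel built
    without NUMA support), and 0 when sysfs reports a negative value.
    """
    path = os.path.join(sysfs_root, pci_bus_id, "numa_node")
    try:
        with open(path) as f:
            value = int(f.read().strip())
    except (OSError, ValueError):
        # File absent or content not an integer: nothing usable to report.
        return None
    # A negative value means "no node assigned"; fall back to node zero.
    return max(value, 0)


if __name__ == "__main__":
    # Bus ID taken from the device log in this thread; adjust for your GPU.
    print(read_numa_node("0000:01:00.0"))
```

If this prints None, the numa_node file is not there at all, which is the situation I had before rebuilding the kernel.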

Officially you need a GPU with compute capability >= 3.5, but see https://github.com/tensorflow/tensorflow/issues/25
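To make the check concrete, here is a small Python sketch that pulls the compute capability out of a log line like the ones quoted below and compares it against the 3.5 minimum. The names MIN_CAPABILITY, parse_capability and is_supported are hypothetical, and the regular expression only assumes the wording seen in this thread's logs.

```python
import re

# Minimum quoted in the log messages in this thread; other builds may differ.
MIN_CAPABILITY = (3, 5)


def parse_capability(line):
    """Extract (major, minor) from a 'compute capability X.Y' log line.

    Returns None when the line does not mention a compute capability.
    """
    match = re.search(r"compute capability (\d+)\.(\d+)", line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_supported(capability, minimum=MIN_CAPABILITY):
    """Tuples compare element by element, so (3, 0) < (3, 5) < (5, 0)."""
    return capability >= minimum


if __name__ == "__main__":
    line = ("Ignoring gpu device (device: 0, name: GeForce GT 640, "
            "pci bus id: 0000:05:00.0) with Cuda compute capability 3.0. "
            "The minimum required Cuda capability is 3.5.")
    cap = parse_capability(line)
    print(cap, is_supported(cap))
```

By this comparison the GeForce GT 640 (3.0) falls below the minimum, while a device reporting major 5, minor 0 passes.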

On Thu, Aug 18, 2016 at 9:30 AM, abhijayghildyal notifications@github.com wrote:

When I ran

bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu

it gave me the following error:

I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:118] Found device 0 with properties: 
name: GeForce GT 640
major: 3 minor: 0 memoryClockRate (GHz) 0.9015
pciBusID 0000:05:00.0
Total memory: 2.00GiB
Free memory: 1.98GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:138] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:148] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:843] Ignoring gpu device (device: 0, name: GeForce GT 640, pci bus id: 0000:05:00.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:843] Ignoring gpu device (device: 0, name: GeForce GT 640, pci bus id: 0000:05:00.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:843] Ignoring gpu device (device: 0, name: GeForce GT 640, pci bus id: 0000:05:00.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:843] Ignoring gpu device (device: 0, name: GeForce GT 640, pci bus id: 0000:05:00.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:843] Ignoring gpu device (device: 0, name: GeForce GT 640, pci bus id: 0000:05:00.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:843] Ignoring gpu device (device: 0, name: GeForce GT 640, pci bus id: 0000:05:00.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:843] Ignoring gpu device (device: 0, name: GeForce GT 640, pci bus id: 0000:05:00.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:843] Ignoring gpu device (device: 0, name: GeForce GT 640, pci bus id: 0000:05:00.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:843] Ignoring gpu device (device: 0, name: GeForce GT 640, pci bus id: 0000:05:00.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:843] Ignoring gpu device (device: 0, name: GeForce GT 640, pci bus id: 0000:05:00.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
F tensorflow/cc/tutorials/example_trainer.cc:128] Check failed: ::tensorflow::Status::OK() == (session->Run({{"x", x}}, {"y:0", "y_normalized:0"}, {}, &outputs)) (OK vs. Invalid argument: Cannot assign a device to node 'Cast': Could not satisfy explicit device specification '/gpu:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0 [[Node: Cast = CastDstT=DT_FLOAT, SrcT=DT_INT32, _device="/gpu:0"]])
F tensorflow/cc/tutorials/example_trainer.cc:128] Check failed: ::tensorflow::Status::OK() == (session->Run({{"x", x}}, {"y:0", "y_normalized:0"}, {}, &outputs)) (OK vs. Invalid argument: Cannot assign a device to node 'Cast': Could not satisfy explicit device specification '/gpu:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0 [[Node: Cast = CastDstT=DT_FLOAT, SrcT=DT_INT32, _device="/gpu:0"]])
F tensorflow/cc/tutorials/example_trainer.cc:128] Check failed: ::tensorflow::Status::OK() == (session->Run({{"x", x}}, {"y:0", "y_normalized:0"}, {}, &outputs)) (OK vs. Invalid argument: Cannot assign a device to node 'Cast': Could not satisfy explicit device specification '/gpu:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0 [[Node: Cast = CastDstT=DT_FLOAT, SrcT=DT_INT32, _device="/gpu:0"]])
F tensorflow/cc/tutorials/example_trainer.cc:128] Check failed: ::tensorflow::Status::OK() == (session->Run({{"x", x}}, {"y:0", "y_normalized:0"}, {}, &outputs)) (OK vs. Invalid argument: Cannot assign a device to node 'Cast': Could not satisfy explicit device specification '/gpu:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0 [[Node: Cast = CastDstT=DT_FLOAT, SrcT=DT_INT32, _device="/gpu:0"]])
Aborted (core dumped)

Will I need a different gpu to run this?
