tensorflow: Can't run TensorFlow on CPU, defaults to GPU

Environment info

Operating System:

Ubuntu 16.04

Installed version of CUDA and cuDNN: (please attach the output of ls -l /path/to/cuda/lib/libcud*):

-rw-r--r--   1 root root    322936 Sep 19  2015 libcudadevrt.a
lrwxrwxrwx   1 root root        16 Mar 30 15:25 libcudart.so -> libcudart.so.7.5
lrwxrwxrwx   1 root root        19 Mar 30 15:25 libcudart.so.7.5 -> libcudart.so.7.5.18
-rw-r--r--   1 root root    383336 Sep 19  2015 libcudart.so.7.5.18
-rw-r--r--   1 root root    720192 Sep 19  2015 libcudart_static.a
lrwxrwxrwx   1 root root        12 Apr 14 18:53 libcuda.so -> libcuda.so.1
lrwxrwxrwx   1 root root        17 Apr 14 18:53 libcuda.so.1 -> libcuda.so.361.42
-rw-r--r--   1 root root  16881416 Mar 23 02:42 libcuda.so.361.42
-rwxr-xr-x   1 root root  61453024 Apr 30 11:36 libcudnn.so
-rwxr-xr-x   1 root root  61453024 Apr 30 11:36 libcudnn.so.4
-rwxr-xr-x   1 root root  61453024 Apr 30 11:36 libcudnn.so.4.0.7
-rwxr-xr-x   1 root root  59823168 Apr 30 11:12 libcudnn.so.5
-rwxr-xr-x   1 root root  59823168 Apr 30 11:12 libcudnn.so.5.0.4
-rw-r--r--   1 root root  62025862 Apr 30 11:36 libcudnn_static.a

If installed from binary pip package, provide:

  1. Which pip package you installed. Installed using the command
sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.8.0-cp27-none-linux_x86_64.whl
  2. The output from python -c "import tensorflow; print(tensorflow.__version__)".
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
0.8.0

If installed from sources, provide the commit hash:

Steps to reproduce

>>> import tensorflow as tf
>>> 
>>> with tf.Session() as sess:
...     with tf.device('/cpu:0'):
...             matrix1 = tf.constant([[3., 3.]])
...             matrix2 = tf.constant([[2.],[2.]])
...             product = tf.matmul(matrix1, matrix2)
...             result = sess.run(product)
...             print(result)
... 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GeForce GTX 980 Ti
major: 5 minor: 2 memoryClockRate (GHz) 1.291
pciBusID 0000:01:00.0
Total memory: 6.00GiB
Free memory: 5.48GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
[[ 12.]]

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 18 (7 by maintainers)

Most upvoted comments

I’m on Windows, and CUDA_VISIBLE_DEVICES= or os.environ['CUDA_VISIBLE_DEVICES'] = '' still didn’t work for me: GPU devices were still created, even if not used. However, setting os.environ['CUDA_VISIBLE_DEVICES'] = '-1' did work, only giving me a warning (this seems to be the recommended way of masking devices according to the docs).
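
For anyone doing this from inside the script rather than the shell, a minimal sketch (assuming TF 1.x; the key detail is that the variable has to be set before TensorFlow initializes CUDA, so put it before the import):

import os

# Mask every GPU; '-1' is an index no real device can have. It must be set
# before `import tensorflow`, otherwise the CUDA runtime may already have
# enumerated the devices.
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

import tensorflow as tf

with tf.Session() as sess:
    # Runs on the CPU; no /gpu:0 device is created for this process.
    print(sess.run(tf.matmul(tf.constant([[3., 3.]]), tf.constant([[2.], [2.]]))))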

Could do something like this to see placement; I bet your ops are still on CPU.

Also, to remove the GPU from consideration completely, run export CUDA_VISIBLE_DEVICES=

  import tensorflow as tf

  config = tf.ConfigProto(log_device_placement=True)  # log an "op -> device" line for every node
  config.gpu_options.per_process_gpu_memory_fraction = 0.3  # don't hog all vRAM
  config.operation_timeout_in_ms = 50000   # terminate on long hangs
  sess = tf.InteractiveSession("", config=config)
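
In use, the effect is that every op gets an "op -> device" line in the log, so you can see where it really ran. A minimal sketch with just the placement flag (TF 1.x API, example constants):

import tensorflow as tf

config = tf.ConfigProto(log_device_placement=True)

with tf.Session(config=config) as sess:
    product = tf.matmul(tf.constant([[3., 3.]]), tf.constant([[2.], [2.]]))
    # stderr shows something like
    # "MatMul: (MatMul): /job:localhost/replica:0/task:0/gpu:0"
    # (or .../cpu:0 once the GPU is masked), which tells you where the op was placed.
    print(sess.run(product))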

I found this code by Franck Dernoncourt, from his question on Stack Overflow:

session_conf = tf.ConfigProto(
    device_count={'CPU' : 1, 'GPU' : 0},
    allow_soft_placement=True,
    log_device_placement=False
)

Then use that ConfigProto in tf.Session, that is:

with tf.Session(config=session_conf) as sess:
    # run commands
    ...

I tried out the above code in a simple implementation:

import tensorflow as tf

a = tf.constant([x for x in range(0, 3)], dtype=tf.float32, shape=[2, 3], name='a')
b = tf.constant([x for x in range(3, 6)], dtype=tf.float32, shape=[3, 2], name='b')
c = tf.matmul(a, b)

session_conf = tf.ConfigProto(
    device_count={'CPU' : 1, 'GPU' : 0},
    allow_soft_placement=True,
    log_device_placement=False
)

with tf.Session(config=session_conf) as sess:
    print(sess.run(c))

Then I had the following result:

2017-07-26 14:34:42.208873: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960M, pci bus id: 0000:02:00.0)
Device mapping: no known devices.
2017-07-26 14:34:42.208980: I tensorflow/core/common_runtime/direct_session.cc:265] Device mapping:

MatMul: (MatMul): /job:localhost/replica:0/task:0/cpu:0
2017-07-26 14:34:42.210952: I tensorflow/core/common_runtime/simple_placer.cc:847] MatMul: (MatMul)/job:localhost/replica:0/task:0/cpu:0
b: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-07-26 14:34:42.211003: I tensorflow/core/common_runtime/simple_placer.cc:847] b: (Const)/job:localhost/replica:0/task:0/cpu:0
a: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-07-26 14:34:42.211029: I tensorflow/core/common_runtime/simple_placer.cc:847] a: (Const)/job:localhost/replica:0/task:0/cpu:0
[[ 15.  15.]
 [ 26.  28.]]

As you can see, it did map the computation to cpu:0.

Yep, the ops are on CPU. I set the visible GPU devices to blank, observed the correct behaviour, and got the correct final result of the matmul.

(tensorflow)username@pcname:~$ export CUDA_VISIBLE_DEVICES=
(tensorflow)username@pcname:~$ python
Python 2.7.11 |Continuum Analytics, Inc.| (default, Dec  6 2015, 18:08:32) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
>>> 
>>> with tf.Session() as sess:
...     with tf.device('/cpu:0'):
...             matrix1 = tf.constant([[3., 3.]])
...             matrix2 = tf.constant([[2.],[2.]])
...             product = tf.matmul(matrix1, matrix2)
...             result = sess.run(product)
...             print(result)
... 
E tensorflow/stream_executor/cuda/cuda_driver.cc:481] failed call to cuInit: CUDA_ERROR_NO_DEVICE
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:114] retrieving CUDA diagnostic information for host: pcname
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:121] hostname: pcname
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:146] libcuda reported version is: 361.42
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:257] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  361.42  Tue Mar 22 18:10:58 PDT 2016
GCC version:  gcc version 5.3.1 20160413 (Ubuntu 5.3.1-14ubuntu2) 
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:150] kernel reported version is: 361.42
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:226] kernel version seems to match DSO: 361.42
I tensorflow/core/common_runtime/gpu/gpu_init.cc:81] No GPU devices available on machine.
[[ 12.]]
>>> 

Just as a note, pinning the ops to cpu:0 may not prevent TensorFlow from initializing the GPU device.
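
If you want to see which devices TensorFlow actually created in your process, you can list them; a small sketch (device_lib lives under tensorflow.python, so treat it as a debugging aid rather than a stable public API):

from tensorflow.python.client import device_lib

# With the GPU masked (CUDA_VISIBLE_DEVICES=-1) only a /cpu:0 entry should
# appear; with it visible you will also see /gpu:0, even if every op in your
# graph is pinned to the CPU.
for device in device_lib.list_local_devices():
    print(device.name, device.device_type)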

Is there still an issue here? Closing for now.

os.environ['CUDA_VISIBLE_DEVICES'] = '-1' worked for me.

This environment variable is pretty universal - it can be used to block access to all GPUs completely in many ML frameworks, not restricted to DNN frameworks or TensorFlow. I’ve verified in a Docker container with several DNN frameworks compiled with CUDA 9.0 support that setting:

ENV CUDA_VISIBLE_DEVICES=-1

at the very end of the Dockerfile blocked out GPU access completely (with graceful error handling). The frameworks where this GPU block is verified to work include:

  • Chainer
  • CNTK
  • Keras with Tensorflow backend
  • Lasagne
  • MXNET
  • Tensorflow
  • Theano
  • PyTorch

Some of these, including Keras + TensorFlow, will also fall back to the CPU to complete the calculations.
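
A quick way to verify the TensorFlow case inside such a container - a sketch, assuming a 1.x release recent enough to have tf.test.is_gpu_available():

import tensorflow as tf

# With CUDA_VISIBLE_DEVICES=-1 in the environment this should print False,
# and any graph you run will quietly fall back to the CPU kernels.
print(tf.test.is_gpu_available())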

I made a few changes to a utility by @yaroslavvb to handle this nicely, as well as making it easy to grab a specific number of GPUs. The previous comment reminded me, so I thought I may as well share it here: https://gist.github.com/ed-alertedh/58d3eb96cf1ba70542b657471dd377ca

Unix shell alias:

alias nogpu='export CUDA_VISIBLE_DEVICES=-1;'

Usage:

nogpu python your_tensorflow_script.py

When I set CUDA_VISIBLE_DEVICES=-1 on my machine with a single GPU I get the following error:

2017-07-13 16:13:00.617400: E tensorflow/stream_executor/cuda/cuda_driver.cc:406] failed call to cuInit: CUDA_ERROR_NO_DEVICE
2017-07-13 16:13:00.617475: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: eggplant-ed-ubuntu
2017-07-13 16:13:00.617495: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: eggplant-ed-ubuntu
2017-07-13 16:13:00.617567: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 381.22.0
2017-07-13 16:13:00.617599: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:369] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  381.22  Thu May  4 00:55:03 PDT 2017
GCC version:  gcc version 6.3.0 20170406 (Ubuntu 6.3.0-12ubuntu2) 
"""
2017-07-13 16:13:00.617613: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 381.22.0
2017-07-13 16:13:00.617618: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:300] kernel version seems to match DSO: 381.22.0

It seems like this is an edge case that should ideally be handled (unless there is some way to enumerate CUDA devices before calling cuInit and avoid calling it at all). Interestingly, the docs for cuInit don’t mention CUDA_ERROR_NO_DEVICE as a valid return code, so it’s probably not surprising that this is how TF behaves.
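
If the log noise is the main concern, the C++ logging can also be turned down via TF_CPP_MIN_LOG_LEVEL (read at import time; '1' hides INFO, '2' also hides WARNING, '3' also hides ERROR). A sketch combining it with the device mask, assuming that suppressing the cuInit line is acceptable for your use case:

import os

# Both variables must be set before TensorFlow is imported.
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'   # hide the GPU from the process
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'    # also silence the cuInit error report

import tensorflow as tf

with tf.Session() as sess:
    print(sess.run(tf.constant(42)))   # runs on the CPU without the CUDA_ERROR_NO_DEVICE line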