tensorflow: CUDA_ERROR_OUT_OF_MEMORY with tf.contrib.learn (basic linear regression model)

Environment info

Operating System: Ubuntu 16.04, GeForce GTX TITAN X

Installed version of CUDA and cuDNN (output of ls /path/to/cuda/lib/libcud*):

ls /usr/local/cuda-8.0/lib64/libcud*
/usr/local/cuda-8.0/lib64/libcudadevrt.a
/usr/local/cuda-8.0/lib64/libcudart.so
/usr/local/cuda-8.0/lib64/libcudart.so.8.0
/usr/local/cuda-8.0/lib64/libcudart.so.8.0.44
/usr/local/cuda-8.0/lib64/libcudart_static.a
/usr/local/cuda-8.0/lib64/libcudnn.so
/usr/local/cuda-8.0/lib64/libcudnn.so.5
/usr/local/cuda-8.0/lib64/libcudnn.so.5.1.10
/usr/local/cuda-8.0/lib64/libcudnn.so.5.1.5
/usr/local/cuda-8.0/lib64/libcudnn_static.a

If installed from binary pip package, provide:

  1. Installed with pip (tensorflow-gpu)
  2. python -c "import tensorflow; print(tensorflow.__version__)" reports 1.0.0

If possible, provide a minimal reproducible example (we usually don't have time to read hundreds of lines of your code)

Using the tf.contrib.learn model provided in the get_started/get_started documentation:

import tensorflow as tf
import numpy as np

features = [tf.contrib.layers.real_valued_column("x", dimension=1)]
estimator = tf.contrib.learn.LinearRegressor(feature_columns=features)
x = np.array([1., 2., 3., 4.])
y = np.array([0., -1., -2., -3.])
input_fn = tf.contrib.learn.io.numpy_input_fn({"x":x}, y, batch_size=4, num_epochs=1000)
estimator.fit(input_fn=input_fn, steps=10)

The equivalent model written with the core TensorFlow API (from the same guide) works fine.
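For reference, this is roughly the core-API version of the same regression from the get_started guide (reconstructed from memory, so treat it as a sketch rather than the exact tutorial code); it trains without triggering the allocation error:

import numpy as np
import tensorflow as tf

# Model parameters and placeholders for the same toy linear regression.
W = tf.Variable([0.3], dtype=tf.float32)
b = tf.Variable([-0.3], dtype=tf.float32)
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

linear_model = W * x + b
loss = tf.reduce_sum(tf.square(linear_model - y))  # sum of squared errors
train = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

x_train = np.array([1., 2., 3., 4.])
y_train = np.array([0., -1., -2., -3.])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        sess.run(train, {x: x_train, y: y_train})
    print(sess.run([W, b, loss], {x: x_train, y: y_train}))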

Logs or other output that would be helpful

(If logs are large, please upload as attachment or provide link).

python test-oom.py 
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpew3U6z
WARNING:tensorflow:Rank of input Tensor (1) should be the same as output_rank (2) for column. Will attempt to expand dims. It is highly recommended that you resize your input, as this behavior may change.
WARNING:tensorflow:From /home/gajop/projekti/ubuntu-ranking-dataset-creator/env/local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/head.py:1362: scalar_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.scalar. Note that tf.summary.scalar uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, passing a tensor or list of tags to a scalar summary op is no longer supported.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:01:00.0
Total memory: 11.92GiB
Free memory: 11.52GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 11.92G (12799180800 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 4
  • Comments: 19 (9 by maintainers)

Most upvoted comments

In tensorflow/contrib/learn/python/learn/estimators/run_config.py you can change the GPUOptions: in its __init__ I set per_process_gpu_memory_fraction to 0.7.
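The same fraction can be set without editing the installed source. A minimal sketch, assuming the installed tf.contrib.learn.RunConfig still exposes a gpu_memory_fraction argument (check run_config.py in your install), with the equivalent GPUOptions for code that builds its own session:

import tensorflow as tf

# Cap the estimator's GPU memory use via RunConfig instead of patching run_config.py.
# gpu_memory_fraction is assumed to exist in this version's RunConfig; verify locally.
features = [tf.contrib.layers.real_valued_column("x", dimension=1)]
run_config = tf.contrib.learn.RunConfig(gpu_memory_fraction=0.7)
estimator = tf.contrib.learn.LinearRegressor(feature_columns=features,
                                             config=run_config)

# For plain sessions, the same limit goes through GPUOptions/ConfigProto.
session_config = tf.ConfigProto(
    gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.7))
sess = tf.Session(config=session_config)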

I have the same issue when running the basic usage program from https://www.tensorflow.org/get_started/get_started with version 1.0.1. TensorFlow tries to allocate the GPU's total memory:

I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GTX 960M
major: 5 minor: 0 memoryClockRate (GHz) 1.176
pciBusID 0000:01:00.0
Total memory: 3.95GiB
Free memory: 3.57GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 3.95G (4240965632 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY

Any news on fixing this?
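One workaround for scripts that create their own session is to let TensorFlow grow its GPU allocation on demand instead of reserving (almost) the whole card up front; a minimal sketch:

import tensorflow as tf

# allow_growth makes TensorFlow start with a small allocation and grow it as needed,
# rather than trying to grab nearly all free GPU memory at session creation.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    a = tf.constant([1.0, 2.0, 3.0])
    print(sess.run(tf.reduce_sum(a)))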

By the way, the program also generates some warnings:

WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmp8_ec6rtc
WARNING:tensorflow:Rank of input Tensor (1) should be the same as output_rank (2) for column. Will attempt to expand dims. It is highly recommended that you resize your input, as this behavior may change.
WARNING:tensorflow:From /home/trd/apps/miniconda/envs/mt/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/estimators/head.py:1362: scalar_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.scalar. Note that tf.summary.scalar uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, passing a tensor or list of tags to a scalar summary op is no longer supported.

The warning from head.py, about deprecation and removal after 2016-11-30, looks especially odd given that date has already passed.