tensorflow: TensorFlow doesn't use all available memory on NVIDIA GPU

Issue Type

Bug

Source

binary

Tensorflow Version

2.6

Custom Code

No

OS Platform and Distribution

Windows 11 Pro 21H2

Mobile device

No response

Python version

3.7.13

Bazel version

No response

GCC/Compiler version

No response

CUDA/cuDNN version

11.3.1/8.2.1

GPU model and memory

NVIDIA GeForce RTX 3070 Laptop GPU 8GB

Current Behaviour?

TensorFlow only uses 5.6 GB of the 8.0 GB of GPU VRAM available

Standalone code to reproduce the issue

import tensorflow as tf
v = tf.Variable(tf.linspace(-10., 10, 10000))

Relevant log output

2022-07-02 17:25:40.770168: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-02 17:25:41.122377: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5482 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3070 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6

Sat Jul  2 17:26:11 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 516.59       Driver Version: 516.59       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ... WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   52C    P8    12W /  N/A |   5740MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     27356      C   ...onda3\envs\tf2\python.exe    N/A      |
+-----------------------------------------------------------------------------+

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 15 (3 by maintainers)

Most upvoted comments

Try setting memory_limit when configuring a logical device. I have an RTX 3060 6GB and ran into the same problem with tensorflow==2.9.1, and this worked for me:

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # Must be called before TensorFlow initializes the GPU (i.e. before any op runs on it).
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=5292)]  # MiB
    )

logical_gpus = tf.config.list_logical_devices('GPU')
print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")

Here is before setting memory_limit:

(screenshot from 2022-08-12 16-26-25)

nvidia-smi output:

(screenshot from 2022-08-12 16-36-28)

and after setting memory_limit to 5292 MiB:

(screenshot from 2022-08-12 16-25-38)

nvidia-smi output:

(screenshot from 2022-08-12 16-37-25)

The difference between the memory TensorFlow reports and what nvidia-smi shows is probably due to other initialization TensorFlow has to do (the CUDA context, for example), which is not counted in TensorFlow's own numbers.
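
One way to see both numbers side by side, as a rough sketch (this assumes TF 2.5+ where tf.config.experimental.get_memory_info is available, a single GPU exposed as 'GPU:0', and nvidia-smi on the PATH; none of this is from the original comment):

import subprocess
import tensorflow as tf

# Bytes TensorFlow's own allocator has handed out on GPU:0.
info = tf.config.experimental.get_memory_info('GPU:0')
print('TF current MiB:', info['current'] // (1024 * 1024),
      'TF peak MiB:', info['peak'] // (1024 * 1024))

# Memory usage as the driver sees it (includes the CUDA context and
# other overhead that never shows up in TensorFlow's report).
print(subprocess.run(
    ['nvidia-smi', '--query-gpu=memory.used,memory.total', '--format=csv'],
    capture_output=True, text=True).stdout)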

If you set memory_limit to a value too close to your total VRAM (6144 MiB in my case), you will get an InternalError: Graph execution error when you later run model.fit(). I could set memory_limit to 5844, leaving only ~300 MiB for other processes, and my model trains with no error.
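
If you would rather not hard-code the number, here is a sketch of one way to derive it. This uses the nvidia-ml-py / pynvml package, which the original comment did not use, and the ~300 MiB headroom figure is just the value quoted above; treat both as assumptions to tune for your own machine:

import pynvml
import tensorflow as tf

# Query total VRAM from the driver, then leave headroom for the CUDA
# context, the display, and any other processes using the GPU.
pynvml.nvmlInit()
total_mib = pynvml.nvmlDeviceGetMemoryInfo(
    pynvml.nvmlDeviceGetHandleByIndex(0)).total // (1024 * 1024)
pynvml.nvmlShutdown()

HEADROOM_MIB = 300  # rough allowance; increase if model.fit() still fails

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # Must run before TensorFlow initializes the GPU.
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(
            memory_limit=total_mib - HEADROOM_MIB)])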