tensorflow: tf.config.LogicalDeviceConfiguration memory_limit does not really work?

System information

  • Custom code
  • Linux Ubuntu 20.04 (tested also 18.04)
  • Computer (not mobile device)
  • TensorFlow installed from: pip
  • TensorFlow version: 2.7 (tested also 2.4, 2.5, 2.6, 2.8)
  • Python version: 3.7 (tested also 3.8)
  • Bazel version (if compiling from source): N/A (not compiled from source; stock Ubuntu 20.04)
  • GCC/Compiler version (if compiling from source): N/A
  • CUDA/cuDNN version: 11.2.2/8.2.0.1
  • GPU model and memory: NVIDIA RTX 3090 24 GB (tested also RTX 2060 and GTX 1660), driver 460.x

Hello, I successfully ran the profiler tool on my classification model to measure its maximum memory usage, because I want to run several different CNNs on the same GPU. But I'm really baffled by the profiler's results. Let me explain.

I have an NVIDIA RTX 3090 with 24 GB of memory, so for my small CNN I set a 512 MB memory limit before any GPU use with this code:
tf.config.set_logical_device_configuration(gpus[0], [tf.config.LogicalDeviceConfiguration(memory_limit=512)])
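For reproduction, a minimal self-contained version of that setup might look like the sketch below (the no-GPU fallback branch is only there so the snippet runs anywhere):

```python
import tensorflow as tf

# Cap TensorFlow's allocator pool at 512 MB on the first GPU.
# This must run before anything initializes the GPU, otherwise
# set_logical_device_configuration raises a RuntimeError.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=512)])
    print(tf.config.list_logical_devices('GPU'))
else:
    print('no GPU visible; limit not applied')
```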

It seems to work, judging by the TensorFlow log:
2022-01-19 16:24:13.615890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with **512 MB memory:** -> device: 0, name: GeForce RTX 3090, pci bus id: 0000:2d:00.0, compute capability: 8.6

nvidia-smi shows that 419 MiB of GPU memory are in use (screenshot attached).

Then I run inference on the classification model with batch size = 1, and TensorBoard shows that the model uses about 100 MiB (screenshot attached).

So theoretically I could have set an even smaller memory limit (under 512 MB), but the real memory use reported by nvidia-smi is 1869 MiB! (screenshot attached)

I tested the code with TensorFlow 2.4, 2.5, 2.6, 2.7 and 2.8, and with different CUDA/cuDNN versions, but the result is the same. The memory limit seems to be applied at the TensorFlow level but not to the real GPU memory (the NVIDIA level). Did I miss something? It would be very useful to be able to manage the memory of a model!
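One way to see that the limit is enforced at the TensorFlow level is to ask TensorFlow itself how much it has allocated: `tf.config.experimental.get_memory_info` (available since TF 2.5) reports only TensorFlow's own allocator pool, the part that memory_limit caps, and not the CUDA context or cuDNN workspace overhead that nvidia-smi also counts. A minimal sketch, assuming a visible GPU:

```python
import tensorflow as tf

# Compare TensorFlow's own accounting with what nvidia-smi reports.
# get_memory_info covers only TF's allocator pool (the part capped by
# memory_limit), not CUDA context / cuDNN workspace overhead.
if tf.config.list_physical_devices('GPU'):
    info = tf.config.experimental.get_memory_info('GPU:0')
    print(info['current'], info['peak'])  # bytes, TF-level view only
else:
    print('no GPU visible')
```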

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 15 (5 by maintainers)

Most upvoted comments

@fitoule Could you please confirm whether this issue still persists? Please move this issue to closed status if it has been resolved for you. Thank you!

I made further investigations. The memory limit actually works, but the documentation is not clear enough. In my test I set memory_limit=200:

A) When I call import tensorflow => NVIDIA memory allocated is 423 MiB
B) When I run the code with the memory limit => NVIDIA memory allocated is 423 + 200 = 623 MiB
C) When the first inference is called, TensorFlow adds a further 938 MiB (+ 423 + 200). Total = 1561 MiB

So I understand that A + C is a constant overhead needed by TensorFlow, and memory_limit affects only the B part. I tested this on many different models; the A + C overhead depends on the driver and GPU hardware.
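The accounting above can be written out explicitly (a trivial sketch; all figures, in MiB, are the ones measured in this thread):

```python
# A: allocated as soon as TensorFlow initializes CUDA on the GPU
# B: the allocator pool actually governed by memory_limit
# C: extra workspace (cuDNN/cuBLAS handles, kernels) taken at first inference
A = 423   # after `import tensorflow`
B = 200   # memory_limit=200
C = 938   # jump observed at the first inference call

total = A + B + C
print(total)  # 1561, matching the nvidia-smi reading
```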

So now it’s clear to me. Still, the documentation could mention this, because for a small model of about 100 MiB I need 1.5 GB of GPU RAM, which is confusing.