tensorflow: tcmalloc: large alloc on Colab and TensorFlow killed on local machine due to excessive RAM consumption

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04 LTS
Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
TensorFlow installed from (source or binary): Conda
TensorFlow version (use command below): tensorflow-gpu version 1.9.0
Python version: 3.6
Bazel version (if compiling from source): N/A
GCC/Compiler version (if compiling from source): 7.4.0
CUDA/cuDNN version: V10.1.243
GPU model and memory: Quadro RTX 5000; and 16 GB RAM

Describe the current behavior

TensorFlow always tries to consume the maximum available RAM, even though I have a GPU, and the kernel gets killed while training my deep learning model. I consulted multiple online sources (1, 2, 3, 4, 5, 6) and tried the following:

Reduce the batch size
Change the optimizer from adam to momentum

However, none of these suggestions helped to solve the problem.

Describe the expected behavior

Training should complete without excessive memory consumption and without the TensorFlow kernel being killed.

Code to reproduce the issue

I ran the following code in an IPython notebook, both on my local machine (local GPU) and on Google Colab:

!git clone https://github.com/charlesq34/pointnet.git
cd pointnet/sem_seg/
!sh download_data.sh
!python train.py --log_dir log6 --test_area 6
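
To check whether the growth is in host RAM rather than GPU memory, a small helper along these lines could be called at a few points in the training script (for example after data loading and after each epoch). This is only a sketch, not part of the pointnet repository: the function names `peak_rss_mib` and `array_mib` are made up for illustration, and it relies on the Unix-only `resource` module from the Python standard library.

```python
import resource
import sys


def peak_rss_mib():
    """Return this process's peak resident set size (host RAM) in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def array_mib(shape, itemsize=4):
    """Estimate the size in MiB of a dense array (itemsize=4 means float32)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * itemsize / (1024 * 1024)


if __name__ == "__main__":
    # Hypothetical shape, used only to illustrate the arithmetic.
    print("estimated array size: %.1f MiB" % array_mib((1000, 4096, 9)))
    print("peak host RAM so far: %.1f MiB" % peak_rss_mib())
```

If the peak value jumps to several GiB before the first training step, the memory is most likely being taken by data held in NumPy arrays on the host, which would explain why reducing the batch size or changing the optimizer makes no difference.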

Other info / logs

The error log is very long, so I have attached it as a separate text file: ERROR_LOG.txt

About this issue

State: closed
Created 5 years ago
Comments: 19 (3 by maintainers)

Most upvoted comments

@gowthamkpr I faced the same issue with tensorflow-gpu==1.15.0 and keras==2.2.4 in Colab.

ucalyptus on Mar 19, 2020

same here. torch==1.4.0 in colab

viva2202 on Jun 13, 2020