tensorflow: tcmalloc: large alloc on Colab and TensorFlow killed on local machine due to overconsumption of RAM
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04 LTS
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
- TensorFlow installed from (source or binary): Conda
- TensorFlow version (use command below): tensorflow-gpu version 1.9.0
- Python version: 3.6
- Bazel version (if compiling from source): N/A
- GCC/Compiler version (if compiling from source): 7.4.0
- CUDA/cuDNN version: V10.1.243
- GPU model and memory: Quadro RTX 5000; and 16 GB RAM
Describe the current behavior
TensorFlow always tries to consume the maximum available RAM, even though I have a GPU, and the kernel gets killed while training my deep learning model. I consulted multiple online sources (1, 2, 3, 4, 5, 6) and tried the following:
- Reduce the batch size
- Change the optimizer from adam to momentum
However, none of these suggestions helped to solve the problem.
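One possible explanation, not confirmed anywhere in this thread, is that host RAM is dominated by data loaded up front rather than by the per-step batch. In that case the batch size and the optimizer would barely affect peak memory. The arithmetic below is a minimal sketch of that comparison. All sizes (`NUM_BLOCKS`, `NUM_POINTS`, `NUM_FEATURES`) are hypothetical values chosen for illustration, not measurements of the actual dataset.

```python
def array_bytes(shape, itemsize):
    """Bytes needed to hold a dense array with the given shape and element size."""
    total = itemsize
    for dim in shape:
        total *= dim
    return total

# Hypothetical sizes, chosen only for illustration (not taken from the dataset).
NUM_BLOCKS = 20_000     # samples held in host memory at once
NUM_POINTS = 4096       # points per sample
NUM_FEATURES = 9        # values per point
FLOAT32_BYTES = 4

# Footprint of keeping every sample in RAM versus a single batch.
full_dataset = array_bytes((NUM_BLOCKS, NUM_POINTS, NUM_FEATURES), FLOAT32_BYTES)
one_batch_24 = array_bytes((24, NUM_POINTS, NUM_FEATURES), FLOAT32_BYTES)
one_batch_8 = array_bytes((8, NUM_POINTS, NUM_FEATURES), FLOAT32_BYTES)

print(f"whole dataset in RAM: {full_dataset / 2**30:.2f} GiB")
print(f"batch of 24:          {one_batch_24 / 2**20:.2f} MiB")
print(f"batch of 8:           {one_batch_8 / 2**20:.2f} MiB")
```

Under these assumed sizes, shrinking the batch saves a few MiB while the preloaded data occupies GiB, which would be consistent with the two changes above having no visible effect.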
Describe the expected behavior
Training should run without excessive memory consumption and without the TensorFlow kernel being killed.
Code to reproduce the issue
I ran the following code in an IPython notebook on both my local machine (local GPU) and Google Colab:
!git clone https://github.com/charlesq34/pointnet.git
cd pointnet/sem_seg/
!sh download_data.sh
!python train.py --log_dir log6 --test_area 6
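To put a number on how much host memory the training run actually uses, a small wrapper could launch the command and report the peak resident set size of the child process. This is a minimal sketch assuming Linux, where `ru_maxrss` is reported in KiB. The helper name `run_and_report_peak_rss` is made up for illustration, and the stand-in child command below would be replaced by the real `train.py` invocation.

```python
import resource
import subprocess
import sys

def run_and_report_peak_rss(cmd):
    """Run cmd to completion and return (exit_code, peak_rss_kib).

    peak_rss_kib is the largest resident set size among the child processes
    this interpreter has waited for so far (KiB on Linux).
    """
    completed = subprocess.run(cmd)
    peak_kib = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return completed.returncode, peak_kib

# Stand-in child that allocates about 50 MiB. For the real run, pass e.g.
# ["python", "train.py", "--log_dir", "log6", "--test_area", "6"] instead.
exit_code, peak_kib = run_and_report_peak_rss(
    [sys.executable, "-c", "buf = bytearray(50 * 1024 * 1024)"]
)
print(f"exit code: {exit_code}, peak RSS: {peak_kib / 1024:.1f} MiB")
```

On POSIX systems, `subprocess` reports a negative exit code when the child dies from a signal, so a value of -9 means the process received SIGKILL, which is the signal the Linux out-of-memory killer sends.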
Other info / logs
The error log is very long, so I am attaching it as a separate text file: ERROR_LOG.txt
About this issue
- State: closed
- Created 5 years ago
- Comments: 19 (3 by maintainers)
@gowthamkpr I faced the same issue with tensorflow-gpu==1.15.0 and keras==2.2.4 in Colab.
Same here, with torch==1.4.0 in Colab.