autokeras: Out of memory error with NVIDIA K80 GPU
I'm trying to create an image classifier with ~1000 training samples and 7 classes, but it throws a runtime error. Is there a way to reduce the batch size, or something else I can do to circumvent this?
Following is the error.
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58
/usr/lib/python3.5/multiprocessing/semaphore_tracker.py:129: UserWarning: semaphore_tracker: There appear to be 2 leaked semaphores to clean up at shutdown
  len(cache))
About this issue
- State: closed
- Created 6 years ago
- Reactions: 1
- Comments: 23 (7 by maintainers)
When I first ran this with about 550 grayscale 128x128 images on a Quadro P4000 with 8 GB of memory, it immediately crashed due to insufficient memory. I lowered the constant.MAX_BATCH_SIZE parameter from its default of 128 to 32, and it then ran for about an hour before crashing again with: RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58
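For reference, here is a minimal sketch of how that batch-size cap can be lowered before starting the search. It assumes the legacy 0.x layout where the setting lives in autokeras.constant.Constant and the top-level ImageClassifier import (the exact import path may differ between releases); the random arrays are placeholders standing in for the real dataset.

```python
import numpy as np
from autokeras import ImageClassifier      # import path may differ between 0.x releases
from autokeras.constant import Constant    # legacy location of the global settings

# Cap the largest batch size the architecture search is allowed to try.
# The default in the 0.x releases discussed here was 128.
Constant.MAX_BATCH_SIZE = 32

# Placeholder data standing in for the ~1000-sample, 7-class image set above.
x_train = np.random.rand(1000, 128, 128, 1).astype("float32")
y_train = np.random.randint(0, 7, size=(1000,))

clf = ImageClassifier(verbose=True)
clf.fit(x_train, y_train, time_limit=60 * 60)  # limit the search to one hour
```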
I was watching the GPU memory usage before it crashed, and it fluctuated in cycles, as expected for a “grid search” sort of activity. Unfortunately, the peak memory usage of the more memory-intensive models progressively increases until it overwhelms the GPU memory.
Maybe it would be good, when the program starts, to quantify the available GPU memory and then restrict the model search to models that fit within that limit; see the sketch below. If the program determines that it cannot identify an optimal model within that constraint and may require more memory, it could report that along with hints on how to work around it (e.g., smaller batches, smaller images, a GPU with more memory, etc.). It might also help to offer a grayscale option in the load_image_dataset method that reduces a three-channel color image to a single grayscale channel.
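A rough sketch of the kind of pre-flight check and grayscale reduction being suggested here. The helper name, the 0.8 safety fraction, and the "example.png" path are made up for illustration and are not part of autokeras.

```python
import torch
from PIL import Image

def free_gpu_memory_bytes(device: int = 0) -> int:
    """Estimate how much GPU memory is still unallocated by this process."""
    total = torch.cuda.get_device_properties(device).total_memory
    allocated = torch.cuda.memory_allocated(device)
    return total - allocated

# Example policy: only keep candidate models whose estimated footprint fits
# inside a fraction of the currently free memory.
memory_budget = int(0.8 * free_gpu_memory_bytes())

# Reducing three color channels to one grayscale channel before loading:
img = Image.open("example.png").convert("L")  # "L" = single 8-bit grayscale channel
```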
Also, what is the LIMIT_MEMORY parameter?
This issue is fixed in the new release. Thank you all for the contribution.
Sorry for my late reply. These are the shapes: x_train.shape is (1348, 480, 640, 4) and x_test.shape is (1348, 480, 640, 4).
AutoKeras is poorly maintained at the moment; I had a similar issue.
In “/home/maybe/anaconda3/envs/asr/lib/python3.6/site-packages/torch/nn/modules/conv.py”, explicitly cast the values in the tuple named ‘self.padding’ to int before the call to F.conv2d with the appropriate parameters. One way to do this is to add the following line before line 301:
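Presumably the line in question is a cast along these lines (reconstructed from the description above rather than copied from the commenter's actual patch):

```python
# In torch/nn/modules/conv.py, just before the F.conv2d call around line 301:
# force every element of self.padding to a plain Python int so that numpy
# integer types do not break the conv2d call.
self.padding = tuple(int(p) for p in self.padding)
```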