tensorflow: Resource exhausted: MemoryError: Unable to allocate

Hi, I'm getting a ResourceExhaustedError in the middle of training, which I assumed shouldn't be possible: either it fails in the very first iteration, showing that my GPU doesn't have enough memory to train the model, or it runs to the end. In my case, training ran fine for 46 epochs (batch size = 8, image size = (720, 1280)) and then I got the error. I'm running the code from a Jupyter notebook.

Any help is appreciated, thank you.

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10

  • TensorFlow installed from (source or binary): installed via pip inside an Anaconda environment

  • TensorFlow version (use command below): 2.2.0-dev20200401

  • Python version: 3.7.6

  • CUDA/cuDNN version: CUDA 10.1 (nvcc: Cuda compilation tools, release 10.1, V10.1.243)

  • GPU model and memory: GeForce RTX 2080 Ti

Describe the current behavior
Getting a ResourceExhaustedError in the middle of training (46 epochs in).

Describe the expected behavior
Training should either finish normally or fail from the very first epoch.

Standalone code to reproduce the issue
Since the code is not that small, I'm attaching it as a file. Feel free to update my report with the code inline if that's more appropriate. minimal_breaking_code.txt

These are the entry variables (see the attached screenshot, 2020-04-14 17_33_59-Window).

Also, the images are 1080x1920.
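
As a sanity check on those numbers (my own arithmetic, not from the logs): the "10.5 MiB" the allocator fails on below is exactly one float32 copy of a single resized image, so the failing allocation is host RAM for one image, not GPU memory.

```python
# Host-RAM cost of one RGB image held as float32 (4 bytes per value):
print(720 * 1280 * 3 * 4 / 2**20)   # 10.546875 -> the "10.5 MiB" in the error
print(1080 * 1920 * 3 * 4 / 2**20)  # ~23.73 MiB per original 1080x1920 image
```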

Other info / logs


ResourceExhaustedError                    Traceback (most recent call last)
C:/Users/Gamer/Dropbox/Projetos/MLPython/Imports.py in <module>
     31     'balanced_training_data': False,
     32     'data_augmentation': True,
---> 33     'training_data_samples': None
     34   }
     35 }

~\Dropbox\Projetos\Ecotrace\NewImages\Model.py in run_experiment(data_path, images_data_path, training_data, balanced_training_data, validation_data, parameters)
    338         keras.callbacks.ModelCheckpoint(filepath = last_model_filepath, save_best_only = True),
    339         keras.callbacks.ModelCheckpoint(filepath = best_model_filepath, save_best_only = True),
--> 340         keras.callbacks.CSVLogger(experiment_folder + '/history.csv')
    341     ]
    342 )

~\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\tensorflow\python\keras\engine\training.py in _method_wrapper(self, *args, **kwargs)
     69   def _method_wrapper(self, *args, **kwargs):
     70     if not self._in_multi_worker_mode():  # pylint: disable=protected-access
---> 71       return method(self, *args, **kwargs)
     72 
     73   # Running inside run_distribute_coordinator already.

~\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\tensorflow\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    940           workers=workers,
    941           use_multiprocessing=use_multiprocessing,
--> 942           return_dict=True)
    943       val_logs = {'val_' + name: val for name, val in val_logs.items()}
    944       epoch_logs.update(val_logs)

~\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\tensorflow\python\keras\engine\training.py in _method_wrapper(self, *args, **kwargs)
     69   def _method_wrapper(self, *args, **kwargs):
     70     if not self._in_multi_worker_mode():  # pylint: disable=protected-access
---> 71       return method(self, *args, **kwargs)
     72 
     73   # Running inside run_distribute_coordinator already.

~\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\tensorflow\python\keras\engine\training.py in evaluate(self, x, y, batch_size, verbose, sample_weight, steps, callbacks, max_queue_size, workers, use_multiprocessing, return_dict)
   1171           with trace.Trace('TraceContext', graph_type='test', step_num=step):
   1172             callbacks.on_test_batch_begin(step)
-> 1173             tmp_logs = test_function(iterator)
   1174             if data_handler.should_sync:
   1175               context.async_wait()

~\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\tensorflow\python\eager\def_function.py in __call__(self, *args, **kwds)
    606       xla_context.Exit()
    607     else:
--> 608       result = self._call(*args, **kwds)
    609 
    610     if tracing_count == self._get_tracing_count():

~\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\tensorflow\python\eager\def_function.py in _call(self, *args, **kwds)
    644       # In this case we have not created variables on the first call. So we can
    645       # run the first trace but we should fail if variables are created.
--> 646       results = self._stateful_fn(*args, **kwds)
    647       if self._created_variables:
    648         raise ValueError("Creating variables on a non-first call to a function"

~\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\tensorflow\python\eager\function.py in __call__(self, *args, **kwargs)
   2418     with self._lock:
   2419       graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
-> 2420     return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
   2421 
   2422   @property

~\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\tensorflow\python\eager\function.py in _filtered_call(self, args, kwargs)
   1663         if isinstance(t, (ops.Tensor,
   1664                           resource_variable_ops.BaseResourceVariable))),
-> 1665         self.captured_inputs)
   1666 
   1667   def _call_flat(self, args, captured_inputs, cancellation_manager=None):

~\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\tensorflow\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
   1744       # No tape is watching; skip to running the function.
   1745       return self._build_call_outputs(self._inference_function.call(
-> 1746           ctx, args, cancellation_manager=cancellation_manager))
   1747     forward_backward = self._select_forward_and_backward_functions(
   1748         args,

~\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\tensorflow\python\eager\function.py in call(self, ctx, args, cancellation_manager)
    596               inputs=args,
    597               attrs=attrs,
--> 598               ctx=ctx)
    599     else:
    600       outputs = execute.execute_with_cancellation(

~\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\tensorflow\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: MemoryError: Unable to allocate 10.5 MiB for an array with shape (720, 1280, 3) and data type float32
Traceback (most recent call last):

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\tensorflow\python\ops\script_ops.py", line 243, in __call__
    ret = func(*args)

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\tensorflow\python\autograph\impl\api.py", line 309, in wrapper
    return func(*args, **kwargs)

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\tensorflow\python\data\ops\dataset_ops.py", line 784, in generator_py_func
    values = next(generator_state.get_iterator(iterator_id))

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\tensorflow\python\keras\engine\data_adapter.py", line 816, in wrapped_generator
    for data in generator_fn():

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\tensorflow\python\keras\utils\data_utils.py", line 1022, in get
    six.reraise(*sys.exc_info())

  File "C:\Users\Gamer\AppData\Roaming\Python\Python37\site-packages\six.py", line 693, in reraise
    raise value

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\tensorflow\python\keras\utils\data_utils.py", line 998, in get
    inputs = self.queue.get(block=True).get()

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\multiprocessing\pool.py", line 657, in get
    raise self._value

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\tensorflow\python\keras\utils\data_utils.py", line 932, in next_sample
    return six.next(_SHARED_SEQUENCES[uid])

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\keras_preprocessing\image\iterator.py", line 104, in __next__
    return self.next(*args, **kwargs)

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\keras_preprocessing\image\iterator.py", line 116, in next
    return self._get_batches_of_transformed_samples(index_array)

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\keras_preprocessing\image\iterator.py", line 231, in _get_batches_of_transformed_samples
    x = img_to_array(img, data_format=self.data_format)

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\keras_preprocessing\image\utils.py", line 299, in img_to_array
    x = np.asarray(img, dtype=dtype)

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)

MemoryError: Unable to allocate 10.5 MiB for an array with shape (720, 1280, 3) and data type float32

 [[{{node PyFunc}}]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[IteratorGetNext]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: MemoryError: Unable to allocate 10.5 MiB for an array with shape (720, 1280, 3) and data type float32
Traceback (most recent call last):

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\tensorflow\python\ops\script_ops.py", line 243, in __call__
    ret = func(*args)

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\tensorflow\python\autograph\impl\api.py", line 309, in wrapper
    return func(*args, **kwargs)

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\tensorflow\python\data\ops\dataset_ops.py", line 784, in generator_py_func
    values = next(generator_state.get_iterator(iterator_id))

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\tensorflow\python\keras\engine\data_adapter.py", line 816, in wrapped_generator
    for data in generator_fn():

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\tensorflow\python\keras\utils\data_utils.py", line 1022, in get
    six.reraise(*sys.exc_info())

  File "C:\Users\Gamer\AppData\Roaming\Python\Python37\site-packages\six.py", line 693, in reraise
    raise value

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\tensorflow\python\keras\utils\data_utils.py", line 998, in get
    inputs = self.queue.get(block=True).get()

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\multiprocessing\pool.py", line 657, in get
    raise self._value

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\tensorflow\python\keras\utils\data_utils.py", line 932, in next_sample
    return six.next(_SHARED_SEQUENCES[uid])

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\keras_preprocessing\image\iterator.py", line 104, in __next__
    return self.next(*args, **kwargs)

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\keras_preprocessing\image\iterator.py", line 116, in next
    return self._get_batches_of_transformed_samples(index_array)

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\keras_preprocessing\image\iterator.py", line 231, in _get_batches_of_transformed_samples
    x = img_to_array(img, data_format=self.data_format)

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\keras_preprocessing\image\utils.py", line 299, in img_to_array
    x = np.asarray(img, dtype=dtype)

  File "C:\Users\Gamer\Anaconda3\envs\TensorFlow-nightly\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)

MemoryError: Unable to allocate 10.5 MiB for an array with shape (720, 1280, 3) and data type float32

 [[{{node PyFunc}}]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[IteratorGetNext]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[IteratorGetNext/_4]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations. 0 derived errors ignored. [Op:__inference_test_function_16419]

Function call stack: test_function -> test_function
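
The inner frames (data_utils.py -> pool.py -> img_to_array -> np.asarray) show that the failing allocation is plain NumPy host memory inside the Keras data-loading queue, not GPU memory. One low-risk thing to try is reducing how many batches that queue is allowed to hold. A minimal sketch, not the code from the attached file: model and train_seq are hypothetical stand-ins for the actual model and generator, while max_queue_size, workers, and use_multiprocessing are the real fit() parameters visible in the fit signature above.

```python
# Hypothetical names: `model` and `train_seq` stand in for the reporter's
# model and keras.utils.Sequence. Fewer queued batches and a single worker
# mean fewer float32 image copies held in host RAM at the same time.
model.fit(
    train_seq,
    epochs=100,                 # whatever the experiment uses
    max_queue_size=2,           # default is 10 queued batches
    workers=1,                  # default; avoid raising it while debugging
    use_multiprocessing=False,  # keep loading in-process
)
```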

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 16 (5 by maintainers)

Most upvoted comments

I'm having the same problem with TF 2.4.0rc3, and I can reliably reproduce it on my PC.

  • Anaconda Python 3.7
  • Windows 10
  • GPU: RTX 3090 (24 GB)
  • Total system memory: 96 GB
  • TensorFlow 2.4.0rc3, CUDA 11, cuDNN 8.0.2

When the problem occurred, my system still had 70 GB of RAM and 5 GB of video memory free, but I got "Unable to allocate 735. MiB".

I exported a subset of the code and data that reproduces the problem: https://github.com/liasece/tf-38414

To hide the problem, simply reduce frozen_batch_size from 64 to 32, i.e., halve the batch size.
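
Since lowering the batch size only hides the symptom, it may help to confirm whether host RAM usage actually creeps up across epochs before the failure. A minimal sketch of such a check, assuming psutil is installed; psutil and the HostMemoryLogger name are my additions, not part of the linked repro.

```python
import os

import psutil  # assumption: installed separately (pip install psutil)
import tensorflow as tf

class HostMemoryLogger(tf.keras.callbacks.Callback):
    """Print this process's resident set size after every epoch, to see
    whether host RAM usage grows steadily until the MemoryError hits."""

    def on_epoch_end(self, epoch, logs=None):
        rss_mib = psutil.Process(os.getpid()).memory_info().rss / 2**20
        print(f"epoch {epoch}: host RSS = {rss_mib:.0f} MiB")

# usage: model.fit(..., callbacks=[HostMemoryLogger()])
```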

@ymodak @geetachavan1