keras: multi_gpu_model not working w/ TensorFlow 1.14
System information
- Have I written custom code (as opposed to using example directory): Yes (a very slight change to an example)
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
- TensorFlow backend (yes / no): yes
- TensorFlow version: 1.14
- Keras version: Latest master from github
- Python version: 3.7 (through Anaconda)
- CUDA/cuDNN version: 10.0/7.4.2
- GPU model and memory: 2x Tesla K80 (11GB each)
Describe the current behavior
I am using the CIFAR-10 ResNet example from the Keras examples directory, with one line added at line 360 (just before compilation) in order to train on multiple GPUs. However, this doesn't work.
Line Added:
model = keras.utils.multi_gpu_model(model, gpus=2)
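For reference, the failure does not appear to be specific to the ResNet script. A minimal sketch (hypothetical, not the original example) with any model containing a BatchNormalization layer seems to hit the same code path once wrapped with `multi_gpu_model`:

```python
# Hypothetical minimal reproduction (not the original script): BatchNormalization
# routes through normalize_batch_in_training -> _has_nchw_support, which is where
# the traceback below originates.
import keras
from keras.layers import Input, Conv2D, BatchNormalization, Flatten, Dense
from keras.models import Model

inputs = Input(shape=(32, 32, 3))
x = Conv2D(16, 3, padding='same')(inputs)
x = BatchNormalization()(x)            # triggers normalize_batch_in_training
x = Flatten()(x)
outputs = Dense(10, activation='softmax')(x)
model = Model(inputs, outputs)

# Raises AttributeError on TF 1.14 (works on TF 1.13):
model = keras.utils.multi_gpu_model(model, gpus=2)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
```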
Traceback (error log):
Traceback (most recent call last):
File "cifar10_resnet_multigpu.py", line 360, in <module>
model = keras.utils.multi_gpu_model(model, gpus=2)
File "/local/home/manasa/vpds2/conda/anaconda3/envs/tensorflow114/lib/python3.7/site-packages/keras/utils/multi_gpu_utils.py", line 230, in multi_gpu_model
outputs = model(inputs)
File "/local/home/manasa/vpds2/conda/anaconda3/envs/tensorflow114/lib/python3.7/site-packages/keras/engine/base_layer.py", line 451, in __call__
output = self.call(inputs, **kwargs)
File "/local/home/manasa/vpds2/conda/anaconda3/envs/tensorflow114/lib/python3.7/site-packages/keras/engine/network.py", line 570, in call
output_tensors, _, _ = self.run_internal_graph(inputs, masks)
File "/local/home/manasa/vpds2/conda/anaconda3/envs/tensorflow114/lib/python3.7/site-packages/keras/engine/network.py", line 727, in run_internal_graph
layer.call(computed_tensor, **kwargs))
File "/local/home/manasa/vpds2/conda/anaconda3/envs/tensorflow114/lib/python3.7/site-packages/keras/layers/normalization.py", line 185, in call
epsilon=self.epsilon)
File "/local/home/manasa/vpds2/conda/anaconda3/envs/tensorflow114/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2053, in normalize_batch_in_training
if not _has_nchw_support() and list(reduction_axes) == [0, 2, 3]:
File "/local/home/manasa/vpds2/conda/anaconda3/envs/tensorflow114/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 299, in _has_nchw_support
explicitly_on_cpu = _is_current_explicit_device('CPU')
File "/local/home/manasa/vpds2/conda/anaconda3/envs/tensorflow114/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 272, in _is_current_explicit_device
device = _get_current_tf_device()
File "/local/home/manasa/vpds2/conda/anaconda3/envs/tensorflow114/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 252, in _get_current_tf_device
g._apply_device_functions(op)
File "/local/home/manasa/vpds2/conda/anaconda3/envs/tensorflow114/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 4581, in _apply_device_functions
op._set_device_from_string(device_string)
AttributeError: '_TfDeviceCaptureOp' object has no attribute '_set_device_from_string'
Describe the expected behavior
Previously, this worked fine and resulted in faster training due to parallelization across the GPUs.
Note: this works fine if the backend is TensorFlow 1.13, so this is a regression.
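For anyone who needs a temporary local workaround until this is fixed upstream, a monkey-patch along the following lines (a sketch based only on the names in the traceback above, not an official fix) seems to avoid the crash by giving Keras's device-capture helper the method that TF 1.14's `Graph._apply_device_functions` now calls:

```python
# Hypothetical stopgap: add _set_device_from_string to Keras's internal
# _TfDeviceCaptureOp so TF 1.14 can hand it the device string.
import tensorflow as tf
from keras.backend import tensorflow_backend as KTB

if not hasattr(KTB._TfDeviceCaptureOp, '_set_device_from_string'):
    def _set_device_from_string(self, device_str):
        # Store a DeviceSpec so _is_current_explicit_device can still
        # read .device_type from the captured device.
        self.device = tf.DeviceSpec.from_string(device_str)
    KTB._TfDeviceCaptureOp._set_device_from_string = _set_device_from_string
```

Upgrading to a Keras release that supports TF 1.14 (or pinning TF 1.13) would of course be the proper solution.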
@QtacierP It works with TF 1.13 and CUDA 10.0 for me; it's just TF 1.14 that's a problem.
I have the same problem when wrapping a portion of my code in:
with device('/gpu:0' if use_GPU else '/cpu:0'):
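A minimal sketch of that failing pattern (hypothetical; `use_GPU` stands in for that user's own flag, and the layer choice is only there to reach the same code path as the traceback above):

```python
# Building/calling a BatchNormalization layer inside an explicit device scope
# also goes through _get_current_tf_device and fails the same way on TF 1.14.
import tensorflow as tf
from keras.layers import Input, BatchNormalization
from keras.models import Model

use_GPU = True
with tf.device('/gpu:0' if use_GPU else '/cpu:0'):
    inp = Input(shape=(32, 32, 3))
    out = BatchNormalization()(inp)
    model = Model(inp, out)
```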
TensorFlow-GPU 1.14 has disappointed me as well; I consider 1.13.2 the last reliable version.
I believe 1.14 is currently closer to TensorFlow 2 than to TensorFlow 1. Why else would there be a need to change the package path to "v1"?
Please keep backwards compatibility across 1.x.x releases as long as the version still starts with 1.
Yes, this is a TF 1.14 issue; see https://github.com/tensorflow/tensorflow/issues/30728