keras: multi_gpu_model not working w/ TensorFlow 1.14

System information

  • Have I written custom code (as opposed to using example directory): Yes (a very slight change to an example script)
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
  • TensorFlow backend (yes / no): yes
  • TensorFlow version: 1.14
  • Keras version: Latest master from github
  • Python version: 3.7 (through Anaconda)
  • CUDA/cuDNN version: 10.0/7.4.2
  • GPU model and memory: 2x Tesla K80 (11GB each)

Describe the current behavior

I am using the CIFAR-10 ResNet example from the Keras examples directory, with a single line added at line 360 (just before the model is compiled) so that training runs on multiple GPUs. With TensorFlow 1.14 as the backend, this fails.

Line Added: model = keras.utils.multi_gpu_model(model, gpus=2)
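The example script itself requires TensorFlow, so the snippet below is a self-contained stand-in that only illustrates where the added line sits: after the model is built and before compile() is called. FakeModel and the local multi_gpu_model are stubs that record call order, not the real Keras API.

```python
# Stand-in sketch of the edit: the wrapper call goes between model
# construction and compilation. FakeModel and multi_gpu_model here are
# stubs; the real code calls keras.utils.multi_gpu_model.
calls = []

class FakeModel:
    def compile(self, **kwargs):
        calls.append("compile")

def multi_gpu_model(model, gpus):
    calls.append(f"wrap:{gpus}")
    return model

model = FakeModel()                     # resnet_v1(...) in the example script
model = multi_gpu_model(model, gpus=2)  # the line added at 360
model.compile(optimizer="adam", loss="categorical_crossentropy")
print(calls)  # -> ['wrap:2', 'compile']
```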

Traceback:

Traceback (most recent call last):
  File "cifar10_resnet_multigpu.py", line 360, in <module>
    model = keras.utils.multi_gpu_model(model, gpus=2)
  File "/local/home/manasa/vpds2/conda/anaconda3/envs/tensorflow114/lib/python3.7/site-packages/keras/utils/multi_gpu_utils.py", line 230, in multi_gpu_model
    outputs = model(inputs)
  File "/local/home/manasa/vpds2/conda/anaconda3/envs/tensorflow114/lib/python3.7/site-packages/keras/engine/base_layer.py", line 451, in __call__
    output = self.call(inputs, **kwargs)
  File "/local/home/manasa/vpds2/conda/anaconda3/envs/tensorflow114/lib/python3.7/site-packages/keras/engine/network.py", line 570, in call
    output_tensors, _, _ = self.run_internal_graph(inputs, masks)
  File "/local/home/manasa/vpds2/conda/anaconda3/envs/tensorflow114/lib/python3.7/site-packages/keras/engine/network.py", line 727, in run_internal_graph
    layer.call(computed_tensor, **kwargs))
  File "/local/home/manasa/vpds2/conda/anaconda3/envs/tensorflow114/lib/python3.7/site-packages/keras/layers/normalization.py", line 185, in call
    epsilon=self.epsilon)
  File "/local/home/manasa/vpds2/conda/anaconda3/envs/tensorflow114/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2053, in normalize_batch_in_training
    if not _has_nchw_support() and list(reduction_axes) == [0, 2, 3]:
  File "/local/home/manasa/vpds2/conda/anaconda3/envs/tensorflow114/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 299, in _has_nchw_support
    explicitly_on_cpu = _is_current_explicit_device('CPU')
  File "/local/home/manasa/vpds2/conda/anaconda3/envs/tensorflow114/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 272, in _is_current_explicit_device
    device = _get_current_tf_device()
  File "/local/home/manasa/vpds2/conda/anaconda3/envs/tensorflow114/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 252, in _get_current_tf_device
    g._apply_device_functions(op)
  File "/local/home/manasa/vpds2/conda/anaconda3/envs/tensorflow114/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 4581, in _apply_device_functions
    op._set_device_from_string(device_string)
AttributeError: '_TfDeviceCaptureOp' object has no attribute '_set_device_from_string'

Describe the expected behavior

Previously, this worked fine and resulted in faster training thanks to data parallelism across the GPUs.

Note: This works fine when the backend is TensorFlow 1.13, so this is a regression in 1.14.

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 26

Most upvoted comments

@QtacierP It works with TF 1.13 and CUDA 10.0 for me; it's only TF 1.14 that's a problem.

I hit the same error when wrapping a portion of code in: with device('/gpu:0' if use_GPU else '/cpu:0'):
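For context, that pattern selects a device scope at runtime. The sketch below runs without TensorFlow: device here is a hypothetical stand-in for tf.device that merely records which placement string the conditional expression picked.

```python
from contextlib import contextmanager

# Hypothetical stand-in for tf.device: records the selected device string
# instead of actually placing ops on hardware.
placements = []

@contextmanager
def device(name):
    placements.append(name)
    yield

use_GPU = True
with device('/gpu:0' if use_GPU else '/cpu:0'):
    pass  # portion of code meant to run on the selected device

print(placements)  # -> ['/gpu:0']
```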

TensorFlow-GPU 1.14 has disappointed me as well; I consider 1.13.2 the last reliable version.

Just importing it causes incompatibilities:

  • for example with numpy,
  • with management of GPUs / CPUs.

Many APIs have moved to new package paths with no backwards compatibility, for example:

  • package path to TocoConverter/TFLiteConverter
  • package path to set_image_dim_ordering
  • many other places now emit warnings such as “tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead”

I believe 1.14 is currently closer to TensorFlow 2 than to TensorFlow 1; why else would APIs need to move under a “v1” package path?
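A common way to paper over such path changes is a small resolver that prefers the compat.v1 location and falls back to the old one. This is a hypothetical sketch (resolve_placeholder is not a real TensorFlow helper), exercised here against stand-in namespaces rather than TensorFlow itself:

```python
import types

# Hypothetical shim: prefer tf.compat.v1.placeholder on newer releases,
# fall back to the top-level tf.placeholder on 1.13-style modules.
def resolve_placeholder(tf_module):
    compat_v1 = getattr(getattr(tf_module, "compat", None), "v1", None)
    if compat_v1 is not None and hasattr(compat_v1, "placeholder"):
        return compat_v1.placeholder
    return tf_module.placeholder

# Stand-in namespaces simulating the 1.13-style and 1.14-style layouts:
old_tf = types.SimpleNamespace(placeholder=lambda: "old")
new_tf = types.SimpleNamespace(
    placeholder=lambda: "deprecated",
    compat=types.SimpleNamespace(
        v1=types.SimpleNamespace(placeholder=lambda: "v1")),
)
print(resolve_placeholder(old_tf)())  # -> old
print(resolve_placeholder(new_tf)())  # -> v1
```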

Please preserve backwards compatibility across 1.x.x releases as long as the major version is still 1.