tensorflow: tf.device scope not working correctly
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): ppc64le-linux
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
- TensorFlow installed from (source or binary)
- TensorFlow version (use command below):
- Python version: python3.7
- Bazel version (if compiling from source):
- GCC/Compiler version (if compiling from source):
- CUDA/cuDNN version: 10.1.243
- GPU model and memory: V100 16GB
Describe the current behavior
The tf.device context manager does not correctly assign a GPU device to tf.keras layers on a node with 4 GPUs, so model parallelism cannot be implemented. Based on the output of tf.debugging.set_log_device_placement(True), all layers are placed on device GPU:0, with the exception of some I/O ops.
Describe the expected behavior
tf.keras layers are assigned to the device specified by tf.device.
Standalone code to reproduce the issue
```python
import tensorflow as tf
from tensorflow import keras

tf.debugging.set_log_device_placement(True)

print("On GPU:1")
inputs = keras.Input(shape=(784,))
with tf.device("/device:GPU:1"):  # Or GPU:1 for the 2nd GPU, GPU:2 for the 3rd etc.
    x = keras.layers.Dense(256, activation="relu")(inputs)
    print(x)
    assert x.device.endswith("/GPU:1")
```
I have a larger test problem that runs on a 4-GPU node. If the assert statement is removed, nvidia-smi shows that all memory use and computational work happens on GPU:0 and almost none is assigned to the other GPUs. Happy to supply this code if needed.
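As a sanity check, placement can also be inspected programmatically instead of through nvidia-smi by reading back the device of each layer's variables. This is a minimal sketch, assuming a toy two-layer functional model (the real model is far larger):

```python
import tensorflow as tf
from tensorflow import keras

# Toy stand-in for the real model, used only to illustrate the check.
inputs = keras.Input(shape=(784,))
with tf.device("/device:GPU:1"):
    x = keras.layers.Dense(256, activation="relu")(inputs)
with tf.device("/device:GPU:2"):
    outputs = keras.layers.Dense(10)(x)
model = keras.Model(inputs, outputs)

# Print where each layer's weights were actually placed; on the affected
# setup these are expected to all report GPU:0 rather than the requested devices.
for layer in model.layers:
    for weight in layer.weights:
        print(layer.name, weight.name, weight.device)
```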
Other info / logs
```
Tensor("dense/Identity:0", shape=(None, 256), dtype=float32)
Traceback (most recent call last):
  File "py_test.py", line 11, in <module>
    assert x.device.endswith("/GPU:1")
AssertionError
```
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 19 (5 by maintainers)
Thank you for the interesting solution to this problem. I am investigating whether this will work on the model that I have developed.
I noticed that the latest versions of tf.keras layers no longer have a 'device' attribute, which means, as you say, that Keras no longer supports model parallelism where we assign layers to a device.
Interestingly, the development of GPUs with much larger memory (e.g. 80GB on the A100) and the proposed new Grace architecture, which allows high-bandwidth access to CPU memory, will reduce the need for model parallelism. However, this will likely be offset by the desire to build bigger, more complex model systems using the Keras functional API.
I think TF currently does not support model parallelism with a Keras model written the way you have. However, I believe they do support model parallelism with primitive operations. Also, there is almost no documentation about model parallelism, which is an issue. Maybe you can try something like wrapping all the init, build and call methods inside with tf.device(the device you want). Example code:
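A minimal sketch of the idea (the layer name DeviceDense, the device strings, and the nested Dense sub-layer are illustrative assumptions, not a tested recipe):

```python
import tensorflow as tf
from tensorflow import keras

class DeviceDense(keras.layers.Layer):
    """Dense layer whose variable creation and computation are wrapped in tf.device."""

    def __init__(self, units, device="/device:GPU:0", activation=None, **kwargs):
        super().__init__(**kwargs)
        self._target_device = device
        with tf.device(self._target_device):
            self.dense = keras.layers.Dense(units, activation=activation)

    def build(self, input_shape):
        with tf.device(self._target_device):
            self.dense.build(input_shape)
        super().build(input_shape)

    def call(self, inputs):
        with tf.device(self._target_device):
            return self.dense(inputs)

# Usage: pin each block of the model to a different GPU.
inputs = keras.Input(shape=(784,))
x = DeviceDense(256, device="/device:GPU:1", activation="relu")(inputs)
outputs = DeviceDense(10, device="/device:GPU:2")(x)
model = keras.Model(inputs, outputs)

# Check where the variables ended up.
for w in model.weights:
    print(w.name, w.device)
```

The same wrapping would apply to any other layer type; whether the variables actually land on the requested device when the functional model is traced is exactly what this issue is about, so treat it as something to try rather than a confirmed workaround.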
The actual model that I am working on is too large to fit into GPU memory. I have a data-parallel code using Horovod that runs on hundreds of GPUs, but I now need to use a larger model. To do this I need to spread the layers across multiple GPUs.