tensorflow: PiecewiseConstantDecay doesn't work with Wrapping Optimizer on GPUs

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 20.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: No
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.5/2.6
  • Python version: 3.8
  • Bazel version (if compiling from source): NA
  • GCC/Compiler version (if compiling from source): NA
  • CUDA/cuDNN version: 11.2 / 8.2
  • GPU model and memory: not specified

Describe the current behavior

When a PiecewiseConstantDecay schedule is passed to an optimizer that is then wrapped by another optimizer, like:

lr_fn = PiecewiseConstantDecay(boundaries, values)
opt = SGD(learning_rate=lr_fn)
opt = OptimizerWrapper(opt)

we hit the following error:

InvalidArgumentError: Cannot assign a device for operation sequential_1/dense_1/Tensordot/ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node sequential_1/dense_1/Tensordot/ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices 

Note that this error seems to occur only when the PiecewiseConstantDecay schedule is used on GPUs.
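As a quick cross-check (my suggestion, not part of the original report), hiding the GPU before any ops are created should let the same repro script run cleanly if the failure really is GPU-specific:

import tensorflow as tf

# Hide all GPUs from TensorFlow. This must run before the model or any ops
# are built; with only the CPU visible, no colocation conflict is expected.
tf.config.set_visible_devices([], 'GPU')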

Describe the expected behavior

We shouldn't see such an error when using PiecewiseConstantDecay on GPUs.

Standalone code to reproduce the issue

Below is a Colab link; please reproduce with the runtime set to GPU: https://colab.research.google.com/drive/1QPx4IqQNVpSR-ALfPYbjJjRUffyHo06G?usp=sharing

import tensorflow as tf
from tensorflow.keras import layers, optimizers, models
print(tf.__version__)
class OptimizerWrapper(optimizers.Optimizer):
  """Minimal wrapper that delegates everything to an inner optimizer."""

  def __init__(self, optimizer, name="OptimizerWrapper", **kwargs):
    super(OptimizerWrapper, self).__init__(name, **kwargs)
    self._optimizer = optimizer

  def _create_slots(self, var_list):
    # Delegate slot creation (e.g. momentum buffers) to the inner optimizer.
    self._optimizer._create_slots(var_list)

  def _resource_apply_dense(self, grad, var):
    return self._optimizer._resource_apply_dense(grad, var)

  def _resource_apply_sparse(self, grad, var, indices):
    # The base-class hook for sparse updates also receives the indices.
    return self._optimizer._resource_apply_sparse(grad, var, indices)

  def get_config(self):
    return self._optimizer.get_config()


model = tf.keras.Sequential()
model.add(layers.Dense(8))
x = tf.constant(12., shape=(5, 1, 2, 4))
# Piecewise-constant schedule: lr is 1.0 up to step 100000, 0.5 up to step
# 110000, and 0.1 afterwards.
boundaries = [100000, 110000]
values = [1.0, 0.5, 0.1]
learning_rate_fn = optimizers.schedules.PiecewiseConstantDecay(
    boundaries, values)
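# Per the note in "Describe the current behavior", the two commented-out
# schedules below reportedly do not trigger the colocation error.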
#learning_rate_fn = optimizers.schedules.ExponentialDecay(
#    0.1, decay_steps=100000, decay_rate=0.96, staircase=True)
#learning_rate_fn = optimizers.schedules.PolynomialDecay(
#    0.1, 10000, 0.01, power=0.5)
opt = optimizers.SGD(learning_rate=learning_rate_fn, momentum=1.0)
opt = OptimizerWrapper(opt)

@tf.function
def train_step(x):
  with tf.GradientTape(persistent=True) as tape:
    y = model(x)
    loss = tf.reduce_mean(y)

  grads = tape.gradient(loss, model.variables)
  opt.apply_gradients(zip(grads, model.variables))
  return loss

for i in range(3):
  loss = train_step(x)
  print("Loss:", loss)

Other info / logs

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-2-354cdd24a945> in <module>()
     45 
     46 for i in range(3):
---> 47   loss = train_step(x)
     48   print("Loss:", loss)
     49 

5 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

InvalidArgumentError: Cannot assign a device for operation sequential_1/dense_1/Tensordot/ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node sequential_1/dense_1/Tensordot/ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0]. 
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=1 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]

Read literally, the group is pinned to GPU:0 because that is where the variable resource lives, yet the group's supported device types are [CPU] only, which is the contradiction the placer reports.

cc @nluehr

About this issue

  • Original URL
  • State: open
  • Created 3 years ago
  • Comments: 16 (15 by maintainers)

Most upvoted comments

I'm not sure what's causing this, but delegating apply_gradients in OptimizerWrapper works around the issue:

class OptimizerWrapper(optimizers.Optimizer):

  ...

  def apply_gradients(self,
                      grads_and_vars,
                      name=None,
                      experimental_aggregate_gradients=True):
    # Forward directly to the inner optimizer and return its update op.
    return self._optimizer.apply_gradients(grads_and_vars, name,
                                           experimental_aggregate_gradients)

This is what tf.keras.mixed_precision.LossScaleOptimizer does, which is the only optimizer wrapper within Keras.
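For comparison, here is a minimal sketch (my addition, untested) of substituting that built-in wrapper into the repro above; with a fixed loss scale of 1.0 the wrapper leaves the gradients numerically unchanged, so it isolates the effect of the wrapping itself:

opt = optimizers.SGD(learning_rate=learning_rate_fn, momentum=1.0)
# Fixed scale of 1.0 makes the loss scaling a numerical no-op.
opt = tf.keras.mixed_precision.LossScaleOptimizer(
    opt, dynamic=False, initial_scale=1.0)

Since LossScaleOptimizer delegates apply_gradients to the inner optimizer, this substitution is expected to avoid the colocation error.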

Still, this is a bad error message and this issue should be fixed. /CC @fchollet @tomerk, can either of you take a look or triage?