tensorflow: PiecewiseConstantDecay doesn't work with Wrapping Optimizer on GPUs
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 20.04
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: No
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): 2.5/2.6
- Python version: 3.8
- Bazel version (if compiling from source): NA
- GCC/Compiler version (if compiling from source): NA
- CUDA/cuDNN version: 11.2
- GPU model and memory: 8.2
Describe the current behavior
When a PiecewiseConstantDecay schedule is used with an optimizer that is then wrapped by another optimizer, e.g.:
  lr_fn = PiecewiseConstantDecay()
  opt = SGD(lr_fn)
  opt = WrapOpt(opt)
we hit the following error:
InvalidArgumentError: Cannot assign a device for operation sequential_1/dense_1/Tensordot/ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node sequential_1/dense_1/Tensordot/ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices
Note that this error seems to be hit only when the PiecewiseConstantDecay schedule is used on GPUs.
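For contrast, here is a minimal sketch (using the OptimizerWrapper class defined in the repro below and the ExponentialDecay alternative that is commented out there) which, per the note above, is not expected to hit this colocation error on GPU:

# Contrast sketch (assumption: a schedule other than PiecewiseConstantDecay
# avoids the error, as observed above). OptimizerWrapper is the class from
# the repro below.
lr_fn = tf.keras.optimizers.schedules.ExponentialDecay(
    0.1, decay_steps=100000, decay_rate=0.96, staircase=True)
opt = tf.keras.optimizers.SGD(learning_rate=lr_fn, momentum=1.0)
opt = OptimizerWrapper(opt)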
Describe the expected behavior
We shouldn't see this error when using PiecewiseConstantDecay on GPUs.
Standalone code to reproduce the issue
Below is the Colab link; please reproduce it with Runtime = GPU. https://colab.research.google.com/drive/1QPx4IqQNVpSR-ALfPYbjJjRUffyHo06G?usp=sharing
import tensorflow as tf
from tensorflow.keras import layers, optimizers, models
print(tf.__version__)
class OptimizerWrapper(optimizers.Optimizer):
  def __init__(self, optimizer, name=None, **kwargs):
    super(OptimizerWrapper, self).__init__(name, **kwargs)
    self._optimizer = optimizer

  def _create_slots(self, var_list):
    self._optimizer._create_slots(var_list)

  def _resource_apply_dense(self, grad, var):
    return self._optimizer._resource_apply_dense(grad, var)

  def _resource_apply_sparse(self, grad, var):
    return self._optimizer._resource_apply_sparse(grad, var)

  def get_config(self):
    return self._optimizer.get_config()
model = tf.keras.Sequential()
model.add(layers.Dense(8))
x = tf.constant(12., shape=(5, 1, 2, 4))
boundaries = [100000, 110000]
values = [1.0, 0.5, 0.1]
learning_rate_fn = optimizers.schedules.PiecewiseConstantDecay(
    boundaries, values)
#learning_rate_fn = optimizers.schedules.ExponentialDecay(
#    0.1, decay_steps=100000, decay_rate=0.96, staircase=True)
#learning_rate_fn = optimizers.schedules.PolynomialDecay(
#    0.1, 10000, 0.01, power=0.5)
opt = optimizers.SGD(learning_rate=learning_rate_fn, momentum=1.0)
opt = OptimizerWrapper(opt)
@tf.function
def train_step(x):
  with tf.GradientTape(persistent=True) as tape:
    y = model(x)
    loss = tf.reduce_mean(y)
  grads = tape.gradient(loss, model.variables)
  opt.apply_gradients(zip(grads, model.variables))
  return loss

for i in range(3):
  loss = train_step(x)
  print("Loss:", loss)
Other info / logs
InvalidArgumentError Traceback (most recent call last)
<ipython-input-2-354cdd24a945> in <module>()
45
46 for i in range(3):
---> 47 loss = train_step(x)
48 print("Loss:", loss)
49
5 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
58 ctx.ensure_initialized()
59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60 inputs, attrs, num_outputs)
61 except core._NotOkStatusException as e:
62 if name is not None:
InvalidArgumentError: Cannot assign a device for operation sequential_1/dense_1/Tensordot/ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node sequential_1/dense_1/Tensordot/ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0].
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=1 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
cc @nluehr
About this issue
- Original URL
- State: open
- Created 3 years ago
- Comments: 16 (15 by maintainers)
I'm not sure what's causing the issue, but delegating apply_gradients in OptimizerWrapper solves the issue. This is what tf.keras.mixed_precision.LossScaleOptimizer does, which is the only optimizer wrapper within Keras. Still, this is a bad error message and this issue should be fixed. /CC @fchollet @tomerk, can either of you take a look or triage?
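For reference, a minimal sketch of that delegation, built on the OptimizerWrapper from the repro above; this is an assumed workaround for the repro, not a fix for the underlying colocation or error-message problem:

class OptimizerWrapper(optimizers.Optimizer):
  def __init__(self, optimizer, name=None, **kwargs):
    super(OptimizerWrapper, self).__init__(name, **kwargs)
    self._optimizer = optimizer

  # Delegate apply_gradients to the wrapped optimizer, similar to what
  # tf.keras.mixed_precision.LossScaleOptimizer does. With this override the
  # repro above reportedly no longer hits the device-colocation error.
  def apply_gradients(self, grads_and_vars, name=None, **kwargs):
    return self._optimizer.apply_gradients(grads_and_vars, name=name, **kwargs)

  def get_config(self):
    return self._optimizer.get_config()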