tensorflow: Masking LSTM: OP_REQUIRES failed at cudnn_rnn_ops.cc:1498 : Unknown: CUDNN_STATUS_BAD_PARAM
System information
- Have I written custom code: Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04.2 LTS
- TensorFlow installed from (source or binary): Binary, pip
- TensorFlow version (use command below): v2.0.0-rc2-26-g64c3d38 2.0.0
- Python version: Python 3.7.3
- CUDA/cuDNN version: CUDA=10.0, CUDNN=7.6.2.24-1
- GPU model and memory: Quadro RTX 6000 major: 7 minor: 5 memoryClockRate(GHz): 1.77
Describe the problem
It seems there is an issue with the CuDNN LSTM implementation when using a tf.keras.layers.Masking layer.
import tensorflow as tf

batch_size = 256
num_tsteps = 144
num_features = 130
num_units = 88

model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(num_tsteps, num_features), batch_size=batch_size),
    tf.keras.layers.Masking(mask_value=0.0, input_shape=(num_tsteps, num_features)),
    tf.keras.layers.LSTM(num_units, batch_input_shape=(batch_size, num_tsteps, num_features), return_sequences=True, stateful=False),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1)),
    tf.keras.layers.Activation('sigmoid'),
])
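For context, here is a minimal sketch of how this model might be compiled and driven with synthetic right-padded data; the optimizer, loss, and dummy data are assumptions on my part and are not taken from the original report:
import numpy as np

# `model`, batch_size, num_tsteps and num_features are defined above.
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Synthetic right-padded data: every sequence keeps at least one real timestep,
# so no sample consists only of the mask value 0.0.
x = np.random.rand(batch_size * 4, num_tsteps, num_features).astype('float32')
y = np.random.randint(0, 2, size=(batch_size * 4, num_tsteps, 1)).astype('float32')
for i, length in enumerate(np.random.randint(1, num_tsteps + 1, size=x.shape[0])):
    x[i, length:, :] = 0.0

model.fit(x, y, batch_size=batch_size, epochs=1)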
Similar to #33069, I receive this error during training even though my data is strictly right-padded (I trim and right-pad manually). However, in contrast to that issue, I confirmed via the following snippet that none of my inputs consist only of zeros:
for i, e in enumerate(ds_train):
    res = []
    f, l = [x.numpy() for x in e]
    for j in range(f.shape[0]):
        # 1 if slice j contains any non-zero value, 0 if it is all zeros
        if not (f[j] == 0.0).all():
            res.append(1)
        else:
            res.append(0)
    # Collapse consecutive duplicates: [1, 0] means non-zero data followed
    # only by zero padding, i.e. strictly right-padded; [1] means no padding.
    fin = [res[0]]
    for e in res[1:]:
        if e != fin[-1]:
            fin.append(e)
    print("i {}: {}".format(i, fin))
# Result:
i 0: [1, 0]
i 1: [1, 0]
i 2: [1, 0]
i 3: [1, 0]
i 4: [1]
i 5: [1, 0]
...
If I remove the Masking layer, the error does not occur; I confirmed this by running a complete epoch (2324 batches). However, training is probably fairly pointless when the padded timesteps are included.
Is there any other pitfall I am missing that could cause this issue?
Source code / logs
Python output:
Epoch 1/1000
WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are:
CancelledErrorTraceback (most recent call last)
<ipython-input-7-1c503c2dd55c> in <module>
----> 1 m.fit(train=True)
/ws/tf/vol_local/_model_lstm.py in fit(self, train, verbose)
315 ]
316 self.model.fit(ds_train, epochs=num_epochs, verbose=verbose, shuffle=False,
--> 317 validation_data=ds_val, validation_steps=None, callbacks=cbs)
318 #self.model.save(sess_hdf5_path)
319 self.model.save_weights(self.sess_h5_path.as_posix())
/ws/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
726 max_queue_size=max_queue_size,
727 workers=workers,
--> 728 use_multiprocessing=use_multiprocessing)
729
730 def evaluate(self,
/ws/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py in fit(self, model, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, **kwargs)
322 mode=ModeKeys.TRAIN,
323 training_context=training_context,
--> 324 total_epochs=epochs)
325 cbks.make_logs(model, epoch_logs, training_result, ModeKeys.TRAIN)
326
/ws/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py in run_one_epoch(model, iterator, execution_function, dataset_size, batch_size, strategy, steps_per_epoch, num_samples, mode, training_context, total_epochs)
121 step=step, mode=mode, size=current_batch_size) as batch_logs:
122 try:
--> 123 batch_outs = execution_function(iterator)
124 except (StopIteration, errors.OutOfRangeError):
125 # TODO(kaftan): File bug about tf function and errors.OutOfRangeError?
/ws/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py in execution_function(input_fn)
84 # `numpy` translates Tensors to values in Eager mode.
85 return nest.map_structure(_non_none_constant_value,
---> 86 distributed_function(input_fn))
87
88 return execution_function
/ws/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py in __call__(self, *args, **kwds)
455
456 tracing_count = self._get_tracing_count()
--> 457 result = self._call(*args, **kwds)
458 if tracing_count == self._get_tracing_count():
459 self._call_counter.called_without_tracing()
/ws/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py in _call(self, *args, **kwds)
518 # Lifting succeeded, so variables are initialized and we can run the
519 # stateless function.
--> 520 return self._stateless_fn(*args, **kwds)
521 else:
522 canon_args, canon_kwds = \
/ws/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py in __call__(self, *args, **kwargs)
1821 """Calls a graph function specialized to the inputs."""
1822 graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
-> 1823 return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
1824
1825 @property
/ws/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py in _filtered_call(self, args, kwargs)
1139 if isinstance(t, (ops.Tensor,
1140 resource_variable_ops.BaseResourceVariable))),
-> 1141 self.captured_inputs)
1142
1143 def _call_flat(self, args, captured_inputs, cancellation_manager=None):
/ws/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
1222 if executing_eagerly:
1223 flat_outputs = forward_function.call(
-> 1224 ctx, args, cancellation_manager=cancellation_manager)
1225 else:
1226 gradient_name = self._delayed_rewrite_functions.register()
/ws/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py in call(self, ctx, args, cancellation_manager)
509 inputs=args,
510 attrs=("executor_type", executor_type, "config_proto", config),
--> 511 ctx=ctx)
512 else:
513 outputs = execute.execute_with_cancellation(
/ws/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
65 else:
66 message = e.message
---> 67 six.raise_from(core._status_to_exception(e.code, message), None)
68 except TypeError as e:
69 keras_symbolic_tensors = [
/ws/miniconda3/lib/python3.7/site-packages/six.py in raise_from(value, from_value)
CancelledError: [_Derived_]RecvAsync is cancelled.
[[{{node metrics/accuracy/broadcast_weights/assert_broadcastable/AssertGuard/else/_36/Assert/data_2/_62}}]]
[[loss/activation_loss/weighted_loss/broadcast_weights/assert_broadcastable/is_valid_shape/else/_1/has_valid_nonscalar_shape/then/_106/has_invalid_dims/concat/_28]] [Op:__inference_distributed_function_172102]
Function call stack:
distributed_function
Command line log:
2019-10-08 14:38:27.367875: W tensorflow/core/grappler/optimizers/implementation_selector.cc:310] Skipping optimization due to error while loading function libraries: Invalid argument: Functions '__inference___backward_cudnn_lstm_with_fallback_169668_171093' and '__inference___backward_cudnn_lstm_with_fallback_169668_171093_specialized_for_StatefulPartitionedCall_at___inference_distributed_function_172102' both implement 'lstm_dce676f4-acdd-4bb5-88d9-e8dd57573aba' but their signatures do not match.
2019-10-08 14:38:27.536666: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-10-08 14:38:39.982582: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-10-08 14:38:41.215567: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at cudnn_rnn_ops.cc:1498 : Unknown: CUDNN_STATUS_BAD_PARAM
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1424): 'cudnnSetRNNDataDescriptor( data_desc.get(), data_type, layout, max_seq_length, batch_size, data_size, seq_lengths_array, (void*)&padding_fill)'
2019-10-08 14:38:41.215616: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Unknown: CUDNN_STATUS_BAD_PARAM
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1424): 'cudnnSetRNNDataDescriptor( data_desc.get(), data_type, layout, max_seq_length, batch_size, data_size, seq_lengths_array, (void*)&padding_fill)'
[[{{node cond_64/then/_0/CudnnRNNV3}}]]
2019-10-08 14:38:41.215638: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Cancelled: [_Derived_]RecvAsync is cancelled.
[[{{node metrics/accuracy/broadcast_weights/assert_broadcastable/AssertGuard/else/_36/Assert/data_2/_62}}]]
[[loss/activation_loss/weighted_loss/broadcast_weights/assert_broadcastable/is_valid_shape/else/_1/has_valid_nonscalar_shape/then/_106/has_invalid_dims/concat/_28]]
2019-10-08 14:38:41.215693: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Cancelled: [_Derived_]RecvAsync is cancelled.
[[{{node metrics/accuracy/broadcast_weights/assert_broadcastable/AssertGuard/else/_36/Assert/data_2/_62}}]]
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 71 (24 by maintainers)
import tensorflow as tf
tf.compat.v1.disable_eager_execution()
Disable eager execution and everything runs fine without the fused RNN kernel. Thanks for the help, guys 😃
I changed my model from:
to:
and that solved my issue.
I know @mimxrt’s code has the same model and I don't know why it works for me, but I'm adding this for anyone else who comes here with the issue; maybe it can help with debugging.
I think embedding(mask_zero=True) creates this problem. There are two ways I found to solve it: 1. set mask_zero=False, but that changes how the code behaves; 2. your way. Thanks a lot.
I have a workaround that seems to work: force TF to use the non-cuDNN implementation by selecting a sigmoid activation instead of tanh:
layers.LSTM(..., activation='sigmoid')
This outputs:
WARNING:tensorflow:Layer lstm will not use cuDNN kernel since it doesn’t meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
This forces TF to use a generic GPU kernel in place of cuDNN. It's slower, but a slow implementation is a lot faster than one that doesn't work at all ;p
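A minimal sketch of that workaround (num_units is taken from the model above; the remaining arguments are assumptions):
import tensorflow as tf

num_units = 88

# Any activation other than tanh disqualifies the layer from the fused cuDNN
# kernel, so Keras falls back to the generic (slower) GPU implementation.
lstm = tf.keras.layers.LSTM(num_units, activation='sigmoid', return_sequences=True)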
@houtoms, thanks for providing the context.
From the high-level API’s perspective, I would expect the kernel to just return zeros for any sequence that is fully masked, rather than asking the user to remove those values from the batch. It would be quite complicated to ask the user to handle this on the Python side.
@houtoms, would it be complicated to add this support (fully masked sequences) in the cuDNN kernel?
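For illustration, a rough sketch of what that Python-side handling might look like, assuming a mask value of 0.0 and a dataset of (features, labels) pairs; the helper name is hypothetical:
import tensorflow as tf

# Hypothetical helper: drop samples whose timesteps are all equal to the mask
# value (i.e. fully masked samples) from each batch before it reaches the
# cuDNN LSTM kernel.
def drop_fully_masked(features, labels):
    # keep[i] is True if sample i has at least one non-zero entry
    keep = tf.reduce_any(tf.not_equal(features, 0.0), axis=[1, 2])
    return tf.boolean_mask(features, keep), tf.boolean_mask(labels, keep)

# ds_train = ds_train.map(drop_fully_masked)
Note that the filtered batches no longer have a fixed size, which clashes with models built around a fixed batch_size like the one above; that is part of what makes pushing this onto the user unattractive.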
I tried with cudnn=8.0.4, cudatoolkit=11.0.221, tensorflow-gpu=2.4.0, and it fixed my problem. cudnn can be installed with: conda install -c nvidia cudnn
I am still facing this issue using TF 2.2.0. I also found the same workaround of forcing the LSTM not to use the cuDNN implementation to work; however, it is nearly prohibitively slow. I found the generic GPU implementation took ~30 times longer to train per epoch than the cuDNN version. I hope this can be fixed soon.
@houtoms As it seems the fix will take more time than expected, I wanted to go ahead and try your suggestion of using a dynamic number of timesteps (instead of the fixed 144). Unfortunately I get an error when doing this in TF 2.1.0, calling model.fit() with an input of <PrefetchDataset shapes: ((128, None, 128), (128, None, 1)), types: (tf.float32, tf.float32)>:
InvalidArgumentError: ValueError: Attempt to convert a value (<BatchDataset shapes: ((128, None, 128), (128, None, 1)), types: (tf.float32, tf.float32)>) with an unsupported type (<class 'tensorflow.python.data.ops.dataset_ops.BatchDataset'>) to a Tensor.
Could it be that I misunderstood your suggestion (note the None in the input dimensions)? All ideas are much appreciated. (See the sketch at the end of this thread.)
I can reproduce that mask_zero=True is causing the crash. It doesn't matter if eager is on or off, with or without callbacks. LSTM on a masked sequence is extremely common in NLP models, so this is a major bug in terms of impact.
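For anyone trying the dynamic-timesteps suggestion discussed a few comments above, here is a hypothetical sketch of that setup; the generator, shapes, loss, and padded_batch usage are assumptions, not taken from the thread:
import numpy as np
import tensorflow as tf

num_features, num_units, batch_size = 130, 88, 8

# Toy variable-length sequences (lengths and values are made up).
def gen():
    for _ in range(32):
        t = np.random.randint(10, 50)
        yield (np.random.rand(t, num_features).astype('float32'),
               np.random.randint(0, 2, size=(t, 1)).astype('float32'))

# padded_batch right-pads each batch to its longest sequence with zeros,
# which the Masking layer then treats as masked timesteps.
ds = tf.data.Dataset.from_generator(
    gen,
    output_types=(tf.float32, tf.float32),
    output_shapes=((None, num_features), (None, 1)),
).padded_batch(batch_size, padded_shapes=([None, num_features], [None, 1]))

# The time dimension is left as None instead of a fixed number of timesteps.
model = tf.keras.Sequential([
    tf.keras.layers.Masking(mask_value=0.0, input_shape=(None, num_features)),
    tf.keras.layers.LSTM(num_units, return_sequences=True),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1, activation='sigmoid')),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(ds, epochs=1)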