tensorflow: OP_REQUIRES failed at constant_op.cc

Something goes wrong here. In Python, I use tf.while_loop to feed the LSTM its initial state. The following error occurred after freezing the model and calling it from C.


2020-10-29 17:19:01.449470: E tensorflow/core/framework/tensor.cc:555] Could not decode variant with type_name: "tensorflow::TensorList".  Perhaps you forgot to register a decoder via REGISTER_UNARY_VARIANT_DECODE_FUNCTION?
2020-10-29 17:19:01.451491: W tensorflow/core/framework/op_kernel.cc:1744] OP_REQUIRES failed at constant_op.cc:82 : Invalid argument: Cannot parse tensor from tensor_proto.
2020-10-29 17:19:02.134350: E tensorflow/core/framework/tensor.cc:555] Could not decode variant with type_name: "tensorflow::TensorList".  Perhaps you forgot to register a decoder via REGISTER_UNARY_VARIANT_DECODE_FUNCTION?
2020-10-29 17:19:02.194184: W tensorflow/core/framework/op_kernel.cc:1744] OP_REQUIRES failed at constant_op.cc:82 : Invalid argument: Cannot parse tensor from proto: dtype: DT_VARIANT

Do I need to compile TensorFlow from source to support this, or is my usage incorrect?
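For context, here is a rough sketch of the C-side call (not the reporter's actual code; the model path, includes, and error handling are placeholders). The errors above surface once a frozen graph containing the DT_VARIANT (TensorList) constants produced by tf.while_loop is loaded and run through the TF C API:

// Minimal sketch: import a frozen GraphDef through the TensorFlow C API.
// The file name is a placeholder and error handling is kept to the minimum.
#include <tensorflow/c/c_api.h>
#include <cstdio>
#include <vector>

int main() {
  // Read the frozen model produced in Python (placeholder path).
  std::FILE* f = std::fopen("frozen_model.pb", "rb");
  if (!f) { std::perror("open"); return 1; }
  std::fseek(f, 0, SEEK_END);
  long size = std::ftell(f);
  std::fseek(f, 0, SEEK_SET);
  std::vector<char> data(size);
  std::fread(data.data(), 1, data.size(), f);
  std::fclose(f);

  TF_Buffer* graph_def = TF_NewBufferFromString(data.data(), data.size());
  TF_Graph* graph = TF_NewGraph();
  TF_ImportGraphDefOptions* opts = TF_NewImportGraphDefOptions();
  TF_Status* status = TF_NewStatus();

  // With a graph that contains DT_VARIANT (TensorList) constants, the
  // "Could not decode variant" / "Cannot parse tensor from tensor_proto"
  // errors appear once such a graph is imported and executed.
  TF_GraphImportGraphDef(graph, graph_def, opts, status);
  if (TF_GetCode(status) != TF_OK) {
    std::fprintf(stderr, "Import failed: %s\n", TF_Message(status));
  }

  TF_DeleteStatus(status);
  TF_DeleteImportGraphDefOptions(opts);
  TF_DeleteBuffer(graph_def);
  TF_DeleteGraph(graph);
  return 0;
}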

About this issue

  • Original URL
  • State: open
  • Created 4 years ago
  • Comments: 28 (12 by maintainers)

Most upvoted comments

Btw, assigning the TF 2.3 tag should presumably imply that it has been fixed in 2.4 or is no longer relevant, but no, this bug is still present in 2.4.1.

In fact, there are two bugs. For some inference setups (perhaps when there are two or more saved models in one global inference context), some TF libraries get loaded twice and some global variables get overwritten.

The second bug was introduced in https://github.com/tensorflow/tensorflow/commit/de37b1eaca05431822223e5c996bc08245cf523b, as found by Alexander Bayandin above. TF statically registers a bunch of TF_VARIANT decoders, but then it dynamically loads the same (or some of the same) libraries and the global registration lists get overwritten. This bug has actually existed forever, but since no one uses TF_VARIANT to store bool or int32, those decoders being overwritten (actually, lost) went unnoticed. But the variant decoder for TensorList got lost too, and reverting de37b1eaca05431822223e5c996bc08245cf523b should have fixed that. I fixed this bug by explicitly calling a single function that holds the static registry, defined in a .cc file rather than in a header.
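A toy sketch of the pattern that fix relies on (illustrative only, not TF code): when the accessor holding the function-local static is defined in a header, each shared object that includes it can end up with its own copy of the registry, so decoders registered against one copy are invisible through the other; defining the accessor in exactly one .cc file leaves a single registry for every library.

// Toy illustration of the singleton pattern the patch switches to (not TF code).
#include <iostream>
#include <set>
#include <string>

class Registry;
Registry* RegistryGlobal();  // defined exactly once, in a .cc file

class Registry {
 public:
  // Header-visible accessor: only forwards to the out-of-line singleton,
  // instead of defining the static itself in the header.
  static Registry* Global() { return RegistryGlobal(); }
  void Register(const std::string& name) { names_.insert(name); }
  bool Has(const std::string& name) const { return names_.count(name) > 0; }
 private:
  std::set<std::string> names_;
};

// In TF this corresponds to UnaryVariantOpRegistryGlobal() in
// variant_op_registry.cc: one definition of the static, no matter how many
// libraries link against the header.
Registry* RegistryGlobal() {
  static Registry* global = new Registry;
  return global;
}

int main() {
  Registry::Global()->Register("tensorflow::TensorList");
  std::cout << Registry::Global()->Has("tensorflow::TensorList") << "\n";  // prints 1
  return 0;
}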

But there is also the first bug, introduced in 15275d3a14c77e2244ae1155f93243256f08e3ed. Again, because of the "second" library load, the RTTI identity changes for every class with each new dynamic library loading; thus TensorList gets a different RTTI ID, and the decoder's variant->get<VariantTensorDataProto>() refuses to decode the protobuf. You fixed this for macOS by reverting to the old behaviour; my patch now forces that behaviour for everyone else. This first bug is what produces the error messages printed above.
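A toy sketch of the difference (again illustrative, not TF code; TF uses its own Hash64 rather than std::hash): keying a type by a hash of typeid(T).name() stays stable across separately loaded libraries, while the runtime's RTTI identity (hash_code / type_info pointers) is not guaranteed to, which is what makes the TensorList lookup in the variant decoder fail.

// Toy illustration of name-based vs RTTI-based type keys (not TF code).
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <typeinfo>

struct TensorListLike {};  // stand-in for tensorflow::TensorList

template <typename T>
uint64_t NameBasedTypeId() {
  // Depends only on the mangled type name, so it is the same in every
  // shared object that mentions T.
  return std::hash<std::string>{}(typeid(T).name());
}

template <typename T>
size_t RuntimeTypeId() {
  // Tied to the runtime's type_info; not guaranteed to be stable when the
  // same class is instantiated in separately loaded libraries.
  return typeid(T).hash_code();
}

int main() {
  std::cout << "name-based: " << NameBasedTypeId<TensorListLike>() << "\n";
  std::cout << "rtti-based: " << RuntimeTypeId<TensorListLike>() << "\n";
  return 0;
}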

Or these two bugs may be different facets of the same issue: TF loading some of its modules/libraries multiple times in an inference environment with multiple saved_models.

This patch fixes the problem for me with different models on CPU and GPU on Linux. I will wait for a while for others to test and/or confirm whether it works, and then make a proper pull request.

The patch helped in my case too (macOS, CPU). Thanks!

Please check whether this patch against 2.4.1 helps; there are two bugs with static/dynamic loading of TF libraries, and this works around them.

diff --git a/tensorflow/core/framework/type_index.h b/tensorflow/core/framework/type_index.h
index 7986904dd7a..5519df474d0 100644
--- a/tensorflow/core/framework/type_index.h
+++ b/tensorflow/core/framework/type_index.h
@@ -24,9 +24,10 @@ limitations under the License.
 
 #include "tensorflow/core/platform/types.h"
 
-#if defined(MACOS) || defined(TARGET_OS_MAC)
+#define USE_NAME_BASED_RTTI_ONLY
+#if defined(MACOS) || defined(TARGET_OS_MAC) || defined(USE_NAME_BASED_RTTI_ONLY)
 #include "tensorflow/core/platform/hash.h"
-#endif  // defined(MACOS) || defined(TARGET_OS_MAC)
+#endif  // defined(MACOS) || defined(TARGET_OS_MAC) || defined(USE_NAME_BASED_RTTI_ONLY)
 
 namespace tensorflow {
 
@@ -62,7 +63,7 @@ class TypeIndex {
 
 #if defined(__GXX_RTTI) || defined(_CPPRTTI)
 
-#if defined(MACOS) || defined(TARGET_OS_MAC)
+#if defined(MACOS) || defined(TARGET_OS_MAC) || defined(USE_NAME_BASED_RTTI_ONLY)
     // Use a hash based on the type name to avoid issues due to RTLD_LOCAL on
     // MacOS (b/156979412).
     return TypeIndex(Hash64(typeid(T).name()), typeid(T).name());
diff --git a/tensorflow/core/framework/variant_op_registry.cc b/tensorflow/core/framework/variant_op_registry.cc
index aa3bdeab5e2..069542d784a 100644
--- a/tensorflow/core/framework/variant_op_registry.cc
+++ b/tensorflow/core/framework/variant_op_registry.cc
@@ -26,6 +26,16 @@ limitations under the License.
 
 namespace tensorflow {
 
+// Get a pointer to a global UnaryVariantOpRegistry object
+UnaryVariantOpRegistry* UnaryVariantOpRegistryGlobal() {
+  static UnaryVariantOpRegistry* global_unary_variant_op_registry = NULL;
+
+  if (global_unary_variant_op_registry == NULL) {
+    global_unary_variant_op_registry = new UnaryVariantOpRegistry;
+  }
+  return global_unary_variant_op_registry;
+}
+
 std::unordered_set<string>* UnaryVariantOpRegistry::PersistentStringStorage() {
   static std::unordered_set<string>* string_storage =
       new std::unordered_set<string>();
@@ -55,6 +65,13 @@ bool DecodeUnaryVariant(Variant* variant) {
   if (variant->TypeName().empty()) {
     VariantTensorDataProto* t = variant->get<VariantTensorDataProto>();
     if (t == nullptr || !t->metadata().empty() || !t->tensors().empty()) {
+      LOG(ERROR) << __PRETTY_FUNCTION__ << ": empty typename, malformed variant: t == nullptr: " << (t == nullptr) << ": " << variant->DebugString() << std::endl;
+      if (t != nullptr) {
+         LOG(ERROR) << __PRETTY_FUNCTION__ << ": empty typename, malformed variant: t != nullptr: !t->metadata().empty(): " << !t->metadata().empty()
+		 << ", !t->tensors().empty(): " << !t->tensors().empty()
+		 << ": " << variant->DebugString()
+		 << std::endl;
+      }
       // Malformed variant.
       return false;
     } else {
@@ -66,11 +83,15 @@ bool DecodeUnaryVariant(Variant* variant) {
   UnaryVariantOpRegistry::VariantDecodeFn* decode_fn =
       UnaryVariantOpRegistry::Global()->GetDecodeFn(variant->TypeName());
   if (decode_fn == nullptr) {
+    LOG(ERROR) << __PRETTY_FUNCTION__ << variant->TypeName() << ": decode_fn == nullptr: " << variant->DebugString() << std::endl; 
     return false;
   }
   const string type_name = variant->TypeName();
   bool decoded = (*decode_fn)(variant);
-  if (!decoded) return false;
+  if (!decoded) {
+    LOG(ERROR) << type_name << " ->" << variant->TypeName() << ": not decoded:" << variant->DebugString() << std::endl; 
+    return false;
+  }
   if (variant->TypeName() != type_name) {
     LOG(ERROR) << "DecodeUnaryVariant: Variant type_name before decoding was: "
                << type_name
diff --git a/tensorflow/core/framework/variant_op_registry.h b/tensorflow/core/framework/variant_op_registry.h
index edfb9c544c0..ed8bec6cc6e 100644
--- a/tensorflow/core/framework/variant_op_registry.h
+++ b/tensorflow/core/framework/variant_op_registry.h
@@ -56,6 +56,9 @@ enum VariantDeviceCopyDirection {
   DEVICE_TO_DEVICE = 3,
 };
 
+class UnaryVariantOpRegistry;
+extern UnaryVariantOpRegistry* UnaryVariantOpRegistryGlobal();
+
 class UnaryVariantOpRegistry {
  public:
   typedef std::function<bool(Variant*)> VariantDecodeFn;
@@ -170,9 +173,7 @@ class UnaryVariantOpRegistry {
 
   // Get a pointer to a global UnaryVariantOpRegistry object
   static UnaryVariantOpRegistry* Global() {
-    static UnaryVariantOpRegistry* global_unary_variant_op_registry =
-        new UnaryVariantOpRegistry;
-    return global_unary_variant_op_registry;
+    return UnaryVariantOpRegistryGlobal();
   }
 
   // Get a pointer to a global persistent string storage object.

This patch also fixed the same error for me: Linux, assorted (ancient) CPUs, V100 + P6000 GPUs.