tensorflow: OP_REQUIRES failed at constant_op.cc
I have run into a problem. In Python, I use tf.while_loop to feed an LSTM its initial state (a minimal sketch of this kind of setup is shown after the question). The following error occurred when I froze the model and tried to call it from the C API:
2020-10-29 17:19:01.449470: E tensorflow/core/framework/tensor.cc:555] Could not decode variant with type_name: "tensorflow::TensorList". Perhaps you forgot to register a decoder via REGISTER_UNARY_VARIANT_DECODE_FUNCTION?
2020-10-29 17:19:01.451491: W tensorflow/core/framework/op_kernel.cc:1744] OP_REQUIRES failed at constant_op.cc:82 : Invalid argument: Cannot parse tensor from tensor_proto.
2020-10-29 17:19:02.134350: E tensorflow/core/framework/tensor.cc:555] Could not decode variant with type_name: "tensorflow::TensorList". Perhaps you forgot to register a decoder via REGISTER_UNARY_VARIANT_DECODE_FUNCTION?
2020-10-29 17:19:02.194184: W tensorflow/core/framework/op_kernel.cc:1744] OP_REQUIRES failed at constant_op.cc:82 : Invalid argument: Cannot parse tensor from proto: dtype: DT_VARIANT
Do I need to compile TensorFlow from source to support this, or is my usage incorrect?
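For reference, here is a minimal sketch of the kind of setup described above: an LSTM cell stepped through time with tf.while_loop from an explicit initial state, then exported as a SavedModel to be loaded from the C API. This is not the reporter's code; the layer size, input shapes, and export path are illustrative assumptions.

```python
import tensorflow as tf

class LSTMWhileLoop(tf.Module):
    """Steps an LSTM cell through time with tf.while_loop from an explicit initial state."""

    def __init__(self, units=32, feature_dim=8):
        super().__init__()
        self.units = units
        self.cell = tf.keras.layers.LSTMCell(units)
        # Build the cell up front so no variables are created inside the loop body.
        self.cell.build(tf.TensorShape([None, feature_dim]))

    @tf.function(input_signature=[tf.TensorSpec([None, 16, 8], tf.float32)])  # [batch, time, features]
    def __call__(self, x):
        batch = tf.shape(x)[0]
        steps = tf.shape(x)[1]
        # Explicit initial LSTM state (h, c) fed into the loop.
        h0 = tf.zeros([batch, self.units])
        c0 = tf.zeros([batch, self.units])

        def cond(t, h, c):
            return t < steps

        def body(t, h, c):
            _, (h_new, c_new) = self.cell(x[:, t, :], (h, c))
            return [t + 1, h_new, c_new]

        _, h_final, _ = tf.while_loop(
            cond, body, [tf.constant(0), h0, c0],
            shape_invariants=[tf.TensorShape([]),
                              tf.TensorShape([None, self.units]),
                              tf.TensorShape([None, self.units])])
        return h_final

model = LSTMWhileLoop()
# Export for later loading from the C API; the path is an assumption.
tf.saved_model.save(model, "/tmp/lstm_while_loop")
```

The decode errors in the log above appear on the C-library side when such a graph is loaded there, which is consistent with the library-loading bugs discussed further down in this thread.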
About this issue
- Original URL
- State: open
- Created 4 years ago
- Comments: 28 (12 by maintainers)
Commits related to this issue
- For some inference models (maybe when there are 2 or more saved models in one global inference context) some TF libraries got loaded twice and some global variables got overwritten. This bug was intr... — committed to bioothod/tensorflow by bioothod 3 years ago
- This bug was introduced in 15275d3 - because TF libs got loaded the second time somewhere in its life cycle (probably dynamically when there are multiple saved_model models in the same inference envir... — committed to bioothod/tensorflow by bioothod 3 years ago
Btw, assigning the TF 2.3 tag should presumably imply that this has been fixed in 2.4 or is no longer relevant, but no, this bug is still present in 2.4.1. In fact, there are 2 bugs. For some inference setups (maybe when there are 2 or more saved models in one global inference context) some TF libraries get loaded twice and some global variables get overwritten.
The second bug was introduced in https://github.com/tensorflow/tensorflow/commit/de37b1eaca05431822223e5c996bc08245cf523b, as found by Alexander Bayandin above: TF statically registers a bunch of TF_VARIANT decoders, but then dynamically loads the same (or some of the same) libraries again and overwrites the global registration lists. This bug has actually existed forever, but nobody uses TF_VARIANT to store bool or int32, so when those decoders got overwritten (actually, lost) nobody cared. The variant decoder for TensorList got lost as well, and reverting de37b1eaca05431822223e5c996bc08245cf523b should have fixed that. I have fixed this bug by explicitly calling a single static function defined not in a header but in the C code.
But there is also the first bug, introduced in 15275d3a14c77e2244ae1155f93243256f08e3ed: again because of the "second" library load, the RTTI identity changes for every class on every new dynamic library load, so TensorList gets a different RTTI ID and the decoder in variant->get<VariantTensorDataProto>() refuses to decode the protobuf. You have fixed this for macOS by reverting to the old behaviour; my patch now forces that for everyone else. This first bug ends up with the debug message printed above. Or these two bugs could be a different issue altogether, with TF loading some of its modules/libraries multiple times in an inference environment with multiple saved_models.
This patch fixes the problem for me with different models on CPU and GPU on Linux. I will wait for some time for others to test and/or confirm whether it works, and then I will make a proper pull request. (A quick way to check whether a frozen graph contains the affected DT_VARIANT constants is sketched just below.)
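To confirm that a frozen graph actually contains the variant constants involved in this failure, one option is to scan the GraphDef for Const nodes with a DT_VARIANT dtype. This is only a diagnostic sketch, not part of the fix; the file path is an assumption.

```python
# Diagnostic sketch: list Const nodes whose dtype is DT_VARIANT (e.g. serialized
# TensorLists). These are the tensors that "Cannot parse tensor from proto:
# dtype: DT_VARIANT" refers to when the variant decoder registration is lost.
from tensorflow.core.framework import graph_pb2, types_pb2

graph_def = graph_pb2.GraphDef()
with open("/tmp/frozen_graph.pb", "rb") as f:  # path is an assumption
    graph_def.ParseFromString(f.read())

for node in graph_def.node:
    if node.op == "Const" and node.attr["dtype"].type == types_pb2.DT_VARIANT:
        print("DT_VARIANT constant:", node.name)
```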
The patch helped for my case on macOS CPU. Thanks!
Please check whether this patch against 2.4.1 helps; there are 2 bugs with static/dynamic loading of TF libraries, and this works around them.
This patch also fixed the same error for me: Linux, assorted (ancient) CPUs, V100 + P6000 GPUs.