tensorflow: Wrong order of dependencies after running freeze_graph and/or optimize_for_inference
I haven’t found any mention of this anywhere online. It makes the serialized graphs completely unusable for inference.
Steps to reproduce:
- create a graph that contains tf.contrib.layers.batch_norm with a tf.bool tensor as the is_training argument (to force use of a Switch node)
- run freeze_graph.freeze_graph and optimize_for_inference_lib.optimize_for_inference
- load resulting graph on Android via TensorFlowInferenceInterface
What happened:
ADB logcat shows the error message:
E/TensorFlowInferenceInterface: Failed to load model from 'file:///android_asset/optimized_model.pb': java.io.IOException: Not a valid TensorFlow Graph serialization: Node 'conv1/bn1/BatchNorm/cond/AssignMovingAvg/BatchNorm/moving_mean/sub_1/x': Control dependencies must come after regular dependencies
Why did this happen: I found that the order of a node’s input dependencies becomes inconsistent after processing.
Dependencies before processing:
input: "^conv1/bn1/BatchNorm/cond/AssignMovingAvg/BatchNorm/moving_mean/BatchNorm/BatchNorm/moving_mean"
input: "^conv1/bn1/BatchNorm/cond/AssignMovingAvg/BatchNorm/moving_mean/AssignAdd"
input: "^conv1/bn1/BatchNorm/cond/switch_t"
Dependencies after processing:
input: "^conv1/bn1/BatchNorm/cond/AssignMovingAvg/BatchNorm/moving_mean/BatchNorm/BatchNorm/moving_mean"
input: "^conv1/bn1/BatchNorm/cond/AssignMovingAvg/BatchNorm/moving_mean/AssignAdd"
input: "conv1/bn1/BatchNorm/cond/Switch:1"
What is wrong: Control dependencies (names starting with '^') must come after regular dependencies.
Expected behaviour: The tools should reorder each node's inputs so that control dependencies always come last.
Expected order of dependencies:
input: "conv1/bn1/BatchNorm/cond/Switch:1"
input: "^conv1/bn1/BatchNorm/cond/AssignMovingAvg/BatchNorm/moving_mean/BatchNorm/BatchNorm/moving_mean"
input: "^conv1/bn1/BatchNorm/cond/AssignMovingAvg/BatchNorm/moving_mean/AssignAdd"
Can you try using the new Graph Transform Tool approach to optimizing for inference? https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms/#optimizing-for-deployment
I’m hoping to deprecate the old optimize_for_inference Python script soon, so it would be helpful to know if this works better.
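For reference, the same transforms can also be driven from Python through the tool's wrapper (a sketch, assuming TF 1.x; input/output names, shapes, and file names are placeholders):

    import tensorflow as tf
    from tensorflow.tools.graph_transforms import TransformGraph

    graph_def = tf.GraphDef()
    with tf.gfile.FastGFile("frozen_model.pb", "rb") as f:
        graph_def.ParseFromString(f.read())

    transforms = [
        'strip_unused_nodes(type=float, shape="1,299,299,3")',
        "remove_nodes(op=Identity, op=CheckNumerics)",
        "fold_constants(ignore_errors=true)",
        "fold_batch_norms",
        "fold_old_batch_norms",
    ]
    optimized = TransformGraph(graph_def, ["Mul"], ["softmax"], transforms)

    with tf.gfile.FastGFile("optimized_model.pb", "wb") as f:
        f.write(optimized.SerializeToString())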
@ronny3050 A simple Python script (for instance, along the lines of the input-reordering sketch above) should work for you.
Feel free to ask if there are further issues!
and the winner is…

    bazel build -c opt \
      --copt="-DSELECTIVE_REGISTRATION" \
      --copt="-DSUPPORT_SELECTIVE_REGISTRATION" \
      //tensorflow/contrib/android:libtensorflow_inference.so \
      --crosstool_top=//external:android/crosstool \
      --host_crosstool_top=@bazel_tools//tools/cpp:toolchain \
      --cpu=armeabi-v7a
I get a 3.9 MB libtensorflow_inference.so that doesn’t crash the app because of TensorFlow code (well, it crashes later on, but because of a bug in MY code, which is much less frustrating as I’m going to be able to fix it quickly)
So either the TensorFlow developers intended to have two different flags (SELECTIVE_REGISTRATION for ops and SUPPORT_SELECTIVE_REGISTRATION for types) and this is not documented, or it is a bug in the TensorFlow code.
Please fix! @petewarden @andrewharp
Mmh, managed to build the Android app with make and found why it wouldn’t build with Bazel (I had used a git clone command missing the option to recurse submodules).
Will now try to build the Android app with Bazel, and then with custom (selective) ops and kernels.
Manually replaced the placeholder with a constant op (potentially in a bad way) and re-ran transform_graph -> no change to the Switch ops that are fed a constant false.
That’s it… let’s build CPU:DT_BOOL for Android… 😕
OK, built TensorFlow from source (30 minutes O.O). Retrained my model: one epoch took about as long as with the prebuilt TensorFlow binary, despite the added CPU instructions? I didn’t add any optimisation flags (there is no documentation for that anyway). Also built summarize_graph (it takes sooooooooooooooooooooo long to compile C++ 😕, like 8 minutes for a simple utility).
Also, tried this command:

    bazel build tensorflow/tools/graph_transforms:transform_graph
    bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
      --in_graph=tensorflow_inception_graph.pb \
      --out_graph=optimized_inception_graph.pb \
      --inputs='Mul' \
      --outputs='softmax' \
      --transforms='
        strip_unused_nodes(type=float, shape="1,299,299,3")
        remove_nodes(op=Identity, op=CheckNumerics)
        fold_constants(ignore_errors=true)
        fold_batch_norms
        fold_old_batch_norms'
Loaded the .pb file and exported it as .pbtxt to see the differences (went from 370 KB to 29 KB, nice). There should have been quite a lot of optimizations in there…
Yet, the DT_BOOL & Keras learning-phase stuff is still in there (obviously, because the keras_learning_phase placeholder hasn’t been replaced by a constant op)
And… looking at the docs for the transform_graph tool, I don’t see any transformation that allows one to replace a placeholder op (with a single bool) by a const op… 😕 sigh… Am I supposed to write a custom transform function for that?
And if I do, will the Switch op disappear in an optimisation phase?
/me begins to think that he’ll throw in the towel and just build TensorFlow for Android with the CPU:BOOL kernel (why is there support for GPU:BOOL and not for CPU:BOOL anyway?)
I tried replacing a Placeholder node holding a boolean, namely
node { name: "dropout_1/keras_learning_phase" op: "Placeholder" attr { key: "dtype" value { type: DT_BOOL } } attr { key: "shape" value { shape { } } } }
with this node (not sure if it is correct)
node { name: "dropout_1/keras_learning_phase" op: "Const" attr { key: "dtype" value { type: DT_BOOL } } attr { key: "value" value { tensor { dtype: DT_BOOL tensor_shape { } bool_val: false } } } }
and then running it through:

    import tensorflow as tf
    from google.protobuf import text_format

    gd = tf.GraphDef()
    with tf.gfile.FastGFile(self.exportPath + self.modelName + "Constant.pbtxt", "r") as f:
        text_format.Merge(f.read(), gd)
But, when freezing, I ended up with ValueError: graph_def is invalid at node u'dropout_1/cond/mul/y': More inputs specified ('dropout_1/cond/Switch:1') than the op expects…
Was my attempt correct?
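For what it’s worth, the same substitution can be done programmatically instead of hand-editing the pbtxt (a sketch only, assuming TF 1.x proto APIs; the node and file names follow the example above):

    import tensorflow as tf
    from tensorflow.core.framework import graph_pb2
    from tensorflow.python.framework import tensor_util

    graph_def = graph_pb2.GraphDef()
    with tf.gfile.FastGFile("model.pb", "rb") as f:
        graph_def.ParseFromString(f.read())

    for node in graph_def.node:
        if node.name == "dropout_1/keras_learning_phase" and node.op == "Placeholder":
            node.op = "Const"
            node.ClearField("attr")  # drop the Placeholder-specific attrs
            node.attr["dtype"].type = tf.bool.as_datatype_enum
            node.attr["value"].tensor.CopyFrom(
                tensor_util.make_tensor_proto(False, dtype=tf.bool))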
Next I’m going to try to build TensorFlow with Bazel… I really need those graph transform tools.
I happened to encounter the exact same issue (with DT_BOOL) crashing my app at run time. Been trying to work around it for 2 days now (trying to remove the keras_learning_phase Switch node branch)… this is frustrating.
I had to manually turn Switch ops into Identity ops. Effectively, is_training is now permanently False. Seems to be issue #6124, maybe related to #5919.
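Roughly, that hack looks like the sketch below (not a general solution: anything that consumed the :1 output of a Switch would also need its input names rewritten; graph_def is a loaded GraphDef as in the earlier snippets):

    # Turn each Switch into an Identity on its data input.
    for node in graph_def.node:
        if node.op == "Switch":
            node.op = "Identity"
            # Identity takes a single input; drop the predicate input.
            del node.input[1:]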
@petewarden Okay, I have used the new Graph Transform Tool with arguments:
    --inputs='x' --outputs='y_conv' \
    --transforms='
      strip_unused_nodes(type=float, shape="1,49,257,1")
      remove_nodes(op=Identity, op=CheckNumerics)
      fold_constants(ignore_errors=true)
      fold_batch_norms
      fold_old_batch_norms'
and all of the previous error messages are gone. The serialization works just fine now, but when trying to run the model: