tensorflow: Unable to import frozen graph with batchnorm
Error when loading the frozen graph with tensorflow.contrib.layers.python.layers.batch_norm
ValueError: graph_def is invalid at node u'BatchNorm/cond/AssignMovingAvg/Switch': Input tensor 'BatchNorm/moving_mean:0' Cannot convert a tensor of type float32 to an input of type float32_ref
freeze_graph.py doesn’t seem to store moving_mean and moving_variance properly
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Reactions: 36
- Comments: 80 (28 by maintainers)
The full script I use to convert a checkpoint model to a protobuf graph is below, in case more people using batch norm layers find it useful.
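The commenter's original script is not reproduced in this excerpt. As a hedged sketch only, a checkpoint-to-protobuf conversion of this kind generally looks like the following (paths and node names are illustrative assumptions, not the commenter's code):

```python
import os
import tempfile
import tensorflow.compat.v1 as tf  # TF 1.x-style graph API
tf.disable_v2_behavior()

def freeze_checkpoint(ckpt_prefix, output_node_names, pb_path):
    """Restore a checkpoint, fold variables into constants, write a .pb file."""
    graph = tf.Graph()
    with graph.as_default():
        # Rebuild the graph structure from the checkpoint's .meta file.
        saver = tf.train.import_meta_graph(ckpt_prefix + '.meta')
        with tf.Session() as sess:
            saver.restore(sess, ckpt_prefix)
            frozen = tf.graph_util.convert_variables_to_constants(
                sess, graph.as_graph_def(), output_node_names)
    # Serialize the frozen GraphDef to a protobuf file.
    with tf.gfile.GFile(pb_path, 'wb') as f:
        f.write(frozen.SerializeToString())
    return frozen
```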
@petewarden this is still a problem. It severely limits the ability to put models with batch norm into production (which is most models…)
Yeah, TensorFlow makes this far more painful than it should be!
@pavelgonchar This has worked for me:
I’ve only changed the inputs related to the “moving_variance” and “moving_mean”.
fix batch norm nodes
I added `if 'moving_' in node.input[index] and 'Switch' not in node.input[index]:` and it solved my problem, thanks!
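The node-rewriting fix discussed above can be sketched as follows: walk the frozen `GraphDef` and rewrite the batch-norm assignment ops so the importer no longer sees `float32_ref` inputs. This is a hedged reconstruction from the snippets quoted in this thread, not an official API:

```python
import tensorflow.compat.v1 as tf  # TF 1.x-style GraphDef protos

def fix_batch_norm_nodes(graph_def):
    """Patch RefSwitch/AssignSub/AssignAdd nodes left behind by batch norm."""
    for node in graph_def.node:
        if node.op == 'RefSwitch':
            # Switch takes a value, not a variable ref; read the variable.
            node.op = 'Switch'
            for index in range(len(node.input)):
                if 'moving_' in node.input[index] and 'Switch' not in node.input[index]:
                    node.input[index] = node.input[index] + '/read'
        elif node.op == 'AssignSub':
            # Replace the in-place update with a plain subtraction.
            node.op = 'Sub'
            if 'use_locking' in node.attr:
                del node.attr['use_locking']
        elif node.op == 'AssignAdd':
            node.op = 'Add'
            if 'use_locking' in node.attr:
                del node.attr['use_locking']
    return graph_def
```

Run this over the `GraphDef` after freezing and before `tf.import_graph_def`.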
The workaround of @barbolo worked for me (for Python 3, change `xrange` to `range`). But it would be nice if native TensorFlow allowed freezing a graph with batch norm without these kinds of workarounds!

This is hitting me too; it’s a bad bug that makes it hard to use BatchNorm in production settings.
I found another work-around for this. Our implementation of batch norm was using tf.cond() to distinguish between training-time and test-time behavior. At training time, the variables in batch norm have to be updated. This causes an error when those variables are converted to constants.
When freezing a graph for inference only, the update operations are still present in the frozen graph because tf.cond() chooses the behavior at run-time, not compile-time. The easiest solution for me was to generate two graphs that share all of their variables, one for training and one for testing. This way you can eliminate the call to tf.cond() and distinguish behaviors at compile time. Then Tensorflow correctly removes all the update operations when calling tf.graph_util.convert_variables_to_constants() on the inference output.
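A minimal sketch of that two-graph approach, assuming a toy one-layer model (all layer, checkpoint, and node names are illustrative): build once with `training=True` to create and checkpoint the variables, then rebuild with `training=False` and freeze that graph.

```python
import os
import tempfile
import tensorflow.compat.v1 as tf  # TF 1.x-style graph API
tf.disable_v2_behavior()

def build(training):
    # Illustrative toy network: a single batch-norm layer with fixed names.
    x = tf.placeholder(tf.float32, [None, 4], name='input')
    y = tf.layers.batch_normalization(x, training=training, name='bn')
    return tf.identity(y, name='output')

# 1) Training graph: create the variables and save a checkpoint.
ckpt_dir = tempfile.mkdtemp()
train_graph = tf.Graph()
with train_graph.as_default():
    build(training=True)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        ckpt = saver.save(sess, os.path.join(ckpt_dir, 'model.ckpt'))

# 2) Eval graph: training=False is a Python bool, so no tf.cond() or
#    update ops are ever built, and freezing removes everything cleanly.
eval_graph = tf.Graph()
with eval_graph.as_default():
    build(training=False)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, ckpt)
        frozen = tf.graph_util.convert_variables_to_constants(
            sess, eval_graph.as_graph_def(), ['output'])
```

Because the two graphs use the same variable names, the training checkpoint restores directly into the eval graph.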
As other users have pointed out, one could also fix this using the ‘blacklist’ option in tf.graph_util.convert_variables_to_constants(). The downside of this is that unneeded ops are still present in the frozen graph.
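For reference, a hedged sketch of the blacklist route: `variable_names_blacklist` in `tf.graph_util.convert_variables_to_constants()` keeps the named variables out of the constant-folding pass (the layer and variable names below are illustrative).

```python
import tensorflow.compat.v1 as tf  # TF 1.x-style graph API
tf.disable_v2_behavior()

graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, [None, 4], name='input')
    y = tf.layers.batch_normalization(x, training=False, name='bn')
    tf.identity(y, name='output')
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # The blacklisted moving statistics stay as variable ops in the
        # frozen graph; everything else is folded into constants.
        frozen = tf.graph_util.convert_variables_to_constants(
            sess, graph.as_graph_def(), ['output'],
            variable_names_blacklist=['bn/moving_mean', 'bn/moving_variance'])
```

As the comment above notes, the downside is that those variable ops (and their initializers) remain in the supposedly frozen graph.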
Is there going to be a more comprehensive patch for this any time soon? I am surprised this issue has been open for almost a year with no action. This seems like a very big issue, since batch norm is so useful for training large, complex networks. It is not always practical to re-train the network without batch norm for deployment. Users should not be relying on hacks that edit the graph after the fact.
@pavelgonchar Your suggestion didn’t work for me:
Error:
I solved this problem by using tf.layers.batch_normalization rather than tf.contrib.layers.batch_norm.
@XiaodanLi001 Since I started using the TensorFlow Estimator API I never had this problem again. However, there is still a trick: you must define your training op within a `tf.control_dependencies` context, like the snippet below. Although the batch_norm layer allocates some variables, they are not trainable but updatable, so you need to ensure they are updated at every training step. I hope it helps.

OBS: I use `tf.layers` too.

Bump
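The `tf.control_dependencies` trick mentioned above can be sketched like this; the toy model is a hypothetical example, not the commenter's code. The batch-norm moving statistics are updated via ops registered in `tf.GraphKeys.UPDATE_OPS`, so the train op must explicitly depend on them:

```python
import tensorflow.compat.v1 as tf  # TF 1.x-style graph API
tf.disable_v2_behavior()

# Hypothetical toy model with one batch-norm layer.
x = tf.placeholder(tf.float32, [None, 8], name='x')
labels = tf.placeholder(tf.float32, [None, 1], name='labels')
h = tf.layers.batch_normalization(x, training=True)
logits = tf.layers.dense(h, 1)
loss = tf.losses.mean_squared_error(labels, logits)

# batch_normalization registers its moving_mean/moving_variance updates
# in the UPDATE_OPS collection; make the train op depend on them so the
# statistics actually advance at every training step.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
```

Without the `control_dependencies` block, the moving averages never update and the frozen eval graph normalizes with stale statistics.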
Same problem. I trained a GoogLeNet model and get this error when importing the frozen graph:

ValueError: Input 0 of node save/Assign_41 was passed float from auxiliary_classifier_1/classifier/biases/Adam_1:0 incompatible with expected float_ref.

None of the answers I found on the net address this specific situation, where the problem is in the save op and the Adam optimizer.
Closing since @barbolo’s solution seems to work.
In general the best route is to create a separate eval graph with `is_training=False` for batchnorm, and freeze the training checkpoint into that graph.
Thanks!
@barbolo your answer works for me!!! you really save my project… so many thanks!
I encountered an error
ValueError: graph_def is invalid at node 'conv3_1/BatchNorm/AssignMovingAvg': Input tensor 'conv3_1/BatchNorm/moving_mean:0' Cannot convert a tensor of type float32 to an input of type float32_ref.

when using the BatchNorm layer in slim. Not sure how to solve it…

I met the same problem when dealing with BN in ResNet. It seems that the call `inference_graph = extract_sub_graph(input_graph_def, output_node_names)` in `graph_util.convert_variables_to_constants()` cuts off the moving_mean and moving_variance nodes. I solved it this way: save the model in train mode, then load the model in eval or test mode, run and save it again, and freeze that last model; you will find the moving_mean and moving_variance nodes. BTW, my goal is to get variables such as the mean, variance, and weights out of the frozen model; I don’t need to load it again.

@drpngx @gunan - I think the state machine got confused here. @barbolo replied with his version but no tensorflower picked up the ball.