tensorflow: Freeze_graph results in very poor accuracy compared to manually exporting and freezing the graph?

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04 LTS
  • TensorFlow installed from (source or binary): Source
  • TensorFlow version (use command below): 1.1
  • Bazel version (if compiling from source): 4.5
  • CUDA/cuDNN version: 8.0/5.1
  • GPU model and memory: GTX 860M

Describe the problem

Using the official freeze_graph.py file from TF, I get a very low prediction accuracy. When I instead export the graph manually with a file I wrote called write_pb.py, I get a much higher accuracy.

To be specific, here are the differences:

  1. Using write_pb.py to export the graph manually converts far more variables to constants (898 vs. 490), even with the same checkpoint file.
  2. freeze_graph.py takes a very long time to actually complete the export.
  3. Most importantly, I get a very low accuracy from the freeze_graph output, whereas the manually exported graph gives nearly the same accuracy as predicting an image directly from the checkpoint without freezing.
  4. The manually exported graph has a smaller file size (by just 1-2 MB).
  5. The manually exported graph also has a faster inference time than the graph produced by freeze_graph.py.

This is the freeze_graph.py file I got from the TF repo: https://gist.github.com/kwotsin/b9dae8246a30371a1a10690e2fa27cb7

This is the write_pb.py file I wrote: https://gist.github.com/kwotsin/8e43f5db4815e1f1af37da70d0933d8b
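For reference, here is a minimal sketch of the manual export path (roughly what a script like write_pb.py does, not its exact contents): rebuild the inference graph, restore the checkpoint, and fold the variables into constants with graph_util.convert_variables_to_constants. The tiny stand-in network, the output node name, and the file names below are placeholders and must match whatever graph and checkpoint you actually use.

import tensorflow as tf
from tensorflow.python.framework import graph_util

CHECKPOINT = 'model.ckpt-18279'   # checkpoint name from the issue; use your own
OUTPUT_NODES = ['Predictions']    # for the real model this would be 'InceptionResnetV2/Logits/Predictions'
OUTPUT_PB = 'frozen_model.pb'

with tf.Graph().as_default() as graph:
    images = tf.placeholder(tf.float32, [None, 299, 299, 3], name='input')
    # Tiny stand-in for the real model builder; with slim models, remember to
    # build the network with is_training=False (see the comments further down).
    net = tf.layers.conv2d(images, 8, 3, activation=tf.nn.relu, name='conv1')
    net = tf.reduce_mean(net, axis=[1, 2])
    logits = tf.layers.dense(net, 5, name='logits')
    tf.nn.softmax(logits, name='Predictions')

    saver = tf.train.Saver()
    with tf.Session() as sess:
        # The checkpoint must have been written for the same graph built above.
        saver.restore(sess, CHECKPOINT)
        frozen = graph_util.convert_variables_to_constants(
            sess, graph.as_graph_def(), OUTPUT_NODES)

with tf.gfile.GFile(OUTPUT_PB, 'wb') as f:
    f.write(frozen.SerializeToString())
print('%d ops in the frozen graph.' % len(frozen.node))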

Source code / logs

Using freeze_graph with this command:

python freeze_graph.py --input_checkpoint=model.ckpt-18279 --input_graph=graph.pbtxt --output_graph=frozen_model_from_freeze_graph.pb --output_node_names=InceptionResnetV2/Logits/Predictions

I get this output:

2017-05-06 22:27:22.967898: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-05-06 22:27:22.968268: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties: 
name: GeForce GTX 860M
major: 5 minor: 0 memoryClockRate (GHz) 1.0195
pciBusID 0000:01:00.0
Total memory: 3.95GiB
Free memory: 3.34GiB
2017-05-06 22:27:22.968283: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 
2017-05-06 22:27:22.968288: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y 
2017-05-06 22:27:22.968306: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 860M, pci bus id: 0000:01:00.0)
2017-05-06 22:27:33.499491: I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
2017-05-06 22:27:33.499519: I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 8 visible devices
2017-05-06 22:27:33.509659: I tensorflow/compiler/xla/service/service.cc:183] XLA service 0x36ced220 executing computations on platform Host. Devices:
2017-05-06 22:27:33.509678: I tensorflow/compiler/xla/service/service.cc:191]   StreamExecutor device (0): <undefined>, <undefined>
2017-05-06 22:27:33.509827: I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
2017-05-06 22:27:33.509837: I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 8 visible devices
2017-05-06 22:27:33.512265: I tensorflow/compiler/xla/service/service.cc:183] XLA service 0x36ce6210 executing computations on platform CUDA. Devices:
2017-05-06 22:27:33.512279: I tensorflow/compiler/xla/service/service.cc:191]   StreamExecutor device (0): GeForce GTX 860M, Compute Capability 5.0
Converted 490 variables to const ops.
7871 ops in the final graph.

Meanwhile, if I manually freeze the graph using write_pb.py, I get this output:

2017-05-06 22:39:00.197711: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-05-06 22:39:00.198096: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties: 
name: GeForce GTX 860M
major: 5 minor: 0 memoryClockRate (GHz) 1.0195
pciBusID 0000:01:00.0
Total memory: 3.95GiB
Free memory: 3.29GiB
2017-05-06 22:39:00.198120: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 
2017-05-06 22:39:00.198125: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y 
2017-05-06 22:39:00.198143: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 860M, pci bus id: 0000:01:00.0)
2017-05-06 22:39:00.709417: I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
2017-05-06 22:39:00.709452: I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 8 visible devices
2017-05-06 22:39:00.710238: I tensorflow/compiler/xla/service/service.cc:183] XLA service 0xef16320 executing computations on platform Host. Devices:
2017-05-06 22:39:00.710252: I tensorflow/compiler/xla/service/service.cc:191]   StreamExecutor device (0): <undefined>, <undefined>
2017-05-06 22:39:00.710374: I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
2017-05-06 22:39:00.710384: I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 8 visible devices
2017-05-06 22:39:00.710645: I tensorflow/compiler/xla/service/service.cc:183] XLA service 0xef18180 executing computations on platform CUDA. Devices:
2017-05-06 22:39:00.710654: I tensorflow/compiler/xla/service/service.cc:191]   StreamExecutor device (0): GeForce GTX 860M, Compute Capability 5.0
Exporting graph...
Converted 898 variables to const ops.

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 3
  • Comments: 15 (3 by maintainers)

Most upvoted comments

I had the exact same issue and have been hopelessly confused about this, thinking that the freeze_graph.py script couldn’t be the breaking point.

In my case, fine-tuning from MobileNet, the accuracy dropped to the abyss and the network output from freeze_graph.py was extremely degenerate: tiny variations in the input image would lead to different top classes being predicted. I printed out various weights and they matched.

I confirmed @petewarden's suspicion: a network exported directly (as in @kwotsin's script) works only if is_training=False is passed. I haven't debugged further, but even if this doesn't have much to do with the freeze_graph.py script itself, some docs should probably be adjusted.
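To illustrate the is_training point, here is a minimal sketch assuming a slim-style model; the single conv layer, the shapes, and the scope names are placeholders for the real MobileNet/InceptionResnetV2 builder. With is_training=True, batch norm normalises with the statistics of the current batch instead of the stored moving averages, which is consistent with the degenerate, input-sensitive predictions described above; building the graph for export with is_training=False restores checkpoint-level behaviour.

import tensorflow as tf

slim = tf.contrib.slim

def build_for_export(images, is_training=False):
    # Placeholder network standing in for the real fine-tuned model. The key
    # point is that is_training controls whether batch norm uses per-batch
    # statistics (training) or the stored moving mean/variance (inference).
    net = slim.conv2d(images, 8, [3, 3], scope='conv1',
                      normalizer_fn=slim.batch_norm,
                      normalizer_params={'is_training': is_training})
    net = tf.reduce_mean(net, axis=[1, 2])
    logits = slim.fully_connected(net, 5, activation_fn=None, scope='logits')
    return tf.nn.softmax(logits, name='Predictions')

images = tf.placeholder(tf.float32, [None, 224, 224, 3], name='input')
# Graphs built with is_training=True behave erratically once frozen;
# pass False when rebuilding the graph for export.
predictions = build_for_export(images, is_training=False)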

Thank you @ktiwary2 for your solution. I still have difficulty finding the proper output nodes of SSD-MobileNet when I load it from checkpoints. There are 5000+ nodes, and I can't check them one by one. I checked the last 1000+ layers but could not find them. Any ideas? Thank you.

I would use the summarize_graph feature in graph transforms. Here's a link. It will basically give you all possible input/output nodes.
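If running the Bazel tool is inconvenient, a rough Python alternative (a sketch, assuming you have a frozen or otherwise serialized GraphDef; the file name is a placeholder) is to list the nodes that no other node consumes, since those are the usual candidates for --output_node_names:

import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile('frozen_model.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

consumed = set()
for node in graph_def.node:
    for inp in node.input:
        # Strip control-dependency markers ('^name') and output indices (':1').
        consumed.add(inp.lstrip('^').split(':')[0])

# Nodes nobody consumes, minus ops that are never useful as outputs.
candidates = [n.name for n in graph_def.node
              if n.name not in consumed and n.op not in ('Const', 'NoOp', 'Assign')]
print('\n'.join(candidates))

The summarize_graph tool gives similar information (plus likely input placeholders and op counts), so prefer it if you already have a TensorFlow source build available.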

@kwotsin Thank you! Your code is also helpful for me; it works.