tensorflow: Toco/TFLite_Convert for TFLite Problem

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: Pixel 1
  • TensorFlow installed from (source or binary): Source
  • TensorFlow version (use command below): 1.9.0 (commit r1.9)
  • Python version: N/A
  • Bazel version (if compiling from source): 0.16.1
  • GCC/Compiler version (if compiling from source): gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
  • CUDA/cuDNN version: N/A
  • GPU model and memory: N/A
  • Exact command to reproduce:

bazel run -c opt tensorflow/python/tools/optimize_for_inference -- \
  --input=$ORIGINAL_PB --output=$STRIPPED_PB \
  --frozen_graph=True \
  --input_names=Preprocessor/sub \
  --output_names=concat,concat_1 \
  --alsologtostderr

bazel run tensorflow/contrib/lite/toco:toco -- \
  --input_file=$STRIPPED_PB \
  --output_file=/absolute/path/to/tensorflow/tensorflow/contrib/lite/examples/android/assets/new_model.tflite \
  --input_format=TENSORFLOW_GRAPHDEF --output_format=TFLITE \
  --input_shapes=1,300,300,3 \
  --input_arrays=Preprocessor/sub \
  --output_arrays=concat,concat_1 \
  --inference_type=QUANTIZED_UINT8 \
  --logtostderr \
  --default_ranges_min=0 --default_ranges_max=5 \
  --mean_values=128 --std_values=127 \
  --allow_custom_opps

or the following, which fails:

bazel run //tensorflow/contrib/lite/python:tflite_convert -- \
  --graph_def_file=$STRIPPED_PB \
  --output_file=/absolute/path/to/tensorflow/tensorflow/contrib/lite/examples/android/assets/tflite_convert_example.tflite \
  --input_arrays=Preprocessor/sub \
  --output_arrays=concat,concat_1 \
  --output_format=TFLITE \
  --input_shapes=1,300,300,3 \
  --inference_type=QUANTIZED_UINT8 \
  --default_ranges_min=0 --default_ranges_max=5 \
  --mean_values=128 --std_dev_values=127 \
  --allow_custom_opps

or the following, which succeeds:

bazel run //tensorflow/contrib/lite/python:tflite_convert -- \
  --graph_def_file=$STRIPPED_PB \
  --output_file=/absolute/path/to/tensorflow/tensorflow/contrib/lite/examples/android/assets/tflite_convert_example.tflite \
  --input_arrays=Preprocessor/sub \
  --output_arrays=concat,concat_1 \
  --input_shapes=1,300,300,3

  1. Change TF_OD_API_MODEL_FILE and append the new file to the assets list in BUILD

bazel build -c opt --cxxopt='--std=c++11' //tensorflow/contrib/lite/examples/android:tflite_demo

adb install -r -f bazel-bin/tensorflow/contrib/lite/examples/android/tflite_demo.apk

  2. Open TFL Detect

Describe the problem

I’m attempting to import ssd_mobilenet_v1, ssd_mobilenet_v2, and ssdlite from the [model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md) into the TFLite Android example. Ultimately I’m aiming to retrain either the ssdlite or ssd_mobilenet_v2 model, but for now every model I try triggers runtime errors. All of the errors imply that the optimize_for_inference and toco/tflite_convert commands change the models in a way that makes them incompatible with r1.9.

Now, it’s most likely that my toco/tflite_convert commands are to blame, but since they seem to be well formed I’m escalating this to GitHub.

Source code / logs

Firstly, according to [the toco documentation](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/toco/g3doc/cmdline_examples.md), we’re only supposed to use tflite_convert from r1.9 onward. When I actually specify all of the fields the command lists in its help (i.e., the tflite_convert command above), I get the following log and no file is produced:

 WARNING: /home/bryan/.cache/bazel/_bazel_bryan/36e77cb69a80a4f75d1ba8b192f69b6d/external/grpc/BUILD:1960:1: in srcs attribute of cc_library rule @grpc//:grpc_nanopb: please do not import '@grpc//third_party/nanopb:pb_common.c' directly. You should either move the file to this package or depend on an appropriate rule there. Since this rule was created by the macro 'grpc_generate_one_off_targets', the error might have been caused by the macro implementation in /home/bryan/.cache/bazel/_bazel_bryan/36e77cb69a80a4f75d1ba8b192f69b6d/external/grpc/bazel/grpc_build_system.bzl:172:12
 WARNING: /home/bryan/.cache/bazel/_bazel_bryan/36e77cb69a80a4f75d1ba8b192f69b6d/external/grpc/BUILD:1960:1: in srcs attribute of cc_library rule @grpc//:grpc_nanopb: please do not import '@grpc//third_party/nanopb:pb_decode.c' directly. You should either move the file to this package or depend on an appropriate rule there. Since this rule was created by the macro 'grpc_generate_one_off_targets', the error might have been caused by the macro implementation in /home/bryan/.cache/bazel/_bazel_bryan/36e77cb69a80a4f75d1ba8b192f69b6d/external/grpc/bazel/grpc_build_system.bzl:172:12
 WARNING: /home/bryan/.cache/bazel/_bazel_bryan/36e77cb69a80a4f75d1ba8b192f69b6d/external/grpc/BUILD:1960:1: in srcs attribute of cc_library rule @grpc//:grpc_nanopb: please do not import '@grpc//third_party/nanopb:pb_encode.c' directly. You should either move the file to this package or depend on an appropriate rule there. Since this rule was created by the macro 'grpc_generate_one_off_targets', the error might have been caused by the macro implementation in /home/bryan/.cache/bazel/_bazel_bryan/36e77cb69a80a4f75d1ba8b192f69b6d/external/grpc/bazel/grpc_build_system.bzl:172:12
 INFO: Analysed target //tensorflow/contrib/lite/python:tflite_convert (0 packages loaded).
 INFO: Found 1 target...
 Target //tensorflow/contrib/lite/python:tflite_convert up-to-date:
   bazel-bin/tensorflow/contrib/lite/python/tflite_convert
 INFO: Elapsed time: 0.254s, Critical Path: 0.00s
 INFO: 0 processes.
 INFO: Build completed successfully, 1 total action
 INFO: Running command line: bazel-bin/tensorflow/contrib/lite/python/tflite_convert '--graph_def_file=/home/bryan/Downloads/ssd_mobilenet_v1_coco_2018_01_28/stripped' '--output_file=/home/bryan/Support/tensorflow/tensorflow/contrib/lite/examples/android/assets/tflite_convert_example.tflite' '--input_arrays=Preprocessor/sub' '--output_arrays=concat,concat_1' '--output_format=TFLITE' '--input_shapes=1,300,3INFO: Build completed successfully, 1 total action
 /home/bryan/.local/lib/python2.7/site-packages/scipy/__init__.py:114: UserWarning: Numpy 1.8.2 or above is recommended for this version of scipy (detected version 1.8.0)
   UserWarning)
 usage: tflite_convert.py [-h] --output_file OUTPUT_FILE
                          (--graph_def_file GRAPH_DEF_FILE | --saved_model_dir SAVED_MODEL_DIR)
                          [--output_format {TFLITE,GRAPHVIZ_DOT}]
                          [--inference_type {FLOAT,QUANTIZED_UINT8}]
                          [--inference_input_type {FLOAT,QUANTIZED_UINT8}]
                          [--input_arrays INPUT_ARRAYS]
                          [--input_shapes INPUT_SHAPES]
                          [--output_arrays OUTPUT_ARRAYS]
                          [--saved_model_tag_set SAVED_MODEL_TAG_SET]
                          [--saved_model_signature_key SAVED_MODEL_SIGNATURE_KEY]
                          [--std_dev_values STD_DEV_VALUES]
                          [--mean_values MEAN_VALUES]
                          [--default_ranges_min DEFAULT_RANGES_MIN]
                          [--default_ranges_max DEFAULT_RANGES_MAX]
                          [--drop_control_dependency DROP_CONTROL_DEPENDENCY]
                          [--reorder_across_fake_quant REORDER_ACROSS_FAKE_QUANT]
                          [--change_concat_input_ranges CHANGE_CONCAT_INPUT_RANGES]
                          [--allow_custom_ops ALLOW_CUSTOM_OPS]
 tflite_convert.py: error:

When I strip the tflite_convert params down to the bare minimum (graph_def_file, output_file, input_arrays, output_arrays, input_shapes), it does create an unquantized tflite model. When I load an unquantized tflite model generated by either command, TFL Detect exits with the following log:

09-06 00:00:51.046 25024 25041 E AndroidRuntime: FATAL EXCEPTION: inference
09-06 00:00:51.046 25024 25041 E AndroidRuntime: Process: org.tensorflow.lite.demo, PID: 25024
09-06 00:00:51.046 25024 25041 E AndroidRuntime: java.lang.IllegalArgumentException: Output error: Shape of output target [1, 1917, 4] does not match with the shape of the Tensor [1, 1917, 1, 4].
09-06 00:00:51.046 25024 25041 E AndroidRuntime: 	at org.tensorflow.lite.Tensor.copyTo(Tensor.java:44)
09-06 00:00:51.046 25024 25041 E AndroidRuntime: 	at org.tensorflow.lite.Interpreter.runForMultipleInputsOutputs(Interpreter.java:156)
09-06 00:00:51.046 25024 25041 E AndroidRuntime: 	at org.tensorflow.demo.TFLiteObjectDetectionAPIModel.recognizeImage(TFLiteObjectDetectionAPIModel.java:222)
09-06 00:00:51.046 25024 25041 E AndroidRuntime: 	at org.tensorflow.demo.DetectorActivity$3.run(DetectorActivity.java:242)
09-06 00:00:51.046 25024 25041 E AndroidRuntime: 	at android.os.Handler.handleCallback(Handler.java:873)
09-06 00:00:51.046 25024 25041 E AndroidRuntime: 	at android.os.Handler.dispatchMessage(Handler.java:99)
09-06 00:00:51.046 25024 25041 E AndroidRuntime: 	at android.os.Looper.loop(Looper.java:193)
09-06 00:00:51.046 25024 25041 E AndroidRuntime: 	at android.os.HandlerThread.run(HandlerThread.java:65)
09-06 00:00:51.049   914  2995 W ActivityManager:   Force finishing activity org.tensorflow.lite.demo/org.tensorflow.demo.DetectorActivity

The closest I’ve found to a solution is [this Stack Overflow page](https://stackoverflow.com/questions/50388330/java-lang-illegalargumentexception-output-error-shape-of-output-target-1-191), which suggests modifying TFLiteObjectDetectionAPIModel itself (and that runs into trouble when you get similar errors on the outputClassification array).
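
For reference, that fix amounts to allocating the output buffer with the shape the converted model actually reports, then squeezing out the singleton axis. A minimal sketch, where tfLite, imgData, and NUM_RESULTS are the demo's own names and the rest is illustrative:

// needs: java.util.HashMap, java.util.Map
// The converted model reports boxes as [1, 1917, 1, 4], so allocate the
// buffer with the extra axis the error message names.
float[][][][] outputLocations = new float[1][NUM_RESULTS][1][4];
Map<Integer, Object> outputMap = new HashMap<>();
outputMap.put(0, outputLocations);
tfLite.runForMultipleInputsOutputs(new Object[] {imgData}, outputMap);
for (int i = 0; i < NUM_RESULTS; ++i) {
  float[] box = outputLocations[0][i][0]; // squeeze the singleton axis
  // build the RectF/Recognition from box exactly as the demo already does
}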

If we use a quantized model instead, it crashes with this error:

09-05 22:54:27.413 21650 21667 E AndroidRuntime: java.lang.IllegalArgumentException: Input error: DataType (1) of input data does not match with the DataType (3) of model inputs.
09-05 22:54:27.413 21650 21667 E AndroidRuntime: 	at org.tensorflow.lite.NativeInterpreterWrapper.run(Native Method)
09-05 22:54:27.413 21650 21667 E AndroidRuntime: 	at org.tensorflow.lite.NativeInterpreterWrapper.run(NativeInterpreterWrapper.java:123)
09-05 22:54:27.413 21650 21667 E AndroidRuntime: 	at org.tensorflow.lite.Interpreter.runForMultipleInputsOutputs(Interpreter.java:144)
09-05 22:54:27.413 21650 21667 E AndroidRuntime: 	at org.tensorflow.demo.TFLiteObjectDetectionAPIModel.recognizeImage(TFLiteObjectDetectionAPIModel.java:222)
09-05 22:54:27.413 21650 21667 E AndroidRuntime: 	at org.tensorflow.demo.DetectorActivity$3.run(DetectorActivity.java:242)
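
In the TfLiteType enum, DataType 1 is FLOAT32 and 3 is UINT8, so the demo is feeding float pixels to a model whose input tensor is uint8. A minimal sketch of the byte-per-channel fill a quantized input needs, assuming a 1x300x300x3 input and the demo's imgData/intValues names:

// needs: java.nio.ByteBuffer, java.nio.ByteOrder
// A QUANTIZED_UINT8 input takes one byte per channel, not a 4-byte float.
ByteBuffer imgData = ByteBuffer.allocateDirect(1 * 300 * 300 * 3);
imgData.order(ByteOrder.nativeOrder());
imgData.rewind();
for (int pixel : intValues) {
  imgData.put((byte) ((pixel >> 16) & 0xFF)); // R
  imgData.put((byte) ((pixel >> 8) & 0xFF));  // G
  imgData.put((byte) (pixel & 0xFF));         // B
}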

Given the similar error messages in [this test](https://github.com/OAID/TensorFlow-HRT/blob/master/tensorflow/contrib/lite/java/src/test/java/org/tensorflow/lite/NativeInterpreterWrapperTest.java#L228), and the fact that commits after r1.9 give the TFLiteObjectDetectionAPIModel class an isQuantized flag, I suspect that r1.9 may not support quantization. If so, is there a definitive source for this? The available sources are each imperfect in a different way: the most official one, the [fixed point quantization](https://www.tensorflow.org/performance/quantization) page, is geared toward classification rather than object detection; the output arrays required by the [Medium post](https://medium.com/tensorflow/training-and-serving-a-realtime-mobile-object-detector-in-30-minutes-with-cloud-tpus-b78971cf1193) are not found when we run toco; and both are supposedly out of date according to [the toco documentation](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/toco/g3doc/cmdline_examples.md).
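
For what it's worth, the isQuantized flag in those later commits boils down to sizing and filling the input buffer differently per model type. A hedged sketch of that pattern (inputSize and numBytesPerChannel are illustrative names here, not confirmed against those commits):

// needs: java.nio.ByteBuffer, java.nio.ByteOrder
// Size the input buffer by model type: 1 byte per channel for uint8 models,
// 4 bytes per channel for float models.
boolean isQuantized = true; // must match the model bundled in assets
int numBytesPerChannel = isQuantized ? 1 : 4;
ByteBuffer imgData =
    ByteBuffer.allocateDirect(1 * inputSize * inputSize * 3 * numBytesPerChannel);
imgData.order(ByteOrder.nativeOrder());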

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 33 (4 by maintainers)

Most upvoted comments

So as a follow up I was able to deal with this issue. For anyone wondering here’s what I did:

  1. I checked out r1.10 for both the models and tensorflow repos
  2. Ran bazel clean in the tensorflow repo so that subsequent bazel commands would use the r1.10 binaries. This is most likely what solved my problem, since I had been trying different versions of tensorflow while dealing with this issue.
  3. I trained using a command similar to this:
python ~/tensorflow/models/research/object_detection/model_main.py \
       --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
       --model_dir=${MODEL_DIR} \
       --num_train_steps=${NUM_TRAIN_STEPS} \
       --num_eval_steps=${NUM_EVAL_STEPS} \
       --alsologtostderr
  4. Specifically for tflite, I needed to use export_tflite_ssd_graph.py, not export_inference_graph. So the next command was something like:
python ~/tensorflow/models/research/object_detection/export_tflite_ssd_graph.py \
       --pipeline_config_path=$CONFIG_FILE \
       --trained_checkpoint_prefix=$CHECKPOINT_PATH \
       --output_directory=$EXPORT_OUTPUT_DIR \
       --add_postprocessing_op=true
  5. Then we have the toco command, similar to the blog post but with a few added parameters:
./bazel-bin/tensorflow/contrib/lite/toco/toco \
  --input_file=$INPUT_PB_GRAPH \
  --output_file=$OUTPUT_TFLITE_FILE \
  --input_format=TENSORFLOW_GRAPHDEF --output_format=TFLITE \
  --inference_type=QUANTIZED_UINT8 \
  --input_shapes="1,300,300,3" \
  --input_arrays=normalized_input_image_tensor \
  --output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' \
  --std_values=128.0 --mean_values=128.0 \
  --allow_custom_ops --default_ranges_min=0 --default_ranges_max=6
  6. Then, when loading it into the tflite example (tensorflow/tensorflow/contrib/lite/examples/android), I needed a few changes to compile and to get past runtime errors and other odd behaviour:
git diff tensorflow/contrib/lite/examples/android/app/src/main/java/org/tensorflow/demo/TFLiteObjectDetectionAPIModel.java
diff --git a/tensorflow/contrib/lite/examples/android/app/src/main/java/org/tensorflow/demo/TFLiteObjectDetectionAPIModel.java b/tensorflow/contrib/lite/examples/android/app/src/main/java/org/tensorflow/demo/TFLiteObjectDetectionAPIModel.java
index 9eb21de..2cfa7e0 100644
--- a/tensorflow/contrib/lite/examples/android/app/src/main/java/org/tensorflow/demo/TFLiteObjectDetectionAPIModel.java
+++ b/tensorflow/contrib/lite/examples/android/app/src/main/java/org/tensorflow/demo/TFLiteObjectDetectionAPIModel.java
@@ -208,17 +208,24 @@ public class TFLiteObjectDetectionAPIModel implements Classifier {
       // in label file and class labels start from 1 to number_of_classes+1,
       // while outputClasses correspond to class index from 0 to number_of_classes
       int labelOffset = 1;
-      recognitions.add(
-          new Recognition(
-              "" + i,
-              labels.get((int) outputClasses[0][i] + labelOffset),
-              outputScores[0][i],
-              detection));
+        final int classLabel = (int) outputClasses[0][i] + labelOffset;
+        if (inRange(classLabel, labels.size(), 0) && inRange(outputScores[0][i], 1, 0)) {
+            recognitions.add(
+                    new Recognition(
+                            "" + i,
+                            labels.get(classLabel),
+                            outputScores[0][i],
+                            detection));
+        }
     }
     Trace.endSection(); // "recognizeImage"
     return recognitions;
   }
 
+  private boolean inRange(float number, float max, float min) {
+    return number < max && number >= min;
+  }
+

And then I was able to run the tflite example on my phone! Thanks to @achowdhery and @jdduke for responding and the help!

While following the instructions here, I was getting this error when trying to run the bazel command: Specified output array "'TFLite_Detection_PostProcess'" is not produced by any op in this graph

I was able to resolve it by removing the ' characters around each of the TFLite_Detection_PostProcess entries in the output_arrays portion of the command, like so:

 --output_arrays=TFLite_Detection_PostProcess,TFLite_Detection_PostProcess:1,TFLite_Detection_PostProcess:2,TFLite_Detection_PostProcess:3

I am doing this on Windows, so it may just be a quirk of how the Windows command line interprets quoted strings.

Thank you Bryan! It took me two days to find your post! I tried many solutions but just could not get past the [1,10,4] output tensor of the TensorFlow Lite Android OD example. For the SSD_MobileNet_quantized_v2_coco model (which is my case), it should be int labelOffset = 0. And good to see you here, Evan. 💯 @EdjeElectronics

Even if I prevent this error by using inRange(), the detection results are full of false positives. Can anyone explain why this would produce such weirdly out-of-range indexes?