edgetpu: How to overcome `More than one subgraph is not supported`

Inference on the Edge TPU is very slow when operations are not mapped to the TPU. One of the main reasons the compiler gives for not mapping an operation is `More than one subgraph is not supported`. For example, I am trying to convert a U-Net-style architecture to TFLite, and I noticed that the operations in the encoder are mapped to the TPU but the ones in the decoder are not.

import tensorflow as tf

converter = tf.compat.v1.lite.TFLiteConverter.from_keras_model_file("model.h5")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset  # calibration generator, see sketch below
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8
converter.quantized_input_stats = {'serving_default_input': (0., 1.),
                                   'input': (0., 1.),
                                   'input_0': (0., 1.)}
# converter.allow_custom_ops = True
converter.experimental_new_converter = False
tflite_model = converter.convert()

with tf.io.gfile.GFile('resnet18.tflite', 'wb') as f:
    f.write(tflite_model)
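
For reference, representative_dataset is not defined above; it is just a calibration generator that yields a few sample inputs in the model's input shape. A minimal sketch of such a generator (the input shape and sample count are placeholders, not taken from the model in this issue):

import numpy as np

def representative_dataset():
    # Yield ~100 calibration samples shaped like the model input.
    # Random data is only a placeholder; real images from the training
    # set give much better quantization parameters.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]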

Compiling the above converted model with the Edge TPU Compiler gives:

Edge TPU Compiler version 15.0.340273435

Model compiled successfully in 2160 ms.

Input model: resnet18.tflite
Input size: 23.66MiB
Output model: resnet18_edgetpu.tflite
Output size: 23.57MiB
On-chip memory used for caching model parameters: 6.92MiB
On-chip memory remaining for caching model parameters: 3.50KiB
Off-chip memory used for streaming uncached model parameters: 13.46MiB
Number of Edge TPU subgraphs: 1
Total number of operations: 126
Operation log: resnet18_edgetpu.log

Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 90
Number of operations that will run on CPU: 36

Operator                       Count      Status

RESIZE_BILINEAR                5          Operation version not supported
ADD                            16         Mapped to Edge TPU
MAX_POOL_2D                    1          Mapped to Edge TPU
CONCATENATION                  4          More than one subgraph is not supported
PAD                            11         More than one subgraph is not supported
PAD                            34         Mapped to Edge TPU
LOGISTIC                       1          More than one subgraph is not supported
QUANTIZE                       4          More than one subgraph is not supported
QUANTIZE                       3          Mapped to Edge TPU
CONV_2D                        11         More than one subgraph is not supported
CONV_2D                        36         Mapped to Edge TPU

Why are some CONV_2D operations mapped to the Edge TPU while others are not? What does `More than one subgraph is not supported` mean? And how can I avoid it so that most of the operations map to the TPU?

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 21 (1 by maintainers)

Most upvoted comments

@hjonnala Update: Hi Glenn, I retrained my model and changed the imgsz from 640 to 160. With an image size of 160 everything seems to be OK; I got the same result as you. Do you have any idea why I couldn't get that with an image size of 640?

Model compiled successfully in 2181 ms.

Input model: /content/drive/MyDrive/Yolo/best160_saved_model/best160_full_integer_quant.tflite
Input size: 2.96MiB
Output model: /content/drive/MyDrive/Yolo/best160_saved_model/best160_full_integer_quant_edgetpu.tflite
Output size: 4.92MiB
On-chip memory used for caching model parameters: 4.50MiB
On-chip memory remaining for caching model parameters: 3.08MiB
Off-chip memory used for streaming uncached model parameters: 9.31KiB
Number of Edge TPU subgraphs: 1
Total number of operations: 256
Operation log: /content/drive/MyDrive/Yolo/best160_saved_model/best160_full_integer_quant_edgetpu.log

Operator                       Count      Status

CONV_2D                        64         Mapped to Edge TPU
CONCATENATION                  19         Mapped to Edge TPU
SOFTMAX                        1          Mapped to Edge TPU
RESHAPE                        5          Mapped to Edge TPU
TRANSPOSE                      5          Mapped to Edge TPU
STRIDED_SLICE                  20         Mapped to Edge TPU
MUL                            59         Mapped to Edge TPU
SUB                            2          Mapped to Edge TPU
ADD                            8          Mapped to Edge TPU
LOGISTIC                       58         Mapped to Edge TPU
MAX_POOL_2D                    3          Mapped to Edge TPU
PAD                            7          Mapped to Edge TPU
RESIZE_NEAREST_NEIGHBOR        2          Mapped to Edge TPU
QUANTIZE                       3          Mapped to Edge TPU
Compilation child process completed within timeout period.
Compilation succeeded! 
Edge TPU: export success ✅ 14.6s, saved as /content/drive/MyDrive/Yolo/best160_saved_model/best160_full_integer_quant_edgetpu.tflite (4.9 MB)

Export complete (73.2s)
Results saved to /content/drive/MyDrive/Yolo
Predict:         yolo predict task=detect model=/content/drive/MyDrive/Yolo/best160_saved_model/best160_full_integer_quant_edgetpu.tflite imgsz=160 
Validate:        yolo val task=detect model=/content/drive/MyDrive/Yolo/best160_saved_model/best160_full_integer_quant_edgetpu.tflite imgsz=160 data=Data/data.yaml 
Visualize:       https://netron.app/

@manoj7410 I have also tried the 320x320 model, and that model does get compiled for the Edge TPU. I am even getting fast inference with the 320x320 model. But I am trying to compare performance across different models, so I was hoping I could also compile the 640x640 model with the Edge TPU compiler.

Do you know why the 640x640 model is not getting compiled when using the -a flag on the Edge TPU compiler? Or do you have any suggestions on how I can debug it? I can compile the 640x640 model without the -a flag, but then the inference is very slow.

@e2r-htz There is an experimental feature of our compiler that you can try: add -a to your compile command, i.e. edgetpu_compiler -sa model.tflite. Please keep in mind that this is still at the experimentation stage.

@Namburger I tried using the -a flag, but I get an error saying:

Internal compiler error. Aborting!

I am trying to compile the ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8 model from the TensorFlow object detection model zoo. I have trained this model on my own custom dataset and used the export_tflite_graph_tf2.py script to export it (a sketch of the conversion step is given after the compiler output below). Without the -a flag, this is the compiler output that I get:

Edge TPU Compiler version 15.0.340273435

Model compiled successfully in 2045 ms.

Input model: ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8_full_integer.tflite
Input size: 4.17MiB
Output model: ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8_full_integer_edgetpu.tflite
Output size: 4.76MiB
On-chip memory used for caching model parameters: 3.21MiB
On-chip memory remaining for caching model parameters: 3.78MiB
Off-chip memory used for streaming uncached model parameters: 0.00B
Number of Edge TPU subgraphs: 1
Total number of operations: 158
Operation log: ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8_full_integer_edgetpu.log

Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 112
Number of operations that will run on CPU: 46

Operator                       Count      Status

PACK                           4          Tensor has unsupported rank (up to 3 innermost dimensions mapped)
CONCATENATION                  2          More than one subgraph is not supported
RESHAPE                        3          Operation is otherwise supported, but not mapped due to some unspecified limitation
RESHAPE                        3          More than one subgraph is not supported
RESHAPE                        6          Mapped to Edge TPU
LOGISTIC                       1          More than one subgraph is not supported
DEPTHWISE_CONV_2D              14         More than one subgraph is not supported
DEPTHWISE_CONV_2D              37         Mapped to Edge TPU
CONV_2D                        58         Mapped to Edge TPU
CONV_2D                        14         More than one subgraph is not supported
QUANTIZE                       1          Mapped to Edge TPU
DEQUANTIZE                     2          Operation is working on an unsupported data type
ADD                            10         Mapped to Edge TPU
ADD                            2          More than one subgraph is not supported
CUSTOM                         1          Operation is working on an unsupported data type
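
For context, the full-integer conversion of a model exported with export_tflite_graph_tf2.py looks roughly like the sketch below. This is only an assumed reconstruction: the SavedModel path, calibration data, and converter flags are placeholders, and the exact settings used to produce the log above are not shown in this thread.

import numpy as np
import tensorflow as tf

# Placeholder: directory written by export_tflite_graph_tf2.py.
saved_model_dir = "exported_model/saved_model"

def representative_dataset():
    # Placeholder calibration data; use real training images at 640x640.
    for _ in range(100):
        yield [np.random.rand(1, 640, 640, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8  # outputs stay float; the detection post-process runs on CPU,
                                           # which matches the CUSTOM/DEQUANTIZE entries in the log above
tflite_model = converter.convert()

with open("ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8_full_integer.tflite", "wb") as f:
    f.write(tflite_model)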