model-optimization: Failed with Post Training Quantization after Quantization Aware Training
Describe the request
I am working with recent neural networks targeting mobile devices, and I found there are obstacles to performing integer quantization after QAT.
I know these APIs are not available now, but if you have plans to address the following issues, please let me know when they will be available 😃
- AveragePooling2D
x = layers.Conv2D(32, 5, padding='same', activation='relu')(input)
x = layers.AveragePooling2D((2, 2), (2, 2), padding='same')(x)  # <- converts successfully, fails to prepare
x = layers.Conv2D(64, 5, padding='same', activation='relu')(x)
tensorflow/lite/kernels/pooling.cc:94 input->params.scale != output->params.scale (-1045139600 != 653455232) Node number 2 (AVERAGE_POOL_2D) failed to prepare.
- Same as the MaxPooling2D problem.
- MaxPooling2D
x = layers.Conv2D(32, 5, padding='same', activation='relu')(input)
x = layers.MaxPooling2D((2, 2), (2, 2), padding='same')(x)  # <- converts successfully, fails to prepare
x = layers.Conv2D(64, 5, padding='same', activation='relu')(x)
tensorflow/lite/kernels/pooling.cc:94 input->params.scale != output->params.scale (-1045139600 != 653454832) Node number 2 (MAX_POOL_2D) failed to prepare.
- Same as the AveragePooling2D problem.
- Residual connection
input = tf.keras.Input(input_shape)
shortcut = input
x = layers.Conv2D(16, 1, padding='same', use_bias=False)(input)
x = layers.BatchNormalization()(x)
x = layers.ReLU(6.0)(x)
x = x + shortcut  # <- fails to convert: '+' is lowered to a TensorFlowOpLayer, not an Add layer
Layer tf_op_layer_AddV2:<class 'tensorflow.python.keras.engine.base_layer.TensorFlowOpLayer'> is not supported. You can quantize this layer by passing a tfmot.quantization.keras.QuantizeConfig instance to the quantize_annotate_layer API.
- This problem causes the failure below.
- HardSwish
x = layers.Conv2D(32, 3, 2, padding='same', use_bias=False)(input)
x = layers.BatchNormalization()(x)
x = layers.ReLU(6.0)(x + 3) * (1 / 6)  # <- the Add/ReLU6/Mul pattern used by `HardSwish`
Layer tf_op_layer_AddV2_1:<class 'tensorflow.python.keras.engine.base_layer.TensorFlowOpLayer'> is not supported. You can quantize this layer by passing a tfmot.quantization.keras.QuantizeConfig instance to the quantize_annotate_layer API.
- There are two levels to this problem.
- I had configured a QuantizeConfig to support TensorFlowOpLayer with Add and Multiply ops, but because these ops sit between BN and ReLU6, the Conv2D-BN-ReLU layers could not be fused correctly. -> Quantized MobileNetV3 became slower than the floating-point version on an Android device. (A sketch of the kind of QuantizeConfig I mean is shown after this list; the exact one is in the gist below.)
- The main building block of MobileNetV3, Conv2D-BN-HardSwish, is not a supported pattern.
- GlobalAveragePooling-Dense
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(1024, activation='relu')(x)  # <- converts successfully, fails to prepare
tensorflow/lite/kernels/kernel_util.cc:129 std::abs(input_product_scale - bias_scale) <= 1e-6 * std::min(input_product_scale, bias_scale) was not true. Node number 4 (FULLY_CONNECTED) failed to prepare.
- This bug prevents me from benchmarking the official MobileNetV2 network imported from tf.keras.
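For reference, the kind of output-only QuantizeConfig referred to above looks roughly like this; it is reconstructed from the tfmot documentation pattern, and the exact configuration actually used is in the gist linked under "Code to reproduce the issue":

```python
import tensorflow_model_optimization as tfmot


class OutputOnlyQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
    """Quantizes only the layer output; the wrapped op has no weights."""

    def get_weights_and_quantizers(self, layer):
        return []

    def get_activations_and_quantizers(self, layer):
        return []

    def set_quantize_weights(self, layer, quantize_weights):
        pass

    def set_quantize_activations(self, layer, quantize_activations):
        pass

    def get_output_quantizers(self, layer):
        # 8-bit moving-average quantizer on the layer output.
        return [tfmot.quantization.keras.quantizers.MovingAverageQuantizer(
            num_bits=8, per_axis=False, symmetric=False, narrow_range=False)]

    def get_config(self):
        return {}


# Usage sketch (op_layer is a hypothetical layer instance):
# annotated = tfmot.quantization.keras.quantize_annotate_layer(
#     op_layer, OutputOnlyQuantizeConfig())
```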
System information
TensorFlow installed from (source or binary): binary
TensorFlow version: 2.2.0 (release)
TensorFlow Model Optimization version: 0.3.0 (release)
Python version: 3.6.0
Code to reproduce the issue
Gist to reproduce the full test: https://gist.github.com/kalaluthien/b270c71afb6866ae61ef0dc088a762f2
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 3
- Comments: 15 (8 by maintainers)
Also, regarding HardSwish: if you have the time and are interested, I'm happy to guide you through implementing support for it 😃
Regarding MobileNetV2 reproduction, looking at your code it seems you are training on CIFAR. It won't be as straightforward to reproduce the full training.
We trained a Keras MobileNet V2 model with hyperparams from this. We then quantized the model and trained again for a few epochs.
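A rough sketch of that flow (the model, optimizer, and data here are placeholders, not the hyperparameters actually used):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder float model; the real run used a Keras MobileNetV2 trained with
# the hyperparameters from the linked config.
base_model = tf.keras.applications.MobileNetV2(weights=None, input_shape=(224, 224, 3))

# Wrap the whole model for quantization-aware training.
q_model = tfmot.quantization.keras.quantize_model(base_model)

# Placeholder data; substitute the real training pipeline.
images = tf.random.uniform([8, 224, 224, 3])
labels = tf.random.uniform([8], maxval=1000, dtype=tf.int32)

# "Quantized the model and trained again for a few epochs."
q_model.compile(optimizer=tf.keras.optimizers.SGD(1e-4),
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])
q_model.fit(images, labels, batch_size=4, epochs=3)
```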
I think the reason your conversion code is failing is due to
Try removing it, and I think conversion should work. If it doesn't, please let me know. Basically, the QAT conversion by default uses float inputs/outputs based on the model signature. There is work in progress in TFLiteConverterV2 to support a different model interface (int8/uint8, etc.). See this.
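For illustration, a bare-bones QAT conversion continuing from the sketch above looks like this; the commented-out int8 interface lines are the in-progress part, so treat them as a sketch rather than a finalized API:

```python
import tensorflow as tf

# q_model is the quantization-aware-trained Keras model from the sketch above.
converter = tf.lite.TFLiteConverter.from_keras_model(q_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# By default the converted model keeps float32 inputs/outputs to match the
# Keras model signature; only the internals run in int8.
tflite_model = converter.convert()

# The in-progress interface work would let you request an int8/uint8 model
# interface instead, roughly along these lines:
# converter.inference_input_type = tf.int8
# converter.inference_output_type = tf.int8
```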
Hope this helps.
Hi @kalaluthien,
Thanks for the well thought out and detailed bug report. Sorry for the delay in getting back - there’s generally limited time I take out each week to look at github issues 😃
I tried out both of these examples and they converted and ran just fine for me. Perhaps there is a flag you are missing during conversion. Check this file for the conversion code.
Looking at your colab code, converter._experimental_new_quantizer = True is missing. Please try that and let me know how it goes.
This failure is expected. By default, our goal is to support built-in Keras layers, which are basically the layers under the tf.keras.layers module. TensorFlowOpLayer can be used to wrap any TF op, and it's not feasible to meaningfully cover every TF op. The recommended approach here is to use built-in Keras layers to achieve this. So you can use tf.keras.layers.Add and tf.keras.layers.Reshape instead of using + and expand_dims. That should solve it. If you really do want to use something else, it's the user's responsibility to provide an appropriate QuantizeConfig for your use.
This is again the same problem as TensorFlowOpLayer. And yes, you are right that the existing pattern only matches Conv+BN+ReLU. The code likely became slow since it had added a bunch of Quant/Dequant ops in between. I don't think the converter is likely to match Conv/BN/(Add+Mul ops matching hardswish) either while folding.
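Concretely, for the residual-connection example from the report, the rewrite with the built-in Add layer would look something like this (the input shape is an assumption; the channel count just has to match the Conv2D filters so the Add is valid):

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(32, 32, 16))  # assumed shape
shortcut = inputs
x = layers.Conv2D(16, 1, padding='same', use_bias=False)(inputs)
x = layers.BatchNormalization()(x)
x = layers.ReLU(6.0)(x)
x = layers.Add()([x, shortcut])  # built-in Add layer instead of '+'
model = tf.keras.Model(inputs, x)
```

The same idea applies to expand_dims: a tf.keras.layers.Reshape with the target shape takes its place.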
The proper fix here would be to add support for HardSwish. Can you please file a separate bug requesting HardSwish support? I'll take some time out to add it. We covered MobileNet v1/v2, so this is currently missing.
But we should be able to add support for this. We also need to ensure the converter is handling it properly.
Again, this works for me. That's how we got MobileNetV2 working and created the results. Perhaps this is the same issue as Average/MaxPooling. Please try the fix out and let me know if that works.
Seems like this bug is solved. I'm closing it; please feel free to reopen.
You can start a new issue for HardSwish and we can continue our conversation there. Even if it's done in your code, it can remain an example for others to follow.
And it’ll be pretty easy to incorporate into the library once you’ve implemented it. We can try and get HardSwish moved into Keras.
Thanks @kalaluthien for your patience and proactive use of the library.
As for HardSwish, I just looked into it a bit. There seem to be a few tricky pieces.
For starters, hard_swish has not been added as an activation in Keras yet. The goal of the tfmot library is to provide default behavior for all built-in Keras layers/activations. But since hard_swish is not a built-in activation yet, we can't really add a pattern matching it in the library code. It would need to be handled by the user. I would recommend adding support for it in your code to begin with. Once hard_swish gets added, we can move this code internally. You should be able to file a bug on keras/tf to check whether they plan to add support for it.
You can create a class HardSwish(Layer) which gets added after Conv + BN. You should be able to use the built-in Add and Multiply to do so.
Next, to understand exactly what support needs to be added, we would need to understand how it executes in TFLite.
I created a simple model.
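The original snippet is not preserved in this copy of the thread; a minimal model with the same Conv + BN + Add/ReLU6/Mul tail would look roughly like this (shapes and filter counts are assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(32, 32, 3))
x = layers.Conv2D(8, 3, padding='same', use_bias=False)(inputs)
x = layers.BatchNormalization()(x)
# Hard-swish-style tail: Add(+3), ReLU6, then a Mul by 1/6.
x = layers.ReLU(6.0)(x + 3.0) * (1.0 / 6.0)
model = tf.keras.Model(inputs, x)
```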
Converted it using the following code.
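The conversion snippet is likewise not preserved; post-training integer quantization along these lines reproduces the fusion behavior described next (the calibration data is a placeholder):

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.experimental_new_converter = True  # MLIR-based converter

def representative_dataset():
    for _ in range(100):
        yield [tf.random.uniform([1, 32, 32, 3])]  # placeholder calibration data

converter.representative_dataset = representative_dataset
tflite_model = converter.convert()

# Inspecting the converted graph (e.g. in Netron) shows the Add folded into
# the conv bias, while the Mul remains a separate op after the activation.
```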
As you can see, the converter fuses the Add into the bias, but the Mul comes after.
So you'll need to place the FakeQuant after the Add + ReLU but before the Mul, and likely use a transform similar to this. That should sort the issue out.
Oh, I'm sorry, I made a mistake. I meant use converter.experimental_new_converter = True. That's what was missing.