model-optimization: Failed with Post Training Quantization after Quantization Aware Training
Describe the request
I am working with recent neural networks targeting mobile devices, and I found there are obstacles to performing integer quantization after QAT.
I know these APIs are not available now, but if you have plans to address the following issues, please let me know when they will be available 😃
- AveragePooling2D
x = layers.Conv2D(32, 5, padding='same', activation='relu')(input)
x = layers.AveragePooling2D((2, 2), (2, 2), padding='same')(x)  # <- converts successfully, fails to prepare
x = layers.Conv2D(64, 5, padding='same', activation='relu')(x)
tensorflow/lite/kernels/pooling.cc:94 input->params.scale != output->params.scale (-1045139600 != 653455232) Node number 2 (AVERAGE_POOL_2D) failed to prepare.
- Same as the MaxPooling2D problem.
- MaxPooling2D
x = layers.Conv2D(32, 5, padding='same', activation='relu')(input)
x = layers.MaxPooling2D((2, 2), (2, 2), padding='same')(x)  # <- converts successfully, fails to prepare
x = layers.Conv2D(64, 5, padding='same', activation='relu')(x)
tensorflow/lite/kernels/pooling.cc:94 input->params.scale != output->params.scale (-1045139600 != 653454832) Node number 2 (MAX_POOL_2D) failed to prepare.
- Same as the AveragePooling2D problem.
- Residual connection
input = tf.keras.Input(input_shape)
shortcut = input
x = layers.Conv2D(16, 1, padding='same', use_bias=False)(input)
x = layers.BatchNormalization()(x)
x = layers.ReLU(6.0)(x)
x = x + shortcut  # <- fails to convert: '+' is lowered to a TensorFlowOpLayer, not an Add layer
Layer tf_op_layer_AddV2:<class 'tensorflow.python.keras.engine.base_layer.TensorFlowOpLayer'> is not supported. You can quantize this layer by passing a tfmot.quantization.keras.QuantizeConfig instance to the quantize_annotate_layer API.
- This problem causes the failure below.
- HardSwish
x = layers.Conv2D(32, 3, 2, padding='same', use_bias=False)(input)
x = layers.BatchNormalization()(x)
x = layers.ReLU(6.0)(x + 3) * (1 / 6)  # <- the Add/ReLU6/Mul pattern used by `HardSwish`
Layer tf_op_layer_AddV2_1:<class 'tensorflow.python.keras.engine.base_layer.TensorFlowOpLayer'> is not supported. You can quantize this layer by passing a tfmot.quantization.keras.QuantizeConfig instance to the quantize_annotate_layer API.
- There are two levels to this problem.
- I had configured a QuantizeConfig to support TensorFlowOpLayer with Add and Multiply ops, but because these ops sit between BN and ReLU6, the Conv2D-BN-ReLU layers could not be fused correctly. -> Quantized MobileNetV3 became slower than the floating-point version on an Android device. (A sketch of the kind of QuantizeConfig I mean is shown after this list; the exact one is in the gist below.)
- The main building block of MobileNetV3, Conv2D-BN-HardSwish, is not a supported pattern.
- GlobalAveragePooling-Dense
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(1024, activation='relu')(x)  # <- converts successfully, fails to prepare
tensorflow/lite/kernels/kernel_util.cc:129 std::abs(input_product_scale - bias_scale) <= 1e-6 * std::min(input_product_scale, bias_scale) was not true. Node number 4 (FULLY_CONNECTED) failed to prepare.
- This bug prevents me from benchmarking the official MobileNetV2 network imported from tf.keras.
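For reference, the kind of output-only QuantizeConfig referred to above looks roughly like this; it is reconstructed from the tfmot documentation pattern, and the exact configuration actually used is in the gist linked under "Code to reproduce the issue":

```python
import tensorflow_model_optimization as tfmot


class OutputOnlyQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
    """Quantizes only the layer output; the wrapped op has no weights."""

    def get_weights_and_quantizers(self, layer):
        return []

    def get_activations_and_quantizers(self, layer):
        return []

    def set_quantize_weights(self, layer, quantize_weights):
        pass

    def set_quantize_activations(self, layer, quantize_activations):
        pass

    def get_output_quantizers(self, layer):
        # 8-bit moving-average quantizer on the layer output.
        return [tfmot.quantization.keras.quantizers.MovingAverageQuantizer(
            num_bits=8, per_axis=False, symmetric=False, narrow_range=False)]

    def get_config(self):
        return {}


# Usage sketch (op_layer is a hypothetical layer instance):
# annotated = tfmot.quantization.keras.quantize_annotate_layer(
#     op_layer, OutputOnlyQuantizeConfig())
```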
System information
TensorFlow installed from (source or binary): binary
TensorFlow version: 2.2.0 (release)
TensorFlow Model Optimization version: 0.3.0 (release)
Python version: 3.6.0
Code to reproduce the issue
Gist to reproduce the full test: https://gist.github.com/kalaluthien/b270c71afb6866ae61ef0dc088a762f2
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 3
- Comments: 15 (8 by maintainers)
Also, regarding HardSwish: if you have the time and are interested, I'm happy to guide you through implementing support for it 😃
Regarding MobileNetV2 reproduction, looking at your code it seems you are training on CIFAR. It won't be as straightforward to reproduce the full training.
We trained a Keras MobileNet V2 model with hyperparams from this. We then quantized the model and trained again for a few epochs.
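A rough sketch of that flow (the model, optimizer, and data here are placeholders, not the hyperparameters actually used):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder float model; the real run used a Keras MobileNetV2 trained with
# the hyperparameters from the linked config.
base_model = tf.keras.applications.MobileNetV2(weights=None, input_shape=(224, 224, 3))

# Wrap the whole model for quantization-aware training.
q_model = tfmot.quantization.keras.quantize_model(base_model)

# Placeholder data; substitute the real training pipeline.
images = tf.random.uniform([8, 224, 224, 3])
labels = tf.random.uniform([8], maxval=1000, dtype=tf.int32)

# "Quantized the model and trained again for a few epochs."
q_model.compile(optimizer=tf.keras.optimizers.SGD(1e-4),
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])
q_model.fit(images, labels, batch_size=4, epochs=3)
```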
I think the reason your conversion code is failing is due to
Try removing it, and I think conversion should work. If it doesn't, please let me know. Basically, the QAT conversion by default uses float inputs/outputs based on the model signature. There is work in progress in TFLiteConverterV2 to support a different model interface (int8/uint8, etc.). See this.
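For illustration, a bare-bones QAT conversion continuing from the sketch above looks like this; the commented-out int8 interface lines are the in-progress part, so treat them as a sketch rather than a finalized API:

```python
import tensorflow as tf

# q_model is the quantization-aware-trained Keras model from the sketch above.
converter = tf.lite.TFLiteConverter.from_keras_model(q_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# By default the converted model keeps float32 inputs/outputs to match the
# Keras model signature; only the internals run in int8.
tflite_model = converter.convert()

# The in-progress interface work would let you request an int8/uint8 model
# interface instead, roughly along these lines:
# converter.inference_input_type = tf.int8
# converter.inference_output_type = tf.int8
```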
Hope this helps.
Hi @kalaluthien,
Thanks for the well thought out and detailed bug report. Sorry for the delay in getting back - there’s generally limited time I take out each week to look at github issues 😃
I tried out both of these examples and they converted and ran just fine for me. Perhaps there is a flag you are missing during conversion. Check this file for the conversion code.
Looking at your colab code, converter._experimental_new_quantizer = True is missing. Please try that and let me know how it goes.
This failure is expected. By default, our goal is to support built-in Keras layers, which are basically the layers under the tf.keras.layers module. TensorFlowOpLayer can be used to wrap any TF op, and it's not feasible to meaningfully cover every TF op. The recommended approach here is to use built-in Keras layers to achieve this. So you can use tf.keras.layers.Add and tf.keras.layers.Reshape instead of using + and expand_dims. That should solve it. If you really do want to use something else, it's the user's responsibility to provide an appropriate QuantizeConfig for your use.
This is again the same problem as TensorFlowOpLayer. And yes, you are right that the existing pattern only matches Conv+BN+ReLU. The code likely became slow since it had added a bunch of Quant/Dequant ops in between. I don't think the converter is likely to match Conv/BN/(Add+Mul ops matching hardswish) either while folding.
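Concretely, for the residual-connection example from the report, the rewrite with the built-in Add layer would look something like this (the input shape is an assumption; the channel count just has to match the Conv2D filters so the Add is valid):

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(32, 32, 16))  # assumed shape
shortcut = inputs
x = layers.Conv2D(16, 1, padding='same', use_bias=False)(inputs)
x = layers.BatchNormalization()(x)
x = layers.ReLU(6.0)(x)
x = layers.Add()([x, shortcut])  # built-in Add layer instead of '+'
model = tf.keras.Model(inputs, x)
```

The same idea applies to expand_dims: a tf.keras.layers.Reshape with the target shape takes its place.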
The proper fix here would be to add support for HardSwish. Can you please file a separate bug requesting HardSwish support? I'll take some time out to add it. We covered MobileNet v1/v2, so this is currently missing.
But we should be able to add support for this. We also need to ensure the converter is handling it properly.
Again, this works for me. That's how we got MobileNetV2 working and created the results. Perhaps this is the same issue as Average/MaxPooling. Please try the fix out and let me know if that works.
Seems like this bug is solved. I'm closing it; please feel free to reopen.
You can start a new issue for HardSwish and we can continue our conversation there. Even if it's done in your code, it can remain an example for others to follow.
And it’ll be pretty easy to incorporate into the library once you’ve implemented it. We can try and get HardSwish moved into Keras.
Thanks @kalaluthien for your patience and proactive use of the library.
As for HardSwish, I just looked into it a bit. There seem to be a few tricky pieces.
For starters, hard_swish has not been added as an activation in Keras yet. The goal of the tfmot library is to provide default behavior for all built-in Keras layers/activations. But since hard_swish is not a built-in activation yet, we can't really add a pattern matching it in the library code. It would need to be handled by the user. I would recommend adding support for it in your code to begin with. Once hard_swish gets added, we can move this code internally. You should be able to file a bug on keras/tf to check whether they plan to add support for it.
You can create a class HardSwish(Layer) which gets added after Conv + BN. You should be able to use the built-in Add and Multiply to do so.
Next, to understand exactly what support needs to be added, we would need to understand how it executes in TFLite.
I created a simple model.
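The original snippet is not preserved in this copy of the thread; a minimal model with the same Conv + BN + Add/ReLU6/Mul tail would look roughly like this (shapes and filter counts are assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(32, 32, 3))
x = layers.Conv2D(8, 3, padding='same', use_bias=False)(inputs)
x = layers.BatchNormalization()(x)
# Hard-swish-style tail: Add(+3), ReLU6, then a Mul by 1/6.
x = layers.ReLU(6.0)(x + 3.0) * (1.0 / 6.0)
model = tf.keras.Model(inputs, x)
```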
Converted it using the following code.
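The conversion snippet is likewise not preserved; post-training integer quantization along these lines reproduces the fusion behavior described next (the calibration data is a placeholder):

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.experimental_new_converter = True  # MLIR-based converter

def representative_dataset():
    for _ in range(100):
        yield [tf.random.uniform([1, 32, 32, 3])]  # placeholder calibration data

converter.representative_dataset = representative_dataset
tflite_model = converter.convert()

# Inspecting the converted graph (e.g. in Netron) shows the Add folded into
# the conv bias, while the Mul remains a separate op after the activation.
```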
As you can see, the converter fuses the Add into the bias, but the Mul comes after.
So you'll need to place the FakeQuant after the Add + ReLU but before the Mul, and likely use a transform similar to this. That should sort the issue out.
Oh, I'm sorry, I made a mistake. I meant use converter.experimental_new_converter = True. That's what was missing.