model-optimization: Quantization: QuantizeModelsTest.testModelEndToEnd() function does not check the correctness of the quantization process

Describe the bug

Hi, I’m trying to understand the model optimization API and create a quantized version of the MobileNetV2 model for my side project (a small library for training custom detection models for mobile). However, I’m struggling to get any positive results with this library compared to the old approach of using converter.representative_dataset during model conversion.
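
For reference, this is roughly the old post-training quantization workflow I was using (a minimal sketch; base_model, calibration_images and the sample count are just illustrative placeholders, not from the test):

import tensorflow as tf

def representative_dataset():
    # Yield a few calibration samples; calibration_images is assumed to be a
    # float32 array of preprocessed inputs, e.g. shape (N, 224, 224, 3).
    for image in calibration_images[:100]:
        yield [image[None, ...].astype("float32")]

converter = tf.lite.TFLiteConverter.from_keras_model(base_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
tflite_model = converter.convert()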

I found that there are some end-to-end tests in your repository, in the tensorflow_model_optimization.python.core.quantization.keras.quantize_models_test.py file, but it seems these tests do not check whether the conversion actually makes sense. The conversion succeeds, yet the result is so inaccurate that it cannot be used in practice.

If you check the prediction outputs of the converted model, you will see it produces only zeros (I get similar behavior when training on real data for a much longer time). The picture below shows the min/max and std values of the output of the converted MobileNetV2 model from your end-to-end test:

[image: min/max/std statistics of the converted model outputs]

When I initialize this model with ImageNet weights, I still get different output values between the Keras quantized model and the converted TFLite model:

[image: output comparison between the Keras quantized model and the converted TFLite model]

What are your general suggestions for tracking down possible sources of an invalid conversion? For example, I’m aware that I should fine-tune my quantized model for a couple of epochs, but I’m not sure for how long, or whether this actually matters. Are BatchNorm layers supported (I saw that a Conv+BatchNorm+ReLU composition is implemented)? In general (now, or perhaps in the near future), should I expect similar accuracy from the quantized Keras model and the converted TFLite one?
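
For context, the quantization-aware training workflow I am following looks roughly like this (a minimal sketch based on the documented tfmot API; base_model, x_train, y_train and the number of fine-tuning epochs are assumptions on my side, which is exactly what I am asking about):

import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Wrap the float model with fake-quantization nodes.
quant_model = tfmot.quantization.keras.quantize_model(base_model)

# Fine-tune for a few epochs so the quantization ranges can settle.
quant_model.compile(optimizer='adam',
                    loss='sparse_categorical_crossentropy',
                    metrics=['accuracy'])
quant_model.fit(x_train, y_train, epochs=2, validation_split=0.1)

# Convert the quantization-aware model to TFLite.
converter = tf.lite.TFLiteConverter.from_keras_model(quant_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()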

Thanks, Krzysztof

System information

TensorFlow installed from (source or binary): binary

TensorFlow version: tf-nightly==2.2.0.dev20200316

TensorFlow Model Optimization version: tf-model-optimization-nightly==0.2.1.dev20200320 (compiled from source code, from master branch)

Python version: 3.7.0

Code to reproduce the issue

This is a slightly modified version of the _verify_tflite function from your tests, which I used to inspect the outputs of the converted models.

import numpy as np
import tensorflow as tf

def _verify_tflite(tflite_file, x_test, y_test, model):
    interpreter = tf.lite.Interpreter(model_path=tflite_file)
    interpreter.allocate_tensors()
    input_index = interpreter.get_input_details()[0]['index']
    output_index = interpreter.get_output_details()[0]['index']

    # Predictions from the (quantization-aware) Keras model.
    keras_predictions = model.predict(x_test)

    # Run the TFLite interpreter sample by sample; the input dtype must match
    # the model's input tensor (float32 here).
    tflite_predictions = []
    for x, _ in zip(x_test, y_test):
        x = x.reshape((1,) + x.shape).astype(np.float32)
        interpreter.set_tensor(input_index, x)
        interpreter.invoke()
        outputs = interpreter.get_tensor(output_index)
        tflite_predictions.append(outputs)

    return np.vstack(tflite_predictions), keras_predictions

tflite_predictions, keras_predictions = _verify_tflite(tflite_file, x_train, y_train, base_model)

print(tflite_predictions.min(), tflite_predictions.max(), tflite_predictions.std())
print(keras_predictions.min(), keras_predictions.max(), keras_predictions.std())
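
A simple correctness check on top of this, which the end-to-end test could perform, would be to compare the two sets of predictions numerically (a sketch; the tolerance is arbitrary and just illustrates the idea):

# Compare TFLite vs. Keras outputs; a large gap indicates a broken conversion.
abs_diff = np.abs(tflite_predictions - keras_predictions)
print('max abs diff:', abs_diff.max(), 'mean abs diff:', abs_diff.mean())

# The tolerance here is arbitrary; a well-behaved 8-bit quantized model should
# still agree with the float Keras model to within a few percent.
assert np.allclose(tflite_predictions, keras_predictions, atol=1e-1), \
    'TFLite outputs diverge from the Keras quantized model'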

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 19 (8 by maintainers)

Most upvoted comments

This is a duplicate of this bug. Closing this, and following up there.