TinyNeuralNetwork: Converting LiteHRNet PyTorch model to TFLite, outputs don't match

Hi, this is really great work, thanks!

I am able to convert the LiteHRNet model to TFLite without running into any issues. However, the outputs don’t match up.

Here is the output from passing an all-ones input through the network. The output has shape [1,17,96,72]; below is output[0,0,0] from both PyTorch and TFLite:

pytorch array([6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 1.8188522e-04, 1.7515068e-04, 1.9644469e-04, 1.6027213e-04, 1.9049855e-04, 1.5419864e-04, 1.2460010e-04, 9.0751186e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05], dtype=float32)
tflite array([6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 1.1580180e-04, 2.3429818e-04, 3.9018277e-04, 7.7823577e-03, 1.8948119e-02, 2.8559987e-02, 3.3612434e-02, 2.5932681e-02, 1.2074142e-02, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05, 6.4367385e-05], dtype=float32)

When I convert to TFLite via the ONNX route, the outputs do match. So my guess is that some of the transposes/reshapes for NHWC are not happening correctly, but I am not sure. Looking for some insight into the best way to debug this.
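One quick sanity check (a minimal sketch, assuming the attached litehrnet_tiny.tflite file) is to confirm that the converted model actually expects NHWC input and to feed it the transposed version of the same all-ones tensor:

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='litehrnet_tiny.tflite')
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
print(inp['shape'])  # expect NHWC, i.e. [1, 384, 288, 3]

# Build the same all-ones input in both layouts
x_nchw = np.ones((1, 3, 384, 288), dtype=np.float32)
x_nhwc = x_nchw.transpose(0, 2, 3, 1)  # NCHW -> NHWC for TFLite

interpreter.set_tensor(inp['index'], x_nhwc)
interpreter.invoke()
out = interpreter.get_tensor(interpreter.get_output_details()[0]['index'])
print(out.shape)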

Models:

  • LiteHRNet pt trace
  • LiteHRNet tiny tflite

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 15

Most upvoted comments

@peterjc123 very cool. I would like to understand how you are generating it. Which tracer are you referring to for generating the model description script?
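(For reference, if this is the tracer bundled with TinyNeuralNetwork, a minimal sketch of generating a model description script would look roughly like the following; the module paths and generate_code arguments are based on the project's examples and should be double-checked.)

import torch
from tinynn.graph.tracer import model_tracer, trace

with model_tracer():
    model = LiteHRNet()  # hypothetical: however you construct the module
    model.eval()
    dummy_input = torch.ones(1, 3, 384, 288)
    graph = trace(model, dummy_input)
    # Emits a standalone Python model description plus the weights
    graph.generate_code('litehrnet_gen.py', 'litehrnet_gen.pth', 'LiteHRNetGen')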

I've implemented the layer-wise comparison but have to organize it a bit. Hopefully this will all be uploaded soon.

@peterjc123 I calculated the difference very similarly to how you did it. But I was using a custom model (not the model I uploaded), which is why there is some difference in the numbers.

Again, this is really great work. ONNX doesn't support a range of layers and has other issues. It's great to see a library like this that converts between the two directly. Congrats!

If you can share how you did the layer-wise comparison I would be very interested in taking a look.

I've collected the layer-wise differences and you may see them here. It looks like the major difference comes from small errors accumulated across the batch normalization layers.
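One way to collect the TFLite side of such a layer-wise comparison (a minimal sketch; experimental_preserve_all_tensors needs a reasonably recent TensorFlow) is to keep all intermediate tensors alive and read them back after invoke():

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(
    model_path='litehrnet_tiny.tflite',
    experimental_preserve_all_tensors=True,
)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp['index'], np.ones(inp['shape'], dtype=np.float32))
interpreter.invoke()

# Dump every intermediate tensor; these can then be matched against
# activations captured with forward hooks on the PyTorch modules.
for detail in interpreter.get_tensor_details():
    try:
        t = interpreter.get_tensor(detail['index'])
        print(detail['index'], detail['name'], t.shape)
    except ValueError:
        pass  # some tensors cannot be read back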

I used the following script to compare the differences.

import json

import numpy as np
import tensorflow as tf
import torch


def get_tensor_details(config):
    input_shapes = [inp['shape'] for inp in config['inputs']]
    input_transpose = [inp['transpose'] for inp in config['inputs']]
    input_types = [inp['type'] for inp in config['inputs']]

    return input_shapes, input_transpose, input_types


def get_tflite_out(model_path, inputs):
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()

    # Get input and output tensors.
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    for i in range(len(inputs)):
        interpreter.set_tensor(input_details[i]['index'], inputs[i])

    interpreter.invoke()

    outputs = []
    for i in range(len(output_details)):
        output_data = interpreter.get_tensor(output_details[i]['index'])
        outputs.append(output_data)

    return outputs


def get_pytorch_out(model_path, inputs):
    model = torch.jit.load(model_path)
    model.eval()

    raw_outputs = []
    with torch.no_grad():
        output = model(*inputs)
        if isinstance(output, (list, tuple)):
            # Flatten one level of nesting in the outputs
            for t in output:
                if isinstance(t, (list, tuple)):
                    raw_outputs.extend(t)
                else:
                    raw_outputs.append(t)
        else:
            raw_outputs.append(output)

    outputs = []
    for output in raw_outputs:
        outputs.append(output.numpy())

    return outputs


def prepare_inputs(input_shapes, input_transpose, input_types):
    inputs = []
    for i in range(len(input_shapes)):
        input_shape = input_shapes[i]
        transpose = input_transpose[i]
        input_data = np.ones(input_shape, dtype=input_types[i])
        if transpose:
            # NCHW -> NHWC, the layout the TFLite model expects
            input_data = input_data.transpose((0, 2, 3, 1))
        inputs.append(input_data)

    return inputs


def data_to_pytorch(inputs, input_transpose):
    torch_inputs = list(map(torch.from_numpy, inputs))
    for i in range(len(torch_inputs)):
        if input_transpose[i]:
            # NHWC -> NCHW back for the PyTorch model
            torch_inputs[i] = torch_inputs[i].permute(0, 3, 1, 2)
    return torch_inputs


def print_diff(outputs, ref_outputs):
    for i in range(len(ref_outputs)):
        ref_out = ref_outputs[i]
        output_data = outputs[i]
        
        assert ref_out.shape == output_data.shape, (f'TFLite and PyTorch output tensor {i} size mismatch: '
                                                    f'{ref_out.shape} vs {output_data.shape}')

        print(f'Output {i} shape: {output_data.shape}')
        print(f'Output {i} matches:', np.allclose(ref_out, output_data, rtol=0.001, atol=1e-05))

        diff = np.abs(ref_out - output_data)
        diff_mean = np.mean(diff)
        diff_min = np.min(diff)
        diff_max = np.max(diff)
        print(f'Output {i} absolute difference min,mean,max: {diff_min},{diff_mean},{diff_max}')


def compare_tflite_pytorch_results(json_file):
    with open(json_file, 'r') as f:
        config = json.load(f)

    tflite_model_path = config['dst_model']

    input_shapes, input_transpose, input_types = get_tensor_details(config)
    inputs = prepare_inputs(input_shapes, input_transpose, input_types)
    outputs = get_tflite_out(tflite_model_path, inputs)

    for i, (shape, transpose) in enumerate(zip(input_shapes, input_transpose)):
        print(f'Input {i} shape: {shape}')
        print(f'Input {i} transpose: {transpose}')

    torch_model_path = config['src_model']
    torch_inputs = data_to_pytorch(inputs, input_transpose)
    ref_outputs = get_pytorch_out(torch_model_path, torch_inputs)

    assert len(outputs) == len(
        ref_outputs), (f'TFLite and PyTorch output tensor count mismatch: '
                       f'{len(outputs)} vs {len(ref_outputs)}')

    print_diff(outputs, ref_outputs)


if __name__ == '__main__':
    json_file = 'litehrnet.json'
    compare_tflite_pytorch_results(json_file)

The script reads a configuration file in JSON format. I used that so that I could use convert_from_json.py to generate the model (see the usage note after the config below).

{
    "src_model": "litehrnet_trace.pt",
    "dst_model": "litehrnet_tiny.tflite",
    "inputs": [
        {
            "shape": [
                1,
                3,
                384,
                288
            ],
            "type": "float32",
            "transpose": true
        }
    ],
    "outputs": [
        {
            "type": "float32",
            "transpose": false
        }
    ]
}
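Assuming convert_from_json.py takes the config path as its argument (an assumption on my part; I have not re-checked its exact command line), the conversion step would be something like python convert_from_json.py litehrnet.json, and the comparison script above then reuses the same JSON file.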

Running the script yields

Input 0 shape: [1, 3, 384, 288]
Input 0 transpose: True
Output 0 shape: (1, 17, 96, 72)
Output 0 matches: True
Output 0 absolute difference min,mean,max: 0.0,4.584063617585343e-08,2.5890767574310303e-06