openvino: [Bug] Decoding Semantic Segmentation output blob

System information (version)
  • OpenVINO => 2021.3.394
  • Operating System / Platform => Ubuntu 18.04 64Bit
  • Compiler => g++ 7.5.0
  • Problem classification: Inference, model optimization
  • Framework: Pytorch -> ONNX -> IR
  • Model name: BiseNet
Detailed description

I have been working on a semantic segmentation model for a custom industrial application. The results from the OpenVINO framework are very different from the results I get from PyTorch. Please refer to the images: the first is the segmentation result from PyTorch and the second is the segmentation result from the OpenVINO C++ API:

[images: 1_trt — PyTorch segmentation result; open_1 — OpenVINO segmentation result]

The Model Optimizer was invoked as shown below:

 python3 /opt/intel/openvino_2021/deployment_tools/model_optimizer/mo.py -m ~/openvino_models/onnx/bisenet_v1.onnx\
 -o ~/openvino_models/ir --input_shape [1,3,480,640] --data_type FP16 --output preds\
 --input input_image --mean_values [123.675,116.28,103.52] --scale_values [58.395,57.12,57.375]

I normalize the image during training and hence used the same parameters (mean and variance/scale) here. I also did a quick analysis of the ONNX model and the IR model produced by OpenVINO, and found one striking difference that is a bit weird: the ONNX model has only one output, as in the original model, while the IR has two outputs.
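To double-check that, the outputs of the IR can be listed with the Inference Engine API; a minimal sketch (the path to the .xml is a placeholder):

    #include <iostream>
    #include "inference_engine.hpp"

    int main() {
        /// Minimal sketch: enumerate the IR outputs to confirm how many MO generated.
        /// "bisenet_v1.xml" is a placeholder path for the converted IR.
        InferenceEngine::Core core;
        InferenceEngine::CNNNetwork network = core.ReadNetwork("bisenet_v1.xml");
        for (const auto &out : network.getOutputsInfo())
            std::cout << "output: " << out.first << std::endl;
        return 0;
    }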

My Inference Engine integration code:

    InferenceEngine::Core core;
    InferenceEngine::CNNNetwork network;
    InferenceEngine::ExecutableNetwork executable_network;
    network = core.ReadNetwork(input_model); /// input_model is a string path to the .xml file

Input Settings: Since my network works on the RGB colour format, I perform a conversion from BGR to RGB:

    InferenceEngine::InputInfo::Ptr input_info = network.getInputsInfo().begin()->second;
    std::string input_name = network.getInputsInfo().begin()->first;

    input_info->setPrecision(InferenceEngine::Precision::U8);
    input_info->setLayout(InferenceEngine::Layout::NCHW);
    input_info->getPreProcess().setColorFormat(InferenceEngine::ColorFormat::RGB);

Output Settings: I use rbegin() instead of begin() to access the second output of the network, as that is the desired output; the first output is just created during the model optimization step, which I don't understand 😦. The model has a 64-bit integer output, but I set it to 32-bit int. The output layout is CHW with C always one, and each value in the H x W plane is the class index of the corresponding pixel.

    InferenceEngine::DataPtr output_info = network.getOutputsInfo().rbegin()->second; 
    std::string output_name = network.getOutputsInfo().rbegin()->first; 
    output_info->setPrecision(InferenceEngine::Precision::I32); ///The model has a 64 bit integer output but I set it to 32-bit int 
    output_info->setLayout(InferenceEngine::Layout::CHW); 

Creation of the infer request and input blob, and inference:

    executable_network = core.LoadNetwork(network, device_name); /// load the network onto the target device
    InferenceEngine::InferRequest infer_request = executable_network.CreateInferRequest();
    cv::Mat image = cv::imread(input_image_path);

    InferenceEngine::TensorDesc tDesc(
            InferenceEngine::Precision::U8, input_info->getTensorDesc().getDims(), input_info->getTensorDesc().getLayout()
        );

    /// wrap the image data in a blob without copying
    InferenceEngine::Blob::Ptr imgBlob = InferenceEngine::make_shared_blob<unsigned char>(tDesc, image.data);
    infer_request.SetBlob(input_name, imgBlob);
    infer_request.Infer();

Random Color palette for visualization of the result

std::vector<std::vector<uint8_t>> get_color_map()
{
    std::vector<std::vector<uint8_t>> color_map(256, std::vector<uint8_t>(3));
    std::minstd_rand rand_engg(123);
    /// uniform_int_distribution is not defined for uint8_t, so draw ints and cast
    std::uniform_int_distribution<int> u(0, 255);
    for (int i{0}; i < 256; ++i) {
        for (int j{0}; j < 3; j++) {
            color_map[i][j] = static_cast<uint8_t>(u(rand_engg));
        }
    }
    return color_map;
}

Decoding of the output blob

    InferenceEngine::Blob::Ptr output = infer_request.GetBlob(output_name);
    auto const memLocker = output->cbuffer();
    const auto *res = memLocker.as<const int *>();
    auto oH = output_info->getTensorDesc().getDims()[1]; /// output dims are {C=1, H, W}
    auto oW = output_info->getTensorDesc().getDims()[2];
    cv::Mat pred(cv::Size(oW, oH), CV_8UC3);
    std::vector<std::vector<uint8_t>> color_map = get_color_map();
    int idx{0};
    for (int i = 0; i < oH; ++i)
    {
        auto *ptr = pred.ptr<uint8_t>(i);
        for (int j = 0; j < oW; ++j)
        {
            /// res[idx] is the class index of pixel (i, j)
            ptr[0] = color_map[res[idx]][0];
            ptr[1] = color_map[res[idx]][1];
            ptr[2] = color_map[res[idx]][2];
            ptr += 3;
            ++idx;
        }
    }
    cv::imwrite(save_pth, pred);
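As an aside, a hedged sketch (not the path I take above): if setPrecision(InferenceEngine::Precision::I32) were not called on the output, the same blob could presumably be read as the network's native 64-bit integers:

    /// Hedged alternative to the I32 conversion above: read the output buffer as int64_t.
    InferenceEngine::Blob::Ptr output_i64 = infer_request.GetBlob(output_name);
    auto const memLocker64 = output_i64->cbuffer();
    const auto *res64 = memLocker64.as<const int64_t *>(); /// per-pixel class indices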



Could you please tell me if I am doing something wrong? Please feel free to ask for more details.

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 15 (4 by maintainers)

Most upvoted comments

@vladimir-dudnik I have tested it in my application and it works fine. Thank you for your support. I am closing the issue. I'm also posting a very simple segmentation example in case anyone needs one.

#include <iostream>
#include "inference_engine.hpp"
#include "opencv2/opencv.hpp"
#include <random>
#include "ie_blob.h"
#include "ocv_common.h"

/// Random color palette ///
std::vector<std::vector<uint8_t>> get_color_map()
{
    std::vector<std::vector<uint8_t>> color_map(256, std::vector<uint8_t>(3));
    std::minstd_rand rand_engg(123);
    /// uniform_int_distribution is not defined for uint8_t, so draw ints and cast
    std::uniform_int_distribution<int> u(0, 255);
    for (int i{0}; i < 256; ++i) {
        for (int j{0}; j < 3; j++) {
            color_map[i][j] = static_cast<uint8_t>(u(rand_engg));
        }
    }
    return color_map;
}

int main(int argc, char* argv[] ) {
    try {
        if (argc != 5){
            std::cout << "Usage : " << argv[0] << " <path_to_model> <path_to_image> <device_name> <path_to_store_result>" << std::endl;
            return EXIT_FAILURE;
        }

        const std::string input_model {argv[1]};
        const std::string input_image_path {argv[2]};
        const std::string device_name = {argv[3]};
        const std::string save_pth = {argv[4]};


    /// Inference Engine setup ///
    InferenceEngine::Core core;
    InferenceEngine::CNNNetwork network;
    InferenceEngine::ExecutableNetwork executable_network;
    network = core.ReadNetwork(input_model);

    if (network.getOutputsInfo().size() != 1)
        throw std::logic_error("This sample supports only networks with a single output");

    if (network.getInputsInfo().size() != 1)
        throw std::logic_error("This sample supports only networks with a single input");

    InferenceEngine::InputInfo::Ptr input_info = network.getInputsInfo().begin()->second;
    std::string input_name = network.getInputsInfo().begin()->first;

    input_info->setPrecision(InferenceEngine::Precision::U8);
    input_info->setLayout(InferenceEngine::Layout::NCHW);
    input_info->getPreProcess().setColorFormat(InferenceEngine::ColorFormat::RGB);

    if (network.getOutputsInfo().empty()){
        std::cerr << "Network outputs info is empty" << std::endl;
        return EXIT_FAILURE;
    }

    InferenceEngine::DataPtr output_info = network.getOutputsInfo().begin()->second;
    std::string output_name = network.getOutputsInfo().begin()->first;
    output_info->setPrecision(InferenceEngine::Precision::I32);
    output_info->setLayout(InferenceEngine::Layout::CHW);


    ///Load Network and Synchronous Infer Request Creation///
    executable_network = core.LoadNetwork(network, device_name);
    InferenceEngine::InferRequest infer_request = executable_network.CreateInferRequest();

    ///Read image and convert to Blob ///
    cv::Mat image = cv::imread(input_image_path);
    InferenceEngine::Blob::Ptr imgBlob = wrapMat2Blob(image);
    imgBlob->allocate();

    ///Inference ///
    infer_request.SetBlob(input_name, imgBlob);
    infer_request.Infer();
    imgBlob->deallocate();
    InferenceEngine::Blob::Ptr output = infer_request.GetBlob(output_name);

    ///Decoding and applying color map///
    auto const memLocker = output->cbuffer();
    const auto *res = memLocker.as<const int *>();
    auto oH = output_info->getTensorDesc().getDims()[1];
    auto oW = output_info->getTensorDesc().getDims()[2];

    cv::Mat pred(cv::Size(oW, oH), CV_8UC3);
    std::vector<std::vector<uint8_t>> color_map = get_color_map();

    int idx{0};
    for (int i = 0; i < oH; ++i)
    {
        auto *ptr = pred.ptr<uint8_t>(i);

        for (int j = 0; j < oW; ++j)
        {
            ptr[0] = color_map[res[idx]][0];
            ptr[1] = color_map[res[idx]][1];
            ptr[2] = color_map[res[idx]][2];
            ptr += 3;
            ++idx;

        }

    }

    /// Saving the output ///
    cv::imwrite(save_pth, pred);
    output->deallocate();

    } catch (const std::exception& ex){
        std::cerr << ex.what() << std::endl;
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

@vladimir-dudnik I think this looks promising: 2 represents the 'person' class (but with specific uniforms), 1 represents our object of interest, which is not very common, and 0 is the background class. Technically it's a 3-class problem; sorry that I mentioned it as a 2-class problem earlier. Thanks once again.

@Eashwar93 sure, that is the output of the OMZ Python segmentation demo for some random "industrial" picture (we do understand that this is not a target scene, but at least the model detected class 0 and class 2 on it).

@Eashwar93 thanks, there was an issue with MO in OpenVINO 2021.3 which caused it to generate an IR with two outputs. This was fixed in the latest OpenVINO 2021.4 release, which you can try. On our side, we converted your ONNX model to IR with the 2021.4 release and the IR contains a single output, which allows running inference with the OMZ demo (although I do not know what object classes it was trained for; for a generic street scene it does not seem to detect anything meaningful). By the way, as you probably know, the OpenVINO Inference Engine also supports ONNX directly, and, starting from the 2021.4 release, the OMZ segmentation demo is able to accept an ONNX file instead of XML, although this support in the demo should be improved. We provided such support in object_detection_demo, so when you use ONNX instead of XML it also allows you to specify the mean/scale values required for data preprocessing. This step is not implemented yet in the segmentation demo, but it should be easy to add.
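For reference, a minimal sketch of loading the ONNX file directly with the Inference Engine C++ API (the model path and device are placeholders; note that the --mean_values/--scale_values normally baked into the IR by MO are not applied here and would have to be done on the input manually):

#include "inference_engine.hpp"

int main() {
    /// Hedged sketch: read the ONNX model directly, without an IR conversion step.
    /// Mean/scale preprocessing is NOT applied automatically in this case.
    InferenceEngine::Core core;
    InferenceEngine::CNNNetwork network = core.ReadNetwork("bisenet_v1.onnx");
    InferenceEngine::ExecutableNetwork exec = core.LoadNetwork(network, "CPU");
    InferenceEngine::InferRequest request = exec.CreateInferRequest();
    return 0;
}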