TensorRT: PyTorch -> ONNX -> TensorRT produces incorrect results

I have a trained PyTorch model with a UNet-type architecture. I can convert it to ONNX successfully, and the inference results, although slightly different from PyTorch's, are very similar. However, when I convert from ONNX to TensorRT, the results differ much more and produce an incorrect segmentation.

PyTorch and ONNX were both compiled from source to ensure they use the same CUDA and cuDNN libraries that are packaged with TensorRT. The model was, however, trained a few months ago on another machine with a pip-installed version of PyTorch.

Environment

  • TensorRT Version: TensorRT-7.2.0.14
  • GPU Type: 2070 Max-Q
  • Nvidia Driver Version: 450.51.06
  • CUDA Version: 11.0 (update 1)
  • CUDNN Version: 8.0.2
  • Operating System + Version: Ubuntu 18.04
  • Python Version (if applicable): 3.7.7
  • TensorFlow Version (if applicable):
  • PyTorch Version (if applicable): 1.7.0
  • Baremetal or Container (which commit + image + tag):

Steps To Reproduce

I can upload a notebook in a bit, but essentially, the following model is loaded:

[image: model]

Inference results with PyTorch (correct results): [image: pytorch]

The model is converted to ONNX with:

import torch

torch.onnx.export(model,
                  X.cuda(),          # example input tensor
                  'model.onnx',
                  export_params=True,
                  opset_version=11,  # needed for the upsample operation
                  verbose=False)
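
As a quick sanity check on the export itself (a suggestion, not from the original post), the ONNX checker can validate the saved graph before comparing any outputs:

import onnx

onnx_model = onnx.load('model.onnx')
onnx.checker.check_model(onnx_model)  # raises if the graph is malformed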

Inference results:

[image: onnx]

Note that the top-left pixel value is slightly different from PyTorch's: 5.532728672027588 -> 5.532743453979492.

But overall the segmentation mask is about the same.
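
To put a number on the PyTorch-vs-ONNX gap, a minimal comparison with onnxruntime might look like the sketch below (assuming model and X are still in scope from the export step):

import numpy as np
import onnxruntime as ort
import torch

# PyTorch reference output
with torch.no_grad():
    torch_out = model(X.cuda()).cpu().numpy()

# onnxruntime output for the same input
sess = ort.InferenceSession('model.onnx')
input_name = sess.get_inputs()[0].name
ort_out = sess.run(None, {input_name: X.cpu().numpy()})[0]

# FP32 round-off between frameworks should stay tiny
print(np.max(np.abs(torch_out - ort_out)))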

Now, for TensorRT, the model is converted, serialized, then deserialized:

import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
model_path = 'model.onnx'

EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(EXPLICIT_BATCH)
parser = trt.OnnxParser(network, TRT_LOGGER)
parser.parse_from_file(model_path)
config = builder.create_builder_config()
config.max_workspace_size = 1 << 20
engine = builder.build_engine(network, config)

engine_path = 'model.engine'
with open(engine_path, 'wb') as f:
    f.write(bytearray(engine.serialize()))

runtime = trt.Runtime(TRT_LOGGER)
with open(engine_path, 'rb') as f:
    engine = runtime.deserialize_cuda_engine(f.read())

h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=np.float32)
h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=np.float32)
d_input = cuda.mem_alloc(h_input.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)

stream = cuda.Stream()
context = engine.create_execution_context()
h_input[:] = np.fromfile('X.raw', dtype=np.float32) # Copy preprocessed data to pagelocked memory

cuda.memcpy_htod_async(d_input, h_input, stream)    # Transfer input data to the GPU.
context.execute_async(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle) # Run inference.
cuda.memcpy_dtoh_async(h_output, d_output, stream)  # Transfer predictions back from the GPU.
stream.synchronize()                                # Synchronize the stream

h_output = h_output.reshape(engine.get_binding_shape(1))
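
One thing the snippet above does not do (added here as a suggestion, not from the original post) is check the parser's return value; parse_from_file returns a bool, so failures can otherwise go unnoticed:

if not parser.parse_from_file(model_path):
    for i in range(parser.num_errors):
        print(parser.get_error(i))
    raise RuntimeError('ONNX parse failed')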

I checked the terminal output and didn't see any error messages. The inference results I got were:

[image: tensorrt]

The inference mask is now wrong (an added green blob), and the top-left value in the mask output is:

pytorch  = 5.532728672027588
onnx     = 5.532743453979492
tensorrt = 5.454630374908447

So that value is now noticeably different, which is concerning because everything should still be running at FP32 precision.
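
One thing worth confirming (again a suggestion on top of the original post) is that the builder really is running at FP32; reduced precision is opt-in in TensorRT 7, and the builder config exposes the flag directly:

# should print False unless FP16 was explicitly enabled on the config
print(config.get_flag(trt.BuilderFlag.FP16))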

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 17

Most upvoted comments

Hello @Ekta246 ,

Do you anticipate that 7.2 would do any better at not getting the zeros in the arrays?

We keep improving the ONNX parser in TRT, but sorry, I cannot tell whether 7.2 will fix the zeros you see, because the issue has not been root-caused yet.

I cannot understand how to use --trt-outputs and --onnx-outputs. Are there any example reference samples? I do not understand which files in the

Polygraphy is a tool for comparing TRT results with those of other frameworks. After installation you also get a command-line tool. E.g., to compare outputs between onnxruntime and TRT, you can run:

polygraphy run your.onnx --trt --onnxrt 

To mark all nodes in the ONNX graph as outputs and compare between onnxruntime and TRT, you can run:

polygraphy run your.onnx --trt --onnxrt --onnx-outputs mark all --trt-outputs mark all

And if you find that marking every layer as an output is too slow to run, you can manually edit your ONNX file to expose only the layers you care about, e.g. with onnx-graphsurgeon or another tool of your choice; a sketch follows below.
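
For illustration (a sketch, not from the original comment; the tensor name is hypothetical), marking a single intermediate tensor as a graph output with onnx-graphsurgeon looks roughly like:

import numpy as np
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load('your.onnx'))
tensors = graph.tensors()

# 'conv1_out' is a placeholder; substitute the intermediate tensor you want to inspect
graph.outputs.append(tensors['conv1_out'].to_variable(dtype=np.float32))

onnx.save(gs.export_onnx(graph), 'your_marked.onnx')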