TensorRT: [REFERENCE] KeyError: 'mrcnn_mask_bn4/batchnorm/mul_1' in running sampleUffMaskRCNN demo

I get the following error when I try to run the Mask R-CNN demo following this page.

Environment: Ubuntu 16.04.6, CUDA 10.1.168, TensorRT 5.1.5.0, uff 0.6.3

Traceback (most recent call last):
  File "mrcnn_to_trt_single.py", line 165, in <module>
    main()
  File "mrcnn_to_trt_single.py", line 123, in main
    text=True, list_nodes=list_nodes)
  File "mrcnn_to_trt_single.py", line 158, in convert_model
    debug_mode = False
  File "/usr/lib/python3.5/dist-packages/uff/converters/tensorflow/conversion_helpers.py", line 233, in from_tensorflow_frozen_model
    return from_tensorflow(graphdef, output_nodes, preprocessor, **kwargs)
  File "/usr/lib/python3.5/dist-packages/uff/converters/tensorflow/conversion_helpers.py", line 108, in from_tensorflow
    pre.preprocess(dynamic_graph)
  File "./config.py", line 123, in preprocess
    connect(dynamic_graph, timedistributed_connect_pairs)
  File "./config.py", line 113, in connect
    if node_a_name not in dynamic_graph.node_map[node_b_name].input:
KeyError: 'mrcnn_mask_bn4/batchnorm/mul_1'
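
The traceback shows connect() in config.py indexing dynamic_graph.node_map with a node name that the converted graph does not contain. A minimal sketch of how one might list every such name up front, instead of stopping at the first KeyError, is below. It assumes node_map behaves like a dict keyed by node name (which the KeyError suggests). The helper find_missing_nodes and the names node_a and node_b are hypothetical; only the mul_1 name is taken from the traceback.

```python
def find_missing_nodes(node_map, connect_pairs):
    """Return the node names used in connect_pairs that node_map lacks.

    Names are reported once each, in the order they are first seen.
    """
    missing = []
    for node_a_name, node_b_name in connect_pairs:
        for name in (node_a_name, node_b_name):
            if name not in node_map and name not in missing:
                missing.append(name)
    return missing


# Hypothetical stand-in for dynamic_graph.node_map: node name -> input names.
node_map = {
    "node_a": [],
    "node_b": ["node_a"],
}

# Hypothetical pairs; only the mul_1 name comes from the traceback above.
connect_pairs = [
    ("node_a", "node_b"),
    ("node_a", "mrcnn_mask_bn4/batchnorm/mul_1"),
]

missing = find_missing_nodes(node_map, connect_pairs)
print(missing)  # ['mrcnn_mask_bn4/batchnorm/mul_1']
```

A non-empty result would suggest that the graph was built with different layer names than config.py expects, for example because of a different Keras version.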

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 36

Most upvoted comments

Okay @doomb007, I was able to run the sample.

Possible solutions when using CUDA 10.1:

Solution 1

Use the nvcr.io/nvidia/tensorflow:19.10-py3 container. It ships TensorFlow 1.14 built for CUDA 10.1, unlike the current pip packages, which aren’t working as mentioned in #132.

nvidia-docker run -it -v ${PWD}:/mnt --workdir=/mnt nvcr.io/nvidia/tensorflow:19.10-py3

Then inside the container:

# Download OSS Components
git clone -b master https://github.com/nvidia/TensorRT TensorRT
cd TensorRT
git submodule update --init --recursive
export TRT_SOURCE=`pwd`

# Install required libraries
apt-get update && apt-get install -y --no-install-recommends libcurl4-openssl-dev wget zlib1g-dev git pkg-config

# Install CMake >= 3.13
pushd /tmp
wget https://github.com/Kitware/CMake/releases/download/v3.14.4/cmake-3.14.4-Linux-x86_64.sh
chmod +x cmake-3.14.4-Linux-x86_64.sh
./cmake-3.14.4-Linux-x86_64.sh --prefix=/usr/local --exclude-subdir --skip-license
rm ./cmake-3.14.4-Linux-x86_64.sh
popd

# Necessary in the Tensorflow container due to some PATH/cmake issues
export CMAKE_ROOT=/usr/share/cmake-3.14
source ~/.bashrc

# Check that installation worked
cmake --version

# Set relevant env variables relative to NGC container paths
export TRT_RELEASE=/usr/src/tensorrt
export TRT_LIB_DIR=$TRT_RELEASE/lib
export TRT_SOURCE=/mnt/TensorRT
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$TRT_LIB_DIR

# Generate Makefiles and build
cd $TRT_SOURCE
mkdir -p build && cd build 
cmake .. -DTRT_LIB_DIR=$TRT_RELEASE/lib -DTRT_BIN_DIR=`pwd`/out
make -j$(nproc)

# Installs OSS Components and Builds All Samples
make install

sampleUffMaskRCNN specific

# This container comes with tensorflow-gpu==1.14.0+nv built for CUDA 10.1, note the version has "+nv" at the end of it
# This currently won't work with Google's `pip install tensorflow-gpu==1.14` + CUDA 10.1
pip install -r $TRT_SOURCE/samples/opensource/sampleUffMaskRCNN/converted/requirements.txt

# Verify location of uff package to edit
find / -name uff 
# Edit the conv_transpose function as mentioned in README
vim /usr/lib/python3.6/dist-packages/uff/converters/tensorflow/converter_functions.py 

# Set config if not already set so we don't get an error from "git am ..." below
git config --global user.name "foo"
git config --global user.email "bar"

# Clone Mask_RCNN repo, and add it to python path for imports
cd $TRT_SOURCE
git clone https://github.com/matterport/Mask_RCNN.git
export PYTHONPATH=$PYTHONPATH:$PWD/Mask_RCNN

# Checkout specific Mask_RCNN version, and apply patch
cd Mask_RCNN
git checkout 3deaec5
git am $TRT_SOURCE/samples/opensource/sampleUffMaskRCNN/converted/0001-Update-the-Mask_RCNN-model-from-NHWC-to-NCHW.patch

# Setup data dir
DATA_DIR=$TRT_SOURCE/samples/opensource/sampleUffMaskRCNN/data
mkdir -p $DATA_DIR
wget -P $DATA_DIR https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5

# In TensorRT container, you can copy the data from $TRT_RELEASE/data/faster-rcnn instead
# In TensorFlow container, you'll need to mount data from the host like so.
# See the "Copying the *.ppm data to host" section below for how to get the data on the host
cp /mnt/001763.ppm $DATA_DIR/
cp /mnt/004545.ppm $DATA_DIR/

# Convert the downloaded model to UFF
cd $TRT_SOURCE/samples/opensource/sampleUffMaskRCNN/converted/
python mrcnn_to_trt_single.py -w $DATA_DIR/mask_rcnn_coco.h5 -o $DATA_DIR/mrcnn_nchw.uff -p config.py

# Run the sample
# NOTE: All samples were already built above when doing cmake + make install
cd $TRT_RELEASE/bin
./sample_uff_maskRCNN -d $DATA_DIR

Copying the *.ppm data to host

# Launch TensorRT container to copy data over to host
nvidia-docker run -it -v ${PWD}:/mnt --workdir=/mnt nvcr.io/nvidia/tensorrt:19.10-py3
cp /usr/src/tensorrt/data/faster-rcnn/001763.ppm /mnt
cp /usr/src/tensorrt/data/faster-rcnn/004545.ppm /mnt
# Exit container, data should now be in your current directory that was mounted
exit

# Launch TensorFlow container mounting the same directory
nvidia-docker run -it -v ${PWD}:/mnt --workdir=/mnt nvcr.io/nvidia/tensorflow:19.10-py3

# Data should now be in /mnt in TensorFlow container
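
Before running the sample, it can help to confirm that the data directory holds the files the steps above are meant to produce. The sketch below is only an illustration: the helper missing_sample_files is hypothetical, and the file list is an assumption drawn from the commands above (the converted mrcnn_nchw.uff plus the two .ppm images). The demo uses a temporary directory so it runs anywhere.

```python
import os
import tempfile

# Assumed from the steps above: the UFF model and the two test images.
EXPECTED_FILES = ("mrcnn_nchw.uff", "001763.ppm", "004545.ppm")


def missing_sample_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    return [
        name
        for name in expected
        if not os.path.isfile(os.path.join(data_dir, name))
    ]


# Demo on a throwaway directory that contains only one of the images.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "001763.ppm"), "wb").close()
    demo_missing = missing_sample_files(demo_dir)

print(demo_missing)  # ['mrcnn_nchw.uff', '004545.ppm']
```

In the container, the same function could be pointed at the directory stored in $DATA_DIR, for example via os.environ["DATA_DIR"] if that variable has been exported.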

Successful UFF parsing output

UFF Version 0.6.5
=== Automatically deduced input nodes ===
[name: "input_image"
op: "Placeholder"
attr {
  key: "dtype"
  value {
    type: DT_FLOAT
  }
}
attr {
  key: "shape"
  value {
    shape {
      dim {
        size: -1
      }
      dim {
        size: 3
      }
      dim {
        size: 1024
      }
      dim {
        size: 1024
      }
    }
  }
}
]
=========================================

Using output node mrcnn_detection
Using output node mrcnn_mask/Sigmoid
Converting to UFF graph
Warning: No conversion function registered for layer: PyramidROIAlign_TRT yet.
Converting roi_align_mask_trt as custom op: PyramidROIAlign_TRT
Warning: No conversion function registered for layer: ResizeNearest_TRT yet.
Converting fpn_p5upsampled as custom op: ResizeNearest_TRT
Warning: No conversion function registered for layer: ResizeNearest_TRT yet.
Converting fpn_p4upsampled as custom op: ResizeNearest_TRT
Warning: No conversion function registered for layer: ResizeNearest_TRT yet.
Converting fpn_p3upsampled as custom op: ResizeNearest_TRT
Warning: No conversion function registered for layer: SpecialSlice_TRT yet.
Converting mrcnn_detection_bboxes as custom op: SpecialSlice_TRT
Warning: No conversion function registered for layer: DetectionLayer_TRT yet.
Converting mrcnn_detection as custom op: DetectionLayer_TRT
Warning: No conversion function registered for layer: ProposalLayer_TRT yet.
Converting ROI as custom op: ProposalLayer_TRT
Warning: keepdims is ignored by the UFF Parser and defaults to True
Warning: keepdims is ignored by the UFF Parser and defaults to True
Warning: keepdims is ignored by the UFF Parser and defaults to True
Warning: keepdims is ignored by the UFF Parser and defaults to True
Warning: keepdims is ignored by the UFF Parser and defaults to True
Warning: keepdims is ignored by the UFF Parser and defaults to True
Warning: keepdims is ignored by the UFF Parser and defaults to True
Warning: keepdims is ignored by the UFF Parser and defaults to True
Warning: keepdims is ignored by the UFF Parser and defaults to True
Warning: keepdims is ignored by the UFF Parser and defaults to True
Warning: No conversion function registered for layer: PyramidROIAlign_TRT yet.
Converting roi_align_classifier as custom op: PyramidROIAlign_TRT
DEBUG [/usr/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py:96] Marking ['mrcnn_detection', 'mrcnn_mask/Sigmoid'] as outputs
No. nodes: 3044
UFF Output written to mrcnn_nchw.uff
UFF Text Output written to mrcnn_nchw.pbtxt

Successful sample run output

root@eb12143e7d72:/mnt/TensorRT/samples/opensource/sampleUffMaskRCNN/converted# sample_uff_maskRCNN -d $DATA_DIR
&&&& RUNNING TensorRT.sample_maskrcnn # sample_uff_maskRCNN -d /mnt/TensorRT/samples/opensource/sampleUffMaskRCNN/data/
[10/08/2019-18:59:48] [I] Building and running a GPU inference engine for Mask RCNN
[10/08/2019-18:59:58] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[10/08/2019-19:02:35] [I] [TRT] Detected 1 inputs and 2 output network tensors.
[10/08/2019-19:02:40] [I] Run for 10 times with Batch Size 1
[10/08/2019-19:02:40] [I] Average inference time is 441.708 ms/frame
[10/08/2019-19:02:40] [I] Detected dog in/mnt/TensorRT/samples/opensource/sampleUffMaskRCNN/data/001763.ppm with confidence 99.9171 and coordinates (259.165, 13.8516, 488.325, 370.222)
[10/08/2019-19:02:40] [I] Detected dog in/mnt/TensorRT/samples/opensource/sampleUffMaskRCNN/data/001763.ppm with confidence 99.8545 and coordinates (27.6855, 45.785, 317.039, 365.296)
[10/08/2019-19:02:40] [I] The results are stored in current directory: 0.ppm
&&&& PASSED TensorRT.sample_maskrcnn # sample_uff_maskRCNN -d /mnt/TensorRT/samples/opensource/sampleUffMaskRCNN/data/

Solution 2

Build TensorFlow from source for CUDA 10.1: https://github.com/tensorflow/tensorflow/issues/26150#issuecomment-506807444

(I didn’t test this)

Solution 3

If you are on CUDA 10.1, I think downgrading to CUDA 10.0 and then installing the pip package (pip install tensorflow-gpu==1.14) should work, at least until Google releases pip package binaries built for CUDA 10.1.

See note in README: https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/sampleUffMaskRCNN#known-issues

(I didn’t test this)

Maybe your Keras version is not 2.1.3. I installed the latest Keras 2.3.x and got the same error as you. After switching to Keras 2.1.3, everything works.
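
A quick way to check this before converting is sketched below. The helper names are hypothetical, and the required version 2.1.3 is simply the one reported to work in this comment, not an official requirement.

```python
import importlib

REQUIRED_KERAS = "2.1.3"  # version reported to work in this thread


def installed_keras_version():
    """Return the installed Keras version string, or None if it is absent."""
    try:
        keras = importlib.import_module("keras")
    except ImportError:
        return None
    return getattr(keras, "__version__", None)


def keras_version_status(installed, required=REQUIRED_KERAS):
    """Describe whether the installed Keras version matches the required one."""
    if installed is None:
        return "Keras is not installed"
    if installed == required:
        return "OK"
    return "Found Keras {}, expected {}; try: pip install keras=={}".format(
        installed, required, required
    )
```

Running print(keras_version_status(installed_keras_version())) in the same Python environment as mrcnn_to_trt_single.py reports which case applies.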

@rmccorm4 Sorry for the late reply, I just got back from the weekend. I’m testing the MaskRCNN sample now and will let you know if there is any progress.

Hi @doomb007,

All of the samples are built and placed into $TRT_RELEASE/bin from the start when you do:

# Generate Makefiles and build
cd $TRT_SOURCE
mkdir -p build && cd build 
cmake .. -DTRT_LIB_DIR=$TRT_RELEASE/lib -DTRT_BIN_DIR=`pwd`/out
make -j$(nproc)

# Installs OSS Components and Builds All Samples
make install

I’ll see if I can fix the README for that sample, because the part you referenced seems misleading.

I’ve updated my comment above (https://github.com/NVIDIA/TensorRT/issues/123#issuecomment-551269792) to do the entire pipeline:

  1. Clone repo
  2. Build OSS components
  3. Convert model to UFF successfully
  4. Run sample successfully

Note that we’re using the TensorFlow container only because of the CUDA 10.1 restriction. Using CUDA 10.0 shouldn’t be as cumbersome, and once there is a pip package for TensorFlow 1.14 compatible with CUDA 10.1, that setup should be easier too.

But the TensorFlow container used above doesn’t come with the $TRT_RELEASE/data/faster-rcnn data that’s mentioned in the sample, so I also added an expandable section on how to grab that data from the TensorRT container and copy it over.

@rmccorm4 Yes, it works. The MaskRCNN model was converted to UFF.

A tip: after installing TensorRT following the tutorial, don’t move its internal folders and files out of the main folder. Doing so can also cause an error like this:

Traceback (most recent call last):
  File "mrcnn_to_trt_single.py", line 182, in <module>
    main()
  File "mrcnn_to_trt_single.py", line 131, in main
    model = modellib.MaskRCNN(mode="inference", model_dir=LOG_DIR, config=config).keras_model
  File "/home/martin/Downloads/TensorRT-6.0.1.5/samples/sampleUffMaskRCNN/converted/mrcnn/model.py", line 1837, in __init__
    self.keras_model = self.build(mode=mode, config=config)
  File "/home/martin/Downloads/TensorRT-6.0.1.5/samples/sampleUffMaskRCNN/converted/mrcnn/model.py", line 1901, in build
    stage5=True, train_bn=config.TRAIN_BN)
  File "/home/martin/Downloads/TensorRT-6.0.1.5/samples/sampleUffMaskRCNN/converted/mrcnn/model.py", line 180, in resnet_graph
    x = KL.Conv2D(64, (7, 7), strides=(2, 2), name='conv1', use_bias=True)(x)
  File "/home/martin/anaconda3/lib/python3.7/site-packages/keras/engine/topology.py", line 590, in __call__
    self.build(input_shapes[0])
  File "/home/martin/anaconda3/lib/python3.7/site-packages/keras/layers/convolutional.py", line 129, in build
    raise ValueError('The channel dimension of the inputs'
ValueError: The channel dimension of the inputsshould be defined. Found `None`.

Hi @doomb007, @guods

I was able to reproduce your error with the following:

pip install -r $TRT_SOURCE/samples/opensource/sampleUffMaskRCNN/converted/requirements.txt
pip install tensorflow-gpu==1.14

dpkg -i /opt/tensorrt/python/graphsurgeon-tf_6.0.1-1+cuda10.1_amd64.deb 
dpkg -i /opt/tensorrt/python/uff-converter-tf_6.0.1-1+cuda10.1_amd64.deb 

git clone https://github.com/matterport/Mask_RCNN.git
export PYTHONPATH=$PYTHONPATH:$PWD/Mask_RCNN

cd Mask_RCNN/
wget https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5
cd $TRT_SOURCE/samples/opensource/sampleUffMaskRCNN/converted/

python mrcnn_to_trt_single.py -w $TRT_SOURCE/Mask_RCNN/mask_rcnn_coco.h5 -o mrcnn_nchw.uff -p config.py 

Same error:

Traceback (most recent call last):
  File "mrcnn_to_trt_single.py", line 164, in <module>
    main()
  File "mrcnn_to_trt_single.py", line 113, in main
    model = modellib.MaskRCNN(mode="inference", model_dir=LOG_DIR, config=config).keras_model
  File "/mnt/TensorRT/Mask_RCNN/mrcnn/model.py", line 1837, in __init__
    self.keras_model = self.build(mode=mode, config=config)
  File "/mnt/TensorRT/Mask_RCNN/mrcnn/model.py", line 1901, in build
    stage5=True, train_bn=config.TRAIN_BN)
  File "/mnt/TensorRT/Mask_RCNN/mrcnn/model.py", line 180, in resnet_graph
    x = KL.Conv2D(64, (7, 7), strides=(2, 2), name='conv1', use_bias=True)(x)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/topology.py", line 590, in __call__
    self.build(input_shapes[0])
  File "/usr/local/lib/python3.6/dist-packages/keras/layers/convolutional.py", line 129, in build
    raise ValueError('The channel dimension of the inputs '
ValueError: The channel dimension of the inputs should be defined. Found `None`.

However, this happens when you don’t apply the patch to MRCNN as mentioned in the instructions:

git clone https://github.com/matterport/Mask_RCNN.git
export PYTHONPATH=$PYTHONPATH:$PWD/Mask_RCNN

# Make sure you do this
cd Mask_RCNN
git checkout 3deaec5
git am $TRT_SOURCE/samples/opensource/sampleUffMaskRCNN/converted/0001-Update-the-Mask_RCNN-model-from-NHWC-to-NCHW.patch

If the above fails with an error from git about your config, like so:

root@e50cb87c5094:/mnt/TensorRT/Mask_RCNN# git checkout 3deaec5
Note: checking out '3deaec5'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 3deaec5 import logging for line 382
root@e50cb87c5094:/mnt/TensorRT/Mask_RCNN# git am $TRT_SOURCE/samples/opensource/sampleUffMaskRCNN/converted/0001-Update-the-Mask_RCNN-model-from-NHWC-to-NCHW.patch

*** Please tell me who you are.

Run

  git config --global user.email "you@example.com"
  git config --global user.name "Your Name"

to set your account's default identity.
Omit --global to set the identity only in this repository.

You can set your actual identity or a dummy one; it doesn’t matter:

git config --global user.email "foo"
git config --global user.name "bar"

git checkout 3deaec5
git am $TRT_SOURCE/samples/opensource/sampleUffMaskRCNN/converted/0001-Update-the-Mask_RCNN-model-from-NHWC-to-NCHW.patch
# Applying: Update the Mask_RCNN model from NHWC to NCHW
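
Since the ValueError above appears when the patch is missing, it may be worth confirming that the patch commit is actually in the Mask_RCNN history before converting. The sketch below is one way to do that, assuming the commit subject matches the "Applying:" line printed by git am. The helper names are hypothetical.

```python
import subprocess

# Subject taken from the "Applying: ..." line printed by git am above.
PATCH_SUBJECT = "Update the Mask_RCNN model from NHWC to NCHW"


def recent_subjects(repo_dir, count=5):
    """Return the subjects of the last `count` commits, one per line."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", "-n", str(count), "--format=%s"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


def patch_applied(log_subjects, subject=PATCH_SUBJECT):
    """Return True if the patch subject appears among the commit subjects."""
    return any(line.strip() == subject for line in log_subjects.splitlines())
```

For example, patch_applied(recent_subjects("Mask_RCNN")) should return True once git am has succeeded, and False on a checkout that stops at 3deaec5.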

However, now I’m hitting the same error as mentioned here: https://github.com/NVIDIA/TensorRT/issues/132#issuecomment-549188744

I think it’s because I have CUDA 10.1 on my host and there is still some incompatibility with TensorFlow at the moment. Looking into it now.