server: ONNX Runtime build error for 23.01

Description

I’m trying to build the Triton container as follows, where RELEASE_TAG=r23.01:

./build.py --enable-logging --enable-stats --enable-tracing \
    --enable-metrics --enable-gpu-metrics --enable-gpu \
    --no-container-interactive \
    --endpoint=http --endpoint=grpc --endpoint=sagemaker \
    --repo-tag=common:$RELEASE_TAG --repo-tag=core:$RELEASE_TAG \
    --repo-tag=backend:$RELEASE_TAG --repo-tag=thirdparty:$RELEASE_TAG \
    --backend=ensemble:$RELEASE_TAG --backend=tensorrt:$RELEASE_TAG \
    --backend=identity:$RELEASE_TAG --backend=repeat:$RELEASE_TAG \
    --backend=square:$RELEASE_TAG --backend=onnxruntime:$RELEASE_TAG \
    --backend=pytorch:$RELEASE_TAG --backend=tensorflow1:$RELEASE_TAG \
    --backend=tensorflow2:$RELEASE_TAG --backend=python:$RELEASE_TAG \
    --backend=dali:$RELEASE_TAG --backend=fil:$RELEASE_TAG \
    --backend=fastertransformer:main \
    --repoagent=checksum:$RELEASE_TAG

Error detected:

[ 70%] Building CXX object CMakeFiles/onnxruntime_providers.dir/workspace/onnxruntime/onnxruntime/core/providers/cpu/math/cumsum.cc.o
/usr/local/cuda/targets/x86_64-linux/include/cuda/std/detail/libcxx/include/type_traits(566): error: "cuda" is ambiguous
/usr/local/cuda/targets/x86_64-linux/include/cuda/std/detail/libcxx/include/type_traits(566): error: too many arguments for alias template "cuda::std::__4::_BoolConstant"
/usr/local/cuda/targets/x86_64-linux/include/cuda/std/detail/libcxx/include/type_traits(568): error: expected a ";"
/usr/local/cuda/targets/x86_64-linux/include/cuda/std/detail/libcxx/include/type_traits(575): error: "cuda" is ambiguous
/usr/local/cuda/targets/x86_64-linux/include/cuda/std/detail/libcxx/include/type_traits(575): error: too many arguments for alias template "cuda::std::__4::_BoolConstant"
/usr/local/cuda/targets/x86_64-linux/include/cuda/std/detail/libcxx/include/type_traits(577): error: expected a ";"
/usr/local/cuda/targets/x86_64-linux/include/cuda/std/detail/libcxx/include/type_traits(1064): error: name followed by "::" must be a class or namespace name
...
100 errors detected in the compilation of "/workspace/onnxruntime/onnxruntime/contrib_ops/cuda/bert/attention_impl.cu".
Compilation terminated.
make[2]: *** [CMakeFiles/onnxruntime_providers_cuda.dir/build.make:2990: CMakeFiles/onnxruntime_providers_cuda.dir/workspace/onnxruntime/onnxruntime/contrib_ops/cuda/bert/attention_impl.cu.o] Error 255
make[2]: *** Waiting for unfinished jobs....

Triton Information

What version of Triton are you using? -> v23.01

Are you using the Triton container or did you build it yourself? -> Building it myself; running into the error above.

To Reproduce

Run the build.py command above with RELEASE_TAG=r23.01.

Expected behavior

A successful build of the 23.01 container. Can someone help fix this issue? Thank you.

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 28 (27 by maintainers)

Most upvoted comments

@tanmayv25 @jbkyang-nvi Thanks for the help with this issue! Here is a snippet of what eventually worked (a quick sanity check follows the snippet):

      - git clone -b tanmayv-fix https://github.com/triton-inference-server/fastertransformer_backend.git
      - cd fastertransformer_backend
      - python3 docker/create_dockerfile_and_build.py --triton-version=22.12 --image-name tritonserver_22.12_with_ft
      - cd ..
      - docker create --name dummy_temp_layer tritonserver_22.12_with_ft
      - mkdir -p docker_temp/
      - docker cp dummy_temp_layer:/opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so docker_temp/
      - docker cp dummy_temp_layer:/opt/tritonserver/backends/fastertransformer/libtransformer-shared.so docker_temp/
      - docker rm -f dummy_temp_layer
      - >-
        echo "
        FROM tritonserver:latest \n
        ENV NCCL_LAUNCH_MODE=GROUP \n
        RUN mkdir -p /opt/tritonserver/backends/fastertransformer \\ \n
        && chmod -R 777 /opt/tritonserver/backends/fastertransformer \\ \n
        && apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends openssh-server  \\ \n
        && sed -i 's/#X11UseLocalhost yes/X11UseLocalhost no/g' /etc/ssh/sshd_config \\ \n
        && mkdir -p /var/run/sshd \n
        COPY libtriton_fastertransformer.so /opt/tritonserver/backends/fastertransformer/ \n
        COPY libtransformer-shared.so /opt/tritonserver/backends/fastertransformer/" >> docker_temp/Dockerfile
      - cat docker_temp/Dockerfile
      - docker build docker_temp/ -t tritonserver:final
      - rm -rf docker_temp
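As a quick sanity check on the resulting image (a minimal sketch; tritonserver:final is the tag built above, and it assumes the image's entrypoint lets you run an arbitrary command):

    # List the backend libraries copied into the final image; both
    # libtriton_fastertransformer.so and libtransformer-shared.so
    # should appear if the COPY steps worked.
    docker run --rm tritonserver:final \
        ls -l /opt/tritonserver/backends/fastertransformer

One caveat on the echo-generated Dockerfile above: it relies on the shell's echo interpreting the \n escapes, so under bash you would need echo -e for the same one-liner to produce a multi-line Dockerfile.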

I was able to build the container by copying the FT backend *.so files from the 22.12 build into the 23.01 container.

Closing this ticket.

Why does it work for Tanmay and not Nikhil?

I was able to reproduce the error Nikhil is running into. I had forgotten to include fastertransformer in the build.py command before.

The error occurs because CUBLASLT_MATMUL_PREF_EPILOGUE_MASK was removed in CUDA 12, but FasterTransformer still uses that API.
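One way to confirm the removal inside the 23.01 build container (a hedged check; the path assumes a default CUDA toolkit layout):

    # Under CUDA 11.x this grep matches the attribute's enum definition;
    # under the CUDA 12 toolkit it should return nothing, which is why
    # FasterTransformer sources that reference it no longer compile.
    grep -n "CUBLASLT_MATMUL_PREF_EPILOGUE_MASK" /usr/local/cuda/include/cublasLt.h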

@nskool Let me get back to you with a working solution.