FasterTransformer: [FasterTransformer v3.1/TensorFlow] CUBLAS_STATUS_INTERNAL_ERROR when running tensorflow/gpt2_sample.py

Related to FasterTransformer v3.1/TensorFlow/GPT-2

Describe the bug
If I run ./bin/decoding_gemm 4 1 12 64 50257 32 768 0 before python tensorflow/gpt2_sample.py, I get:

Internal: [FT][ERROR] CUDA runtime error: CUBLAS_STATUS_INTERNAL_ERROR FasterTransformer/fastertransformer/cuda/open_decoder.cu:1708

However, if I don't run ./bin/decoding_gemm 4 1 12 64 50257 32 768 0 first (i.e. use the default GEMM algorithms), everything works.
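For context, decoding_gemm benchmarks candidate cuBLAS algorithms for the given problem sizes offline and writes the winners to a gemm_config file that the decoder replays at run time. The sketch below shows that general pattern; the helper names (load_gemm_algo, run_gemm) and the file layout are assumptions for illustration, not FasterTransformer's actual code. The point is that when the cached algorithm ID doesn't suit the actual shapes or setup, cublasGemmEx can fail with CUBLAS_STATUS_INTERNAL_ERROR instead of falling back, which would match the failure at open_decoder.cu:1708.

```cpp
// Minimal sketch of the cached-GEMM-algorithm pattern (hypothetical helper
// names and file layout, not FasterTransformer's actual code).
#include <cstdio>
#include <cublas_v2.h>

// Hypothetical: read the tuned algorithm ID for GEMM number `index`
// from the config file written by the offline tuner.
int load_gemm_algo(const char* path, int index) {
    FILE* f = std::fopen(path, "r");
    if (f == nullptr) return static_cast<int>(CUBLAS_GEMM_DEFAULT);  // no config -> default
    int algo = static_cast<int>(CUBLAS_GEMM_DEFAULT);
    for (int i = 0; i <= index; ++i) {
        if (std::fscanf(f, "%d", &algo) != 1) {
            algo = static_cast<int>(CUBLAS_GEMM_DEFAULT);  // short/stale file -> default
            break;
        }
    }
    std::fclose(f);
    return algo;
}

// Replay the tuned algorithm. If algo_id was tuned for different shapes or a
// different setup, cuBLAS may return CUBLAS_STATUS_INTERNAL_ERROR (or
// CUBLAS_STATUS_NOT_SUPPORTED) here rather than silently falling back --
// consistent with the failure reported at open_decoder.cu:1708.
cublasStatus_t run_gemm(cublasHandle_t handle, int m, int n, int k,
                        const float* A, const float* B, float* C, int algo_id) {
    const float alpha = 1.0f, beta = 0.0f;
    return cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                        &alpha, A, CUDA_R_32F, m,
                        B, CUDA_R_32F, k,
                        &beta, C, CUDA_R_32F, m,
                        CUDA_R_32F, static_cast<cublasGemmAlgo_t>(algo_id));
}
```

If that is what is happening here, one quick check is whether the gemm_config written by step 4 was generated for the shapes gpt2_sample.py actually uses (e.g. its batch size and sequence length).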

To Reproduce
Steps to reproduce the behavior:

  1. nvidia-docker run -it -v local_dir:container_dir nvcr.io/nvidia/tensorflow:19.06-py3 bash
  2. cmake -DSM=70 -DCMAKE_BUILD_TYPE=Release -DBUILD_TF=ON -DTF_PATH=/usr/local/lib/python3.5/dist-packages/tensorflow ..
  3. make
  4. ./bin/decoding_gemm 4 1 12 64 50257 32 768 0 (the arguments are unpacked in the sketch after these steps)
  5. python tensorflow/gpt2_sample.py
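For readers unfamiliar with decoding_gemm, the positional arguments in step 4 appear to map onto the GPT-2 small configuration. The sketch below spells out that mapping; the argument order is taken from my reading of the v3.1 decoding docs, so treat it as an assumption to verify against your checkout rather than as the tool's actual source.

```cpp
// Hedged sketch of decoding_gemm's positional arguments (order assumed from
// the v3.1 docs; illustrative only, not the tool's real source).
#include <cstdio>
#include <cstdlib>

int main(int argc, char** argv) {
    if (argc != 9) {
        std::fprintf(stderr,
                     "usage: decoding_gemm batch_size beam_width head_num "
                     "size_per_head vocab_size seq_len memory_hidden is_fp16\n");
        return 1;
    }
    int  batch_size    = std::atoi(argv[1]);       // 4
    int  beam_width    = std::atoi(argv[2]);       // 1 (sampling, no beam search)
    int  head_num      = std::atoi(argv[3]);       // 12 -> GPT-2 small
    int  size_per_head = std::atoi(argv[4]);       // 64 -> 12 * 64 = 768 hidden units
    int  vocab_size    = std::atoi(argv[5]);       // 50257 -> GPT-2 BPE vocabulary
    int  seq_len       = std::atoi(argv[6]);       // 32
    int  memory_hidden = std::atoi(argv[7]);       // 768
    bool is_fp16       = std::atoi(argv[8]) != 0;  // 0 -> FP32
    std::printf("tuning GEMMs for batch=%d beam=%d heads=%d size/head=%d "
                "vocab=%d seq=%d mem=%d fp16=%d\n",
                batch_size, beam_width, head_num, size_per_head,
                vocab_size, seq_len, memory_hidden, static_cast<int>(is_fp16));
    return 0;
}
```

If any of these values disagree with what gpt2_sample.py actually runs with, the tuned config can be invalid for the real GEMM shapes, which is one plausible route to the error above.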

Expected behavior
Running python tensorflow/gpt2_sample.py after decoding_gemm should complete without error.

Environment

  • Container version: nvcr.io/nvidia/tensorflow:19.06-py3
  • GPUs in the system: 8x Tesla V100-32GB
  • CUDA driver version: 435.21

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 17
