FasterTransformer: [FasterTransformer v3.1/TensorFlow] Get CUBLAS_STATUS_INTERNAL_ERROR when running tensorflow/gpt2_sample.py
Related to FasterTransformer v3.1/TensorFlow/GPT-2
Describe the bug
If I run
./bin/decoding_gemm 4 1 12 64 50257 32 768 0
before running
python tensorflow/gpt2_sample.py
, I get:
Internal: [FT][ERROR] CUDA runtime error: CUBLAS_STATUS_INTERNAL_ERROR FasterTransformer/fastertransformer/cuda/open_decoder.cu:1708
However, if I do not run ./bin/decoding_gemm 4 1 12 64 50257 32 768 0 before python tensorflow/gpt2_sample.py (i.e. use the default GEMM configuration), everything works fine.
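For context, the positional arguments to decoding_gemm appear to match the GPT-2 (small) model shape. The parameter names in the sketch below are my guesses inferred from the values shown in this report, not taken from the tool's documentation:

```python
# Hypothetical mapping of decoding_gemm's positional arguments.
# Names are assumptions inferred from typical GPT-2 (small) hyperparameters;
# consult the FasterTransformer docs for the authoritative order.
args = [4, 1, 12, 64, 50257, 32, 768, 0]
names = ["batch_size", "beam_width", "head_num", "size_per_head",
         "vocab_size", "seq_len", "hidden_dim", "is_fp16"]
config = dict(zip(names, args))

# Sanity check consistent with GPT-2 small: the per-head sizes multiply
# out to the hidden dimension (12 * 64 == 768).
assert config["head_num"] * config["size_per_head"] == config["hidden_dim"]
print(config)
```

If this mapping is right, the command tunes GEMMs for batch size 4, greedy decoding (beam width 1), and FP32 (last argument 0).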
To Reproduce
Steps to reproduce the behavior:
1. nvidia-docker run -it -v local_dir:container_dir nvcr.io/nvidia/tensorflow:19.06-py3 bash
2. cmake -DSM=70 -DCMAKE_BUILD_TYPE=Release -DBUILD_TF=ON -DTF_PATH=/usr/local/lib/python3.5/dist-packages/tensorflow ..
3. make
4. ./bin/decoding_gemm 4 1 12 64 50257 32 768 0
5. python tensorflow/gpt2_sample.py
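A plausible reading of the symptom: decoding_gemm writes a tuned GEMM-algorithm configuration that the decoder then loads at runtime, while without it a cuBLAS default algorithm is used; if a tuned algorithm id is invalid for the actual problem shape, the GEMM call can fail with CUBLAS_STATUS_INTERNAL_ERROR. Below is a toy sketch of that load-or-fallback logic only; the file name gemm_config.in, its one-id-per-line format, and the algorithm ids are all invented for illustration:

```python
import os

# Stand-in for a cuBLAS default algorithm selector (assumption, not a real id).
DEFAULT_ALGO = -1

def pick_gemm_algo(config_path="gemm_config.in"):
    """Return a tuned algorithm id if a config file exists, else the default.

    Mirrors in spirit only how a GEMM-tuning tool might hand its result to
    the decoder; the file format here is hypothetical.
    """
    if not os.path.exists(config_path):
        # No tuning run: fall back to the default algorithm, which matches
        # the report that the sample works when decoding_gemm is skipped.
        return DEFAULT_ALGO
    with open(config_path) as f:
        # Assume one integer algorithm id per line; take the first.
        return int(f.readline().split()[0])
```

Under this reading, checking that the tuning-time arguments match the shapes the sample actually runs (batch size, sequence length, precision) would be the first thing to verify.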
Expected behavior
There should be no error.
Environment
- Container version: nvcr.io/nvidia/tensorflow:19.06-py3
- GPUs in the system: 8x Tesla V100-32GB
- CUDA driver version: 435.21
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 17
Commits related to this issue
- use nvidia-smi to track mem usage (#12) — committed to ImanHosseini/FasterTransformer by yuanzhedong 3 years ago