FasterTransformer: [FasterTransformer v3.1/TensorFlow] Get CUBLAS_STATUS_INTERNAL_ERROR when running tensorflow/gpt2_sample.py
Related to FasterTransformer v3.1/TensorFlow/GPT-2
Describe the bug
If I run
./bin/decoding_gemm 4 1 12 64 50257 32 768 0
before running
python tensorflow/gpt2_sample.py
, I get:
Internal: [FT][ERROR] CUDA runtime error: CUBLAS_STATUS_INTERNAL_ERROR FasterTransformer/fastertransformer/cuda/open_decoder.cu:1708
However, if I do not run ./bin/decoding_gemm 4 1 12 64 50257 32 768 0 before python tensorflow/gpt2_sample.py (i.e. use the default GEMM configuration), everything works fine.
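For context, the positional arguments to decoding_gemm appear to match the GPT-2 (small) model shape. The parameter names in the sketch below are my guesses inferred from the values shown in this report, not taken from the tool's documentation:

```python
# Hypothetical mapping of decoding_gemm's positional arguments.
# Names are assumptions inferred from typical GPT-2 (small) hyperparameters;
# consult the FasterTransformer docs for the authoritative order.
args = [4, 1, 12, 64, 50257, 32, 768, 0]
names = ["batch_size", "beam_width", "head_num", "size_per_head",
         "vocab_size", "seq_len", "hidden_dim", "is_fp16"]
config = dict(zip(names, args))

# Sanity check consistent with GPT-2 small: the per-head sizes multiply
# out to the hidden dimension (12 * 64 == 768).
assert config["head_num"] * config["size_per_head"] == config["hidden_dim"]
print(config)
```

If this mapping is right, the command tunes GEMMs for batch size 4, greedy decoding (beam width 1), and FP32 (last argument 0).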
To Reproduce
Steps to reproduce the behavior:
1. nvidia-docker run -it -v local_dir:container_dir nvcr.io/nvidia/tensorflow:19.06-py3 bash
2. cmake -DSM=70 -DCMAKE_BUILD_TYPE=Release -DBUILD_TF=ON -DTF_PATH=/usr/local/lib/python3.5/dist-packages/tensorflow ..
3. make
4. ./bin/decoding_gemm 4 1 12 64 50257 32 768 0
5. python tensorflow/gpt2_sample.py
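A plausible reading of the symptom: decoding_gemm writes a tuned GEMM-algorithm configuration that the decoder then loads at runtime, while without it a cuBLAS default algorithm is used; if a tuned algorithm id is invalid for the actual problem shape, the GEMM call can fail with CUBLAS_STATUS_INTERNAL_ERROR. Below is a toy sketch of that load-or-fallback logic only; the file name gemm_config.in, its one-id-per-line format, and the algorithm ids are all invented for illustration:

```python
import os

# Stand-in for a cuBLAS default algorithm selector (assumption, not a real id).
DEFAULT_ALGO = -1

def pick_gemm_algo(config_path="gemm_config.in"):
    """Return a tuned algorithm id if a config file exists, else the default.

    Mirrors in spirit only how a GEMM-tuning tool might hand its result to
    the decoder; the file format here is hypothetical.
    """
    if not os.path.exists(config_path):
        # No tuning run: fall back to the default algorithm, which matches
        # the report that the sample works when decoding_gemm is skipped.
        return DEFAULT_ALGO
    with open(config_path) as f:
        # Assume one integer algorithm id per line; take the first.
        return int(f.readline().split()[0])
```

Under this reading, checking that the tuning-time arguments match the shapes the sample actually runs (batch size, sequence length, precision) would be the first thing to verify.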
Expected behavior
There should be no error.
Environment
- Container version: nvcr.io/nvidia/tensorflow:19.06-py3
- GPUs in the system: 8x Tesla V100-32GB
- CUDA driver version: 435.21
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 17
Commits related to this issue
- use nvidia-smi to track mem usage (#12) — committed to ImanHosseini/FasterTransformer by yuanzhedong 3 years ago