whisper.cpp: CUDA an illegal memory access was encountered
When I try to run any large model, I get this error:
CUDA error 700 at ggml-cuda.cu:8303: an illegal memory access was encountered
My GPU is a GTX 1080 Ti (11 GB VRAM). The base.en model works, and openai-whisper with large-v2 runs on the GPU without problems.
I also tried quantized models: q5_0 has the same problem, and q4_0 starts but produces no output 😕
In dmesg:
[12337.297750] NVRM: Xid (PCI:0000:04:00): 31, pid=167607, name=main, Ch 00000008, intr 10000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_T1_8 faulted @ 0x7ffb_7fe03000. Fault is of type FAULT_PDE ACCESS_TYPE_READ
Example run:
# ./main -debug -m models/ggml-large-v3.bin -l pl janina.wav
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-large-v3.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51866
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1280
whisper_model_load: n_text_head = 20
whisper_model_load: n_text_layer = 32
whisper_model_load: n_mels = 128
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 5 (large v3)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs = 100
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce GTX 1080 Ti, compute capability 6.1
whisper_backend_init: using CUDA backend
whisper_model_load: CUDA buffer size = 3117.87 MB
whisper_model_load: model size = 3117.39 MB
whisper_backend_init: using CUDA backend
whisper_init_state: kv self size = 220.20 MB
whisper_init_state: kv cross size = 245.76 MB
whisper_init_state: compute buffer (conv) = 32.36 MB
whisper_init_state: compute buffer (encode) = 212.36 MB
whisper_init_state: compute buffer (cross) = 9.32 MB
whisper_init_state: compute buffer (decode) = 99.17 MB
system_info: n_threads = 4 / 30 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 |
main: processing 'janina.wav' (50387302 samples, 3149.2 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = pl, task = transcribe, timestamps = 1 ...
CUDA error 700 at ggml-cuda.cu:8303: an illegal memory access was encountered
current device: 0
About this issue
- State: open
- Created 7 months ago
- Reactions: 4
- Comments: 22 (12 by maintainers)
That fixed the issue for my GTX 1060. Thanks!
Any luck with the latest version on master? There have been some changes to the CUDA backend which might have fixed the issue.
Testing on Kaggle with an Nvidia P100, on the latest commit as of today (f0efd02).
The issue seems to affect all models (tested large-v3, large-v2, medium, base, tiny), in different ways: the large models give the CUDA error 700, while the smaller models give gibberish output.
I tested on Kaggle so that others without access to older-generation GPUs can also reproduce the issue.
For large-v3 - same result for large-v2.
The error is not present with the medium, base, and tiny models, but all of them give gibberish output:
- Medium model -
- Base model -
- Tiny model -
I am hitting this with large-v1 in f16 and q5_1:
I do not hit this with q8_0 but I get gibberish. This model works fine on 4774d2fe (with -arch=native for modern CUDA). Will bisect.
Edit: ec7a6f04f9c32adec2e6b0995b8c728c5bf56f35 works fine on my P40. b0502836b82944d444f30ea1b5217f69ff6da71b (#1472) either hangs (apparently doing nothing, at 100% GPU usage) or crashes with the above assertion failure on line 8224.
Yeah, I'm sorry. Without being able to reproduce this, I won't be able to fix it. It works on my old GTX 1660, and I can't rent an older GPU to test with.
./main -m models/ggml-large-v3.bin -f samples/jfk.wav