tensorflow: Failed to memcpy from device to host: CUDA_ERROR_LAUNCH_TIMEOUT when running CIFAR10 after 113K steps

I tried to run the cifar10 model provided with TF, with GPU support. I successfully installed Tensorflow from source (with GPU support) and was able to run cifar10_train.py on my GPU. However, after step 113330, I encountered the following error, which is probably related to an async memcpy from device to host. As my graphics card's compute capability is 5.2, I thought it should not be due to a compute capability conflict.

Similar issues

#1477 #1060

But my error is slightly different

Environment info

Operating System: Ubuntu 14.04
GPU: GM200 - GeForce GTX TITAN X (rev a1)
Tensorflow 0.8 (installed from source)
Installed version of CUDA and cuDNN: cuda 7.5, cudnn 7.0 (v4)
Using Anaconda virtual env

Logs

2016-04-25 22:10:14.937118: step 113330, loss = 0.74 (1276.4 examples/sec; 0.100 sec/batch)

E tensorflow/stream_executor/cuda/cuda_driver.cc:1197] failed to enqueue async memcpy from device to host: CUDA_ERROR_LAUNCH_TIMEOUT; host dst: 0x7fbdd0001680; GPU src: 0xb06c84600; size: 3=0x3
E tensorflow/stream_executor/cuda/cuda_driver.cc:1099] could not synchronize on CUDA context: CUDA_ERROR_LAUNCH_TIMEOUT :: No stack trace available
E tensorflow/stream_executor/stream.cc:272] Error recording event in stream: error recording CUDA event on stream 0x1efe980: CUDA_ERROR_LAUNCH_TIMEOUT; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_LAUNCH_TIMEOUT
F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:198] Unexpected Event status: 1
F tensorflow/core/common_runtime/gpu/gpu_util.cc:370] GPU sync failed
E tensorflow/stream_executor/cuda/cuda_driver.cc:1099] could not synchronize on CUDA context: CUDA_ERROR_LAUNCH_TIMEOUT :: No stack trace available
E tensorflow/stream_executor/cuda/cuda_driver.cc:1099] could not synchronize on CUDA context: CUDA_ERROR_LAUNCH_TIMEOUT :: No stack trace available
E tensorflow/stream_executor/cuda/cuda_driver.cc:1099] could not synchronize on CUDA context: CUDA_ERROR_LAUNCH_TIMEOUT :: No stack trace available
I tensorflow/stream_executor/stream.cc:826] stream 0x1efe860 did not wait for stream: 0x1efd340
Aborted (core dumped)

About this issue

State: closed
Created 8 years ago
Comments: 15 (3 by maintainers)

Commits related to this issue

[ROCM] Add ROCM support to gpu_gcc.bazelrc, clang defines for rocm in… (#2117) * [ROCM] Add ROCM support to gpu_gcc.bazelrc, clang defines for rocm in gpu.bazelrc * [ROCM] Update rocm.bazelrc to p... — committed to fsx950223/tensorflow by jayfurmanek a year ago

Most upvoted comments

I get the same bug.

2017-07-13 22:05:28.022704: I tensorflow/stream_executor/stream.cc:1500] stream 0x5da97e0 did not wait for stream: 0x5da95b0
2017-07-13 22:05:28.022741: I tensorflow/stream_executor/stream.cc:4087] stream 0x5da97e0 did not memcpy host-to-device; source: 0x203861300
2017-07-13 22:05:28.022808: F tensorflow/core/common_runtime/gpu/gpu_util.cc:343] CPU->GPU Memcpy failed

+14

nyartsgnaw on Jul 14, 2017

config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)

Adding those lines at the top of my file, right below the imports, fixed this in my case.

Terkea on Apr 23, 2020
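For readers on TensorFlow 2.x, the allow_growth setting in the comment above has a counterpart in the tf.config API. The following is only a sketch under assumptions: the helper name enable_memory_growth is hypothetical, it assumes a TensorFlow version that provides tf.config.list_physical_devices, and nothing in this thread confirms that it resolves the original CUDA_ERROR_LAUNCH_TIMEOUT.

```python
def enable_memory_growth():
    """Ask TensorFlow to allocate GPU memory on demand rather than up front.

    Hypothetical helper (not from the thread). Returns the number of GPUs
    configured, or None if TensorFlow is not installed.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # This must run before the GPU is first used; TensorFlow raises a
        # RuntimeError if the device has already been initialized.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


if __name__ == "__main__":
    print(enable_memory_growth())
```

As with the session config above, the call belongs near the top of the script, before any op touches the GPU.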

Got a solution:

cudnn64_6.dll is missing from the Toolkit's bin folder. You have to copy it there.

JSchwerdtner on Dec 14, 2018

The same:

2018-12-11 00:31:09.918511: E tensorflow/stream_executor/cuda/cuda_driver.cc:1130] failed to enqueue async memcpy from host to device: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure; GPU dst: 0x4232ce0b00; host src: 0x7f1f10244780; size: 1597440=0x186000
2018-12-11 00:31:09.918525: E tensorflow/stream_executor/cuda/cuda_event.cc:48] Error polling for event status: failed to query event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2018-12-11 00:31:09.918534: E tensorflow/stream_executor/cuda/cuda_driver.cc:1000] could not wait stream on event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2018-12-11 00:31:09.918534: E tensorflow/stream_executor/cuda/cuda_driver.cc:1000] could not wait stream on event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2018-12-11 00:31:09.918568: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:274] Unexpected Event status: 1

shiyongde on Dec 11, 2018

My solution was related to the fact that I was using:

from numba import cuda
cuda.select_device(0)

You have to use the latter line before interacting with tensorflow or keras at all.

nicholasg97 on Apr 11, 2019

@hamidb: Building from source is fine. Unfortunately, it sounds like this will be impossible to debug since it can’t be reproduced, so I’ll close for now. Please reopen if reproduction becomes possible!

girving on Jun 7, 2016