tensorflow: Error polling for event status: failed to query event: CUDA_ERROR_MISALIGNED_ADDRESS
Summary:
I was trying Inception v3, and it worked fine all the way until I downgraded from gcc 5+ to gcc 4.9 to use Theano with Keras, following this guide: http://deeplearning.net/software/theano/install_ubuntu.html
Now I hit this error before training starts (bottlenecks generate fine) whenever I run:
bazel-bin/tensorflow/examples/image_retraining/retrain --image_dir <image dir>
E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_MISALIGNED_ADDRESS
F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:198] Unexpected Event status: 1
I can't figure out the problem. A side note that might help: bottlenecks were generated a lot faster once I used gcc 4.9, but now training crashes and I can't run it at all.
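Because the crash showed up right after the compiler change, it may help to record which gcc is first on PATH when reporting or comparing builds. The following is a minimal sketch rather than a fix; `parse_gcc_version` is a hypothetical helper name, and it assumes the usual `gcc --version` banner where the version follows the parenthesized distribution tag.

```python
import re
import subprocess


def parse_gcc_version(version_output):
    """Return (major, minor) parsed from `gcc --version` output, or None."""
    lines = version_output.splitlines()
    if not lines:
        return None
    # Drop the "(Ubuntu 5.4.0-6ubuntu1~16.04.4)" style tag so its digits
    # are not mistaken for the compiler version.
    banner = re.sub(r"\(.*?\)", "", lines[0])
    match = re.search(r"(\d+)\.(\d+)", banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    try:
        output = subprocess.check_output(["gcc", "--version"]).decode()
    except (OSError, subprocess.CalledProcessError):
        print("gcc not found on PATH")
    else:
        print("active gcc version:", parse_gcc_version(output))
```

This only reports the compiler found on PATH; the toolchain actually used for a given build may be configured separately.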
Environment info
Operating System: Ubuntu 16.04
Installed version of CUDA and cuDNN:
(please attach the output of ls -l /path/to/cuda/lib/libcud*):
ls: cannot access '/path/to/cuda/lib/libcud*': No such file or directory
It’s installed in /usr/local/cuda and /usr/local/cuda-7.5 instead.
CUDA 7.5, CuDNN v4.
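Since `/path/to/cuda` in the template is only a placeholder, the same listing can be produced against the real install location. A minimal sketch, assuming the libraries sit in a `lib64` or `lib` directory under `/usr/local/cuda`; `list_cuda_libs` is a hypothetical helper name, not part of any CUDA or TensorFlow tooling.

```python
import glob
import os


def list_cuda_libs(cuda_root="/usr/local/cuda"):
    """Return sorted paths of libcud* files under <cuda_root>/lib64 and <cuda_root>/lib."""
    found = []
    for sub in ("lib64", "lib"):
        found.extend(glob.glob(os.path.join(cuda_root, sub, "libcud*")))
    return sorted(found)


if __name__ == "__main__":
    for path in list_cuda_libs():
        # Print the resolved target as well, because these files are often
        # symlinks and the target name shows the exact library version.
        print(path, "->", os.path.realpath(path))
```

Passing `/usr/local/cuda-7.5` as `cuda_root` checks the versioned directory instead of the symlinked one.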
Install steps:
CUDA:
bash cuda_7.5.18_linux.run --override
CUDNN:
Tried both:
tar xvzf cudnn-7.0-linux-x64-v4.0-prod.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda-7.5/include
sudo cp -r cuda/lib64/. /usr/local/cuda-7.5/lib64
and from here: http://askubuntu.com/questions/767269/how-can-i-install-cudnn-on-ubuntu-16-04
If installed from binary pip package, provide:
- Which pip package you installed.
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.9.0-cp35-cp35m-linux_x86_64.whl
pip install --upgrade $TF_BINARY_URL
- The output from
python -c "import tensorflow; print(tensorflow.__version__)".
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
0.9.0
If installed from sources, provide the commit hash:
Steps to reproduce
- bazel build -c opt --copt=-mavx tensorflow/examples/image_retraining:retrain
- bazel-bin/tensorflow/examples/image_retraining/retrain --image_dir <direc>
- scratch head
What have you tried?
1. Literally every other Stack Overflow / GitHub question, e.g. https://github.com/tensorflow/tensorflow/issues/2810
2. Reinstalling CUDA 7.5 and cuDNN v4, then running ./configure. No luck.
Logs or other output that would be helpful
(If logs are large, please upload as attachment).
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Comments: 18 (10 by maintainers)
I hit the same error when using the official gRPC protocol.