tensorflow: Error polling for event status: failed to query event: CUDA_ERROR_MISALIGNED_ADDRESS

Summary:

Trying inceptionv3, was working fine all the way until I downgraded gcc 5+ to gcc4.9 to use Theano with keras: following this example http://deeplearning.net/software/theano/install_ubuntu.html

Now hitting this error before training starts (bottlenecks generate fine) whenever i run

bazel-bin/tensorflow/examples/image_retraining/retrain --image_dir <image dir>

E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_MISALIGNED_ADDRESS
F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:198] Unexpected Event status: 1

Cant figure out the problem. Sidenote that might help: bottlenecks generated alot faster when i used gcc 4.9 instead, but now the training crashes and i cant even run.

Environment info

Operating System: Ubuntu 16.04

Installed version of CUDA and cuDNN: (please attach the output of ls -l /path/to/cuda/lib/libcud*): ls: cannot access '/path/to/cuda/lib/libcud*': No such file or directory It’s installed in /usr/local/cuda and /usr/local/cuda-7.5 instead.

CUDA 7.5, CuDNN v4.

Install steps: CUDA: bash cuda_7.5.18_linux.run --override CUDNN: Tried both:


tar xvzf cudnn-7.0-linux-x64-v4.0-prod.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda-7.5/include
sudo cp -r cuda/lib64/. /usr/local/cuda-7.5/lib64

and from here: http://askubuntu.com/questions/767269/how-can-i-install-cudnn-on-ubuntu-16-04

If installed from binary pip package, provide:

  1. Which pip package you installed.
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.9.0-cp35-cp35m-linux_x86_64.whl
pip install --upgrade $TF_BINARY_URL
  1. The output from python -c "import tensorflow; print(tensorflow.__version__)".
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
0.9.0

If installed from sources, provide the commit hash:

Steps to reproduce

  1. bazel build -c opt --copt=-mavx tensorflow/examples/image_retraining:retrain
  2. bazel-bin/tensorflow/examples/image_retraining/retrain --image_dir <direc>
  3. scratch head

What have you tried?

1.literally every other stack overflow / github question. eg, https://github.com/tensorflow/tensorflow/issues/2810

2.reinstalling cuda 7.5 and cudnn v4, running ./configure. no luck.

Logs or other output that would be helpful

(If logs are large, please upload as attachment).

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Comments: 18 (10 by maintainers)

Most upvoted comments

I have meet same error when using official gRPC protocal.