tensorflow: CUDA_ERROR_MISALIGNED_ADDRESS on MNIST example

Summary

What might be causing this error when running python tensorflow/models/image/mnist/convolutional.py?

E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_MISALIGNED_ADDRESS

Environment info

Operating System: Linux Lounge 4.5.6-200.fc23.x86_64 #1 SMP Wed Jun 1 21:28:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Installed version of CUDA and cuDNN: (please attach the output of ls -l /path/to/cuda/lib/libcud*):

ls -l /usr/local/cuda-7.5/lib64/libcud*
-rw-r--r--. 1 root root   322936 Aug 16  2015 /usr/local/cuda-7.5/lib64/libcudadevrt.a
lrwxrwxrwx. 1 root root       16 Aug 16  2015 /usr/local/cuda-7.5/lib64/libcudart.so -> libcudart.so.7.5
lrwxrwxrwx. 1 root root       19 Aug 16  2015 /usr/local/cuda-7.5/lib64/libcudart.so.7.5 -> libcudart.so.7.5.18
-rwxr-xr-x. 1 root root   383336 Aug 16  2015 /usr/local/cuda-7.5/lib64/libcudart.so.7.5.18
-rw-r--r--. 1 root root   720192 Aug 16  2015 /usr/local/cuda-7.5/lib64/libcudart_static.a
-rwxr-xr-x. 1 root root 61453024 Jun 11 12:35 /usr/local/cuda-7.5/lib64/libcudnn.so
-rwxr-xr-x. 1 root root 61453024 Jun 11 12:35 /usr/local/cuda-7.5/lib64/libcudnn.so.4
-rwxr-xr-x. 1 root root 61453024 Jun 11 12:35 /usr/local/cuda-7.5/lib64/libcudnn.so.4.0.7
-rwxr-xr-x. 1 root root 59909104 Jun 11 12:35 /usr/local/cuda-7.5/lib64/libcudnn.so.5
-rwxr-xr-x. 1 root root 59909104 Jun 11 12:35 /usr/local/cuda-7.5/lib64/libcudnn.so.5.0.5
-rw-r--r--. 1 root root 62025862 Jun 11 12:35 /usr/local/cuda-7.5/lib64/libcudnn_static.a

If installed from binary pip package, provide:

1. Which pip package you installed.

export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.9.0rc0-cp27-none-linux_x86_64.whl
pip install --upgrade $TF_BINARY_URL

2. The output from python -c "import tensorflow; print(tensorflow.__version__)".

python -c "import tensorflow; print(tensorflow.__version__)"
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally

If installed from sources, provide the commit hash:

Steps to reproduce

1. python tensorflow/models/image/mnist/convolutional.py
2. Observe error CUDA_ERROR_MISALIGNED_ADDRESS
3. Scratch head

What have you tried?

  1. Searched the internet for clues; none found.

Logs or other output that would be helpful

(If logs are large, please upload as attachment.) Results of cuda-memcheck and dmesg: error.txt

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Comments: 21 (7 by maintainers)

Most upvoted comments

From an offline conversation, we can confirm that this problem goes away if you either:

  1. Build from source while explicitly setting the 5.0 compute-capability build target, or
  2. Install the latest graphics driver, 367.27.

So it does seem like a JIT compiler issue that goes away with the latest driver.

@johnfrombluff, @tsitsilin, @acowlikeobject, @kalleknast, @dzupin, @floringogianu, @MartianWearables, sorry that we cannot reproduce this problem on our side. I will try to guess where the problem is and see whether it could be fixed.

Among the folks who encountered this problem, what is common is that all used GM107- and GM108-based GPUs, which are compute capability 5.0. The TensorFlow binary by default carries compute capabilities 3.5 and 5.2. The CUDA driver will extract the compute 3.5 PTX and JIT-compile it into compute 5.0 SASS on the first run. Given that the error message is “Invalid local read of size 16”, my current guess is that the JIT compiler in the CUDA driver is generating wrong code for tf.nn.softmax on GPUs with compute capability 5.0.
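If the GPU result can be captured, one way to check whether the miscompiled kernel is producing wrong values (as opposed to just crashing) is to compare it against a plain CPU reference. The sketch below is illustrative only; softmax_reference and the shapes are made up for the example and are not part of TensorFlow:

```python
import numpy as np

def softmax_reference(x):
    # Numerically stable softmax along the last axis: subtracting the
    # row max changes nothing mathematically but avoids overflow in exp().
    shifted = x - x.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[1.0, 2.0, 3.0],
                   [0.0, 0.0, 0.0]])
probs = softmax_reference(logits)

# Every row of a correct softmax output sums to 1, and uniform
# logits map to a uniform distribution.
print(np.allclose(probs.sum(axis=-1), 1.0))  # True
```

Any GPU softmax output that fails these sanity checks (rows not summing to 1, NaNs) would point at generated code rather than at the model.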

Here are a number of things to try:

  1. Enable compute capability 5.0 directly when building from the source code. It is part of the “configure” script. This enables SASS 5.0 output from the static CUDA compiler and bypasses the JIT compiler in the CUDA driver.
  2. Install the latest driver from NVIDIA.

If #1 still fails, we can dump the SASS code from your binary and see what goes wrong.

I’ve run into exactly the same problem as described by floringogianu, except with Ubuntu 16.04 and gcc 4.9. Also, I used the --override flag when installing the CUDA toolkit via the .run script, which may or may not be relevant. cifar10 runs fine.

To expand on @zheng-xq’s fix:

edit: Updating the driver seems not to be that easy (see ask.SE question). @zheng-xq Could you please add some details on how to build tensorflow with the target set explicitly to 5.0? Is it possible to build tensorflow when one has installed CUDA via apt-get (and thus does not have a single cuda folder)?
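For reference, a hedged sketch of such a build, assuming a TensorFlow source checkout and a standard /usr/local/cuda layout; the exact bazel target and the questions configure asks vary across TensorFlow versions:

```shell
# TF_CUDA_COMPUTE_CAPABILITIES pre-answers configure's compute-capability
# question, so the build emits SASS for compute 5.0 directly and the
# driver's JIT compiler is bypassed at run time.
cd tensorflow
export TF_CUDA_COMPUTE_CAPABILITIES=5.0
./configure   # answer "y" to GPU support, accept the CUDA/cuDNN paths
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install --upgrade /tmp/tensorflow_pkg/tensorflow-*.whl
```

Whether this works with an apt-get CUDA install (split across /usr/lib and /usr/include rather than one cuda folder) is exactly the open question above; configure expects to be pointed at a single toolkit location.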

I tried running the mnist example after installing TensorFlow in a virtualenv and got the same error (Ubuntu 16, gcc 5.3.1, Python 3.5.1, driver version 361.42, CUDA 7.5), this time with a GTX 960M with 4 GiB, which should be more than enough for this network model:

python -m tensorflow.models.image.mnist.convolutional

I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:924] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GeForce GTX 960M
major: 5 minor: 0 memoryClockRate (GHz) 1.176
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.33GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:806] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0)
Initialized!
E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_MISALIGNED_ADDRESS
F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:198] Unexpected Event status: 1
[1]    25066 abort (core dumped)  python -m tensorflow.models.image.mnist.convolutional

edit: Running cifar10 model seems to be working just fine…

@zheng-xq I see the same error when running the MNIST test:

E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_MISALIGNED_ADDRESS

I’m also on Ubuntu 14.04, CUDA v7.5, cuDNN v4, using nvidia-docker with this image.

This is using a GTX 960M (I use it for sanity checks before spinning up servers).

I’m calling this via the Keras MNIST example. The same example works fine using the Theano backend (via the Keras configuration).

cuda-memcheck.txt environment.txt

Sorry, I’m still confused. What block size are you referring to? Which file(s) should I look at to find what you’re talking about?

I’m trying to run example code that comes with the tensorflow distribution. Shouldn’t that code run on all supported architectures? Maybe GNU/Linux or my GPU is not supported, but I haven’t seen that noted in the documentation.

And thank you for your attempt to help me!