tensorflow: Basic Element-wise Complex Number Calculations Not Available On GPU
Basic element-wise addition, subtraction, multiplication or division for any Tensor of type tf.complex64 is not implemented on GPU.
Environment info
Operating System: Centos 7, 3.10.0-327.22.2.el7.x86_64
Installed version of CUDA and cuDNN: CUDA 7.5 and cuDNN 7.0-v4 -rw-r–r–. 1 root root 189170 Jul 22 16:14 /usr/local/cuda-7.5/lib/libcudadevrt.a lrwxrwxrwx. 1 root root 16 Jul 22 16:14 /usr/local/cuda-7.5/lib/libcudart.so -> libcudart.so.7.5 lrwxrwxrwx. 1 root root 19 Jul 22 16:14 /usr/local/cuda-7.5/lib/libcudart.so.7.5 -> libcudart.so.7.5.18 -rwxr-xr-x. 1 root root 311596 Jul 22 16:14 /usr/local/cuda-7.5/lib/libcudart.so.7.5.18 -rw-r–r–. 1 root root 558020 Jul 22 16:14 /usr/local/cuda-7.5/lib/libcudart_static.a
Tensorflow installed from source:
- Commit hash 00700f00fdf71baec1342d1afd7849e16fbd2a33
- Bazel information: Build label: 0.3.0-2016-07-22 (@ca36b06) Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar Build time: Fri Jul 22 19:23:10 2016 (1469215390) Build timestamp: 1469215390 Build timestamp as int: 1469215390
Steps to reproduce
- Add, subtract, multiply or divide any Tensor of type tf.complex64. A code example is shown here for element-wise addition:
import tensorflow as tf
if __name__ == '__main__':
with tf.device('/gpu:0'):
N = 100
a = tf.complex(tf.random_normal((N,)), tf.random_normal((N,)))
b = tf.complex(tf.random_normal((N,)), tf.random_normal((N,)))
c = a + b
with tf.Session() as sess:
c = sess.run(c)
The code returns the following output if run on GPU (works well on CPU):
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so.7.5 locally I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so.4.0.7 locally I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so.7.5 locally I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so.7.5 locally I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: name: Tesla K40c major: 3 minor: 5 memoryClockRate (GHz) 0.745 pciBusID 0000:02:00.0 Total memory: 12.00GiB Free memory: 11.90GiB W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x5168890 I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 1 with properties: name: GeForce GT 610 major: 2 minor: 1 memoryClockRate (GHz) 1.62 pciBusID 0000:01:00.0 Total memory: 1023.19MiB Free memory: 396.98MiB I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 0 to device ordinal 1 I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 1 to device ordinal 0 I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 1 I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y N I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 1: N Y I tensorflow/core/common_runtime/gpu/gpu_device.cc:839] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:02:00.0) I tensorflow/core/common_runtime/gpu/gpu_device.cc:814] Ignoring gpu device (device: 1, name: GeForce GT 610, pci bus id: 0000:01:00.0) with Cuda compute capability 2.1. The minimum required Cuda capability is 3.5. E tensorflow/core/client/tensor_c_api.cc:485] Cannot assign a device to node ‘add’: Could not satisfy explicit device specification ‘/device:GPU:0’ because no supported kernel for GPU devices is available. [[Node: add = Add[T=DT_COMPLEX64, _device=“/device:GPU:0”](Complex, Complex_1)]] Traceback (most recent call last): File “test_div_gpu_prob.py”, line 12, in <module> c = sess.run© File “/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py”, line 382, in run run_metadata_ptr) File “/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py”, line 655, in _run feed_dict_string, options, run_metadata) File “/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py”, line 723, in _do_run target_list, options, run_metadata) File “/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py”, line 743, in _do_call raise type(e)(node_def, op, message) tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to node ‘add’: Could not satisfy explicit device specification ‘/device:GPU:0’ because no supported kernel for GPU devices is available. [[Node: add = Add[T=DT_COMPLEX64, _device=“/device:GPU:0”](Complex, Complex_1)]] Caused by op u’add’, defined at: File “test_div_gpu_prob.py”, line 9, in <module> c = a + b File “/usr/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py”, line 755, in binary_op_wrapper return func(x, y, name=name) File “/usr/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py”, line 70, in add result = _op_def_lib.apply_op(“Add”, x=x, y=y, name=name) File “/usr/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py”, line 703, in apply_op op_def=op_def) File “/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py”, line 2310, in create_op original_op=self._default_original_op, op_def=op_def) File “/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py”, line 1232, in init self._traceback = _extract_stack()
What have you tried?
- Implementation using builtin Tensorflow functions works, if the real and imaginary parts are separated. See the code below:
import numpy as np
import tensorflow as tf
def complex_add(x, y):
xr, xi = tf.real(x), tf.imag(x)
yr, yi = tf.real(y), tf.imag(y)
return tf.complex(xr + yr, xi + yi)
def complex_sub(x, y):
xr, xi = tf.real(x), tf.imag(x)
yr, yi = tf.real(y), tf.imag(y)
return tf.complex(xr - yr, xi - yi)
def complex_mul(x, y):
xr, xi = tf.real(x), tf.imag(x)
yr, yi = tf.real(y), tf.imag(y)
return tf.complex(xr*yr - xi*yi, xr*yi + xi*yr)
def complex_div(x, y):
xr, xi = tf.real(x), tf.imag(x)
yr, yi = tf.real(y), tf.imag(y)
d = tf.square(yr) + tf.square(yi)
return tf.complex((xr*yr+xi*yi)/d, (xi*yr-xr*yi)/d)
if __name__ == '__main__':
with tf.device('/gpu:0'):
N = 100
a = tf.complex(tf.random_normal((N,)), tf.random_normal((N,)))
b = tf.complex(tf.random_normal((N,)), tf.random_normal((N,)))
with tf.Session() as sess:
a_, b_, c = sess.run([a,b,complex_add(a,b)])
assert np.allclose(c, a_ + b_)
a_, b_, c = sess.run([a,b,complex_sub(a,b)])
assert np.allclose(c, a_ - b_)
a_, b_, c = sess.run([a,b,complex_mul(a,b)])
assert np.allclose(c, a_ * b_)
a_, b_, c = sess.run([a,b,complex_div(a,b)])
assert np.allclose(c, a_ / b_)
It would be nice to have such functions transparent with the built-in CPU implementations.
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Comments: 31 (16 by maintainers)
I made some progress! I can make multiplication and division ops work for complex numbers if I specialized the templates in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/cwise_ops.h#L432
It seems more like a hack, but it doesn’t involve changes in Eigen for now.
Not sure what is wrong with nvcc using scalar_product_op in Eigen for complex numbers: https://github.com/RLovelett/eigen/blob/master/Eigen/src/Core/functors/BinaryFunctors.h#L76
However, it seems tightly related to using built-in * and / operators for std:complex types. For instance, this fails with the same errors as in the previous posts:
After adding a workaround to Eigen: https://bitbucket.org/eigen/eigen/commits/27f6140fa81c9fe83167d87e7aeb23031b42f344
We were able to enable addition, subtraction, division, and multiplication kernels for complex types on GPU: https://github.com/tensorflow/tensorflow/commit/93f15d4cde2f08057819f1194e5a4771f0d391ff
Thanks for the information @ibab! I will attempt a fix myself and send a PR soon!