tensorflow: Basic Element-wise Complex Number Calculations Not Available On GPU

Basic element-wise addition, subtraction, multiplication or division for any Tensor of type tf.complex64 is not implemented on GPU.

Environment info

Operating System: Centos 7, 3.10.0-327.22.2.el7.x86_64

Installed version of CUDA and cuDNN: CUDA 7.5 and cuDNN 7.0-v4 -rw-r–r–. 1 root root 189170 Jul 22 16:14 /usr/local/cuda-7.5/lib/libcudadevrt.a lrwxrwxrwx. 1 root root 16 Jul 22 16:14 /usr/local/cuda-7.5/lib/libcudart.so -> libcudart.so.7.5 lrwxrwxrwx. 1 root root 19 Jul 22 16:14 /usr/local/cuda-7.5/lib/libcudart.so.7.5 -> libcudart.so.7.5.18 -rwxr-xr-x. 1 root root 311596 Jul 22 16:14 /usr/local/cuda-7.5/lib/libcudart.so.7.5.18 -rw-r–r–. 1 root root 558020 Jul 22 16:14 /usr/local/cuda-7.5/lib/libcudart_static.a

Tensorflow installed from source:

Commit hash 00700f00fdf71baec1342d1afd7849e16fbd2a33
Bazel information: Build label: 0.3.0-2016-07-22 (@ca36b06) Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar Build time: Fri Jul 22 19:23:10 2016 (1469215390) Build timestamp: 1469215390 Build timestamp as int: 1469215390

Steps to reproduce

Add, subtract, multiply or divide any Tensor of type tf.complex64. A code example is shown here for element-wise addition:

import tensorflow as tf

if __name__ == '__main__':

    with tf.device('/gpu:0'):
        N = 100
        a = tf.complex(tf.random_normal((N,)), tf.random_normal((N,)))
        b = tf.complex(tf.random_normal((N,)), tf.random_normal((N,)))
        c = a + b

        with tf.Session() as sess:
            c = sess.run(c)

The code returns the following output if run on GPU (works well on CPU):

I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so.7.5 locally I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so.4.0.7 locally I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so.7.5 locally I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so.7.5 locally I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: name: Tesla K40c major: 3 minor: 5 memoryClockRate (GHz) 0.745 pciBusID 0000:02:00.0 Total memory: 12.00GiB Free memory: 11.90GiB W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x5168890 I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 1 with properties: name: GeForce GT 610 major: 2 minor: 1 memoryClockRate (GHz) 1.62 pciBusID 0000:01:00.0 Total memory: 1023.19MiB Free memory: 396.98MiB I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 0 to device ordinal 1 I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 1 to device ordinal 0 I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 1 I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y N I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 1: N Y I tensorflow/core/common_runtime/gpu/gpu_device.cc:839] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:02:00.0) I tensorflow/core/common_runtime/gpu/gpu_device.cc:814] Ignoring gpu device (device: 1, name: GeForce GT 610, pci bus id: 0000:01:00.0) with Cuda compute capability 2.1. The minimum required Cuda capability is 3.5. E tensorflow/core/client/tensor_c_api.cc:485] Cannot assign a device to node ‘add’: Could not satisfy explicit device specification ‘/device:GPU:0’ because no supported kernel for GPU devices is available. [[Node: add = Add[T=DT_COMPLEX64, _device=“/device:GPU:0”](Complex, Complex_1)]] Traceback (most recent call last): File “test_div_gpu_prob.py”, line 12, in <module> c = sess.run© File “/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py”, line 382, in run run_metadata_ptr) File “/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py”, line 655, in _run feed_dict_string, options, run_metadata) File “/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py”, line 723, in _do_run target_list, options, run_metadata) File “/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py”, line 743, in _do_call raise type(e)(node_def, op, message) tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to node ‘add’: Could not satisfy explicit device specification ‘/device:GPU:0’ because no supported kernel for GPU devices is available. [[Node: add = Add[T=DT_COMPLEX64, _device=“/device:GPU:0”](Complex, Complex_1)]] Caused by op u’add’, defined at: File “test_div_gpu_prob.py”, line 9, in <module> c = a + b File “/usr/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py”, line 755, in binary_op_wrapper return func(x, y, name=name) File “/usr/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py”, line 70, in add result = _op_def_lib.apply_op(“Add”, x=x, y=y, name=name) File “/usr/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py”, line 703, in apply_op op_def=op_def) File “/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py”, line 2310, in create_op original_op=self._default_original_op, op_def=op_def) File “/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py”, line 1232, in init self._traceback = _extract_stack()

What have you tried?

Implementation using builtin Tensorflow functions works, if the real and imaginary parts are separated. See the code below:

import numpy as np
import tensorflow as tf

def complex_add(x, y):
    xr, xi = tf.real(x), tf.imag(x)
    yr, yi = tf.real(y), tf.imag(y)
    return tf.complex(xr + yr, xi + yi)

def complex_sub(x, y):
    xr, xi = tf.real(x), tf.imag(x)
    yr, yi = tf.real(y), tf.imag(y)
    return tf.complex(xr - yr, xi - yi)

def complex_mul(x, y):
    xr, xi = tf.real(x), tf.imag(x)
    yr, yi = tf.real(y), tf.imag(y)
    return tf.complex(xr*yr - xi*yi, xr*yi + xi*yr)

def complex_div(x, y):
    xr, xi = tf.real(x), tf.imag(x)
    yr, yi = tf.real(y), tf.imag(y)
    d = tf.square(yr) + tf.square(yi)
    return tf.complex((xr*yr+xi*yi)/d, (xi*yr-xr*yi)/d)

if __name__ == '__main__':

    with tf.device('/gpu:0'):
        N = 100
        a = tf.complex(tf.random_normal((N,)), tf.random_normal((N,)))
        b = tf.complex(tf.random_normal((N,)), tf.random_normal((N,)))

        with tf.Session() as sess:

            a_, b_, c = sess.run([a,b,complex_add(a,b)])
            assert np.allclose(c, a_ + b_)

            a_, b_, c = sess.run([a,b,complex_sub(a,b)])
            assert np.allclose(c, a_ - b_)

            a_, b_, c = sess.run([a,b,complex_mul(a,b)])
            assert np.allclose(c, a_ * b_)

            a_, b_, c = sess.run([a,b,complex_div(a,b)])
            assert np.allclose(c, a_ / b_)

It would be nice to have such functions transparent with the built-in CPU implementations.

About this issue

Original URL
State: closed
Created 8 years ago
Comments: 31 (16 by maintainers)

Most upvoted comments

I made some progress! I can make multiplication and division ops work for complex numbers if I specialized the templates in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/cwise_ops.h#L432

template <typename T>
struct mul : base<T, Eigen::internal::scalar_product_op<T> > {};

template <typename T>
struct multiply_complex {
  typedef std::complex<T> result_type;
  EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type operator()(std::complex<T> a,
                                                               std::complex<T> b) const {
    return std::complex<T>(a.real()*b.real() - a.imag()*b.imag(),
                           a.real()*b.imag() + a.imag()*b.real());
  }
};

template <>
struct mul<std::complex<float> > : base<std::complex<float>, multiply_complex<float> > {};

template <>
struct mul<std::complex<double> > : base<std::complex<double>, multiply_complex<double> > {};

It seems more like a hack, but it doesn’t involve changes in Eigen for now.

Not sure what is wrong with nvcc using scalar_product_op in Eigen for complex numbers: https://github.com/RLovelett/eigen/blob/master/Eigen/src/Core/functors/BinaryFunctors.h#L76

However, it seems tightly related to using built-in * and / operators for std:complex types. For instance, this fails with the same errors as in the previous posts:

template <typename T>
struct mul : base<T, Eigen::internal::scalar_product_op<T> > {};

template <typename T>
struct multiply_complex {
  typedef std::complex<T> result_type;
  EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type operator()(std::complex<T> a,
                                                               std::complex<T> b) const {
    return a*b;
  }
};

template <>
struct mul<std::complex<float> > : base<std::complex<float>, multiply_complex<float> > {};

template <>
struct mul<std::complex<double> > : base<std::complex<double>, multiply_complex<double> > {};

sbrodeur on Aug 24, 2016

After adding a workaround to Eigen: https://bitbucket.org/eigen/eigen/commits/27f6140fa81c9fe83167d87e7aeb23031b42f344

We were able to enable addition, subtraction, division, and multiplication kernels for complex types on GPU: https://github.com/tensorflow/tensorflow/commit/93f15d4cde2f08057819f1194e5a4771f0d391ff

rryan on Sep 28, 2016

Thanks for the information @ibab! I will attempt a fix myself and send a PR soon!

sbrodeur on Aug 8, 2016