tensorflow: Tensorflow 2.0 is much slower than pytorch for large matrix assignment

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
TensorFlow installed from (source or binary): binary
TensorFlow version (use command below): tensorflow 2.0 beta
Python version: 3.6
Bazel version (if compiling from source):
GCC/Compiler version (if compiling from source):
CUDA/cuDNN version: CUDA 10.0/cuDNN 7
GPU model and memory: GeForce RTX 2080 Ti

Describe the current behavior

In one of my research projects, I need to assign values to large sparse matrices. Since TensorFlow tensors do not support item assignment/update, I have to build the matrices from many tf.stack()/tf.concat() calls.

I compared the same function implemented in TensorFlow 2.0 and PyTorch 1.1.0, and the TensorFlow version was much slower (roughly 48x for a single timed call). The execution times are listed below:

pytorch: 0.0036 secs
tf 2.0: 0.1734 secs

Describe the expected behavior

Is there a way to optimize the TensorFlow code to reach comparable performance?
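One direction that might be worth checking: a tf.Tensor is immutable, but a tf.Variable accepts sliced assign() calls, so the block layout can be written much like the PyTorch version. The following is a minimal sketch, assuming eager execution in TF 2.x. The names F_var and fill_F are hypothetical, only the F matrix is shown, and whether this is actually faster than the tf.concat() version has not been measured here.

```python
import tensorflow as tf


def skew(x):
    """Return the 3x3 skew-symmetric matrix of a length-3 vector."""
    x = tf.reshape(tf.convert_to_tensor(x, dtype=tf.float32), [3])
    zero = tf.zeros([], dtype=tf.float32)
    return tf.stack([
        tf.stack([zero, -x[2], x[1]]),
        tf.stack([x[2], zero, -x[0]]),
        tf.stack([-x[1], x[0], zero]),
    ])


# Hypothetical preallocated buffer; created once and reused across calls.
F_var = tf.Variable(tf.zeros([21, 21]))


def fill_F(Rot, v, p, g):
    """Write the non-zero 3x3 blocks of F into F_var with sliced assigns."""
    F_var.assign(tf.zeros([21, 21]))  # reset so stale values never leak
    F_var[0:3, 9:12].assign(-Rot)
    F_var[3:6, 0:3].assign(skew(g))
    F_var[6:9, 3:6].assign(tf.eye(3))
    F_var[3:6, 12:15].assign(-Rot)
    F_var[3:6, 9:12].assign(-tf.matmul(skew(v), Rot))
    F_var[6:9, 9:12].assign(-tf.matmul(skew(p), Rot))
    return F_var


if __name__ == '__main__':
    F = fill_F(tf.eye(3), [0.5, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 0.0, -9.80655])
    print(F.numpy()[3:6, 0:3])  # the skew(g) block
```

The block positions mirror the slice assignments in the PyTorch script, so the two results can be compared element by element.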

Code to reproduce the issue

## Tensorflow 2.0
import tensorflow as tf
import time
import numpy as np

def skew(x):
    x = tf.reshape(x, [-1, 1])
    z1 = tf.stack([tf.zeros(1), -x[2], x[1]], 1)
    z2 = tf.stack([x[2], tf.zeros(1), -x[0]], 1)
    z3 = tf.stack([-x[1], x[0], tf.zeros(1)], 1)
    X = tf.concat([z1, z2, z3], 0)
    return X

def propagate(Rot, v, p, g):
    v_skew_rot = tf.matmul(skew(v), Rot)
    p_skew_rot = tf.matmul(skew(p), Rot)

    F0 = tf.zeros([3, 3])
    F1 = tf.concat([F0, skew(g), F0], 0)
    F2 = tf.concat([F0, F0, tf.eye(3)], 0)
    F3 = tf.zeros([9, 3])
    F4 = tf.concat([-Rot, -v_skew_rot, -p_skew_rot], 0)
    F5 = tf.concat([F0, -Rot, F0], 0)
    F6 = tf.zeros([9, 6])
    F7 = tf.concat([F1, F2, F3, F4, F5, F6], 1)
    F = tf.concat([F7, tf.zeros([12, 21])], 0)

    G0 = tf.zeros([3, 12])
    G1 = tf.concat([Rot, v_skew_rot, p_skew_rot, tf.zeros([12, 3])], 0)
    G2 = tf.concat([F0, Rot, F0, tf.zeros([12, 3])], 0)
    G3 = tf.concat([G0, G0, G0, tf.eye(12)], 0)
    G = tf.concat([G1, G2, G3], 1)
    return F, G

if __name__ == '__main__':
    Rot = tf.eye(3)
    v = np.array([0.5, 0, 0], dtype=np.float32)
    p = np.array([1.5, 0, 0], dtype=np.float32)
    g = np.array([0, 0, -9.80655], dtype=np.float32)
    start = time.time()
    P = propagate(Rot, v, p, g)
    print("Propagate function takes {} secs".format(time.time() - start))

## pytorch 1.1.0
import torch
import time
import numpy as np


def skew(x):
    X = torch.Tensor([[0, -x[2], x[1]],
                      [x[2], 0, -x[0]],
                      [-x[1], x[0], 0]])
    return X

def propagate(Rot_prev, v_prev, p_prev, g):
    F = torch.zeros(21, 21)
    G = torch.zeros(21, 18)
    v_skew_rot = skew(v_prev).mm(Rot_prev)
    p_skew_rot = skew(p_prev).mm(Rot_prev)

    F[:3, 9:12] = -Rot_prev
    F[3:6, :3] = skew(g)
    F[6:9, 3:6] = torch.eye(3)
    F[3:6, 12:15] = -Rot_prev
    F[3:6, 9:12] = -v_skew_rot
    F[6:9, 9:12] = -p_skew_rot

    G[:3, :3] = Rot_prev
    G[3:6, 3:6] = Rot_prev
    G[3:6, :3] = v_skew_rot
    G[6:9, :3] = p_skew_rot
    G[9:12, 6:9] = torch.eye(3)
    G[12:15, 9:12] = torch.eye(3)
    G[15:18, 12:15] = torch.eye(3)
    G[18:21, 15:18] = torch.eye(3)
    return F, G

if __name__ == '__main__':
    Rot = torch.eye(3)
    v = np.array([0.5, 0, 0], dtype=np.float32)
    p = np.array([1.5, 0, 0], dtype=np.float32)
    g = np.array([0, 0, -9.80655], dtype=np.float32)
    start = time.time()
    P = propagate(Rot, v, p, g)
    print("Propagate function takes {} secs".format(time.time() - start))
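Both scripts time a single first call to propagate(), so the reported numbers may include one-time setup costs rather than steady-state cost. A fairer comparison would discard a few warm-up calls and report a statistic over many repeats. The helper below is a sketch using only the standard library; benchmark and its warmup/repeats parameters are hypothetical names, not part of either framework.

```python
import statistics
import time


def benchmark(fn, warmup=3, repeats=20):
    """Return the median wall-clock seconds of fn() after warm-up calls."""
    for _ in range(warmup):
        fn()  # discarded: early calls may include one-time setup
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == '__main__':
    # Stand-in workload; in either script above this would be
    # lambda: propagate(Rot, v, p, g)
    median_secs = benchmark(lambda: sum(range(1000)))
    print("Median call takes {} secs".format(median_secs))
```

When timing on a GPU, operations may be dispatched asynchronously, so it is probably safer to have fn convert its result to a host value (for example with .numpy()) before the timer stops.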
