tensorflow: from_dlpack unable to process arrays with column-major strides

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.3.0
  • Python version: 3.6.9/3.7
  • CUDA/cuDNN version: 10.1/7
  • GPU model and memory: K80/V100-32GB

Describe the current behavior Passing DLPack capsules to TensorFlow that come from arrays with column-major strides, such as those produced by cuDF, raises an InvalidArgumentError. It's possible to work around this by transposing the array first, passing it to TensorFlow via DLPack, and then transposing back, but this is obviously less than ideal.
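For context, a column-major (Fortran-order) array holds the same data as a row-major one but with the stride order reversed, which is what trips up the import. A quick NumPy sketch of the two layouts (NumPy is used here only to illustrate the strides; it is not part of the repro):

```python
import numpy as np

a = np.ones((5, 2))        # C order: rows are contiguous
f = np.asfortranarray(a)   # F order: columns are contiguous

# For float64 (8-byte) elements and shape (5, 2):
print(a.strides)  # (16, 8): C-contiguous strides
print(f.strides)  # (8, 40): the column-major strides TF rejects
```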

Describe the expected behavior These arrays should be read properly without needing to transpose first.

Standalone code to reproduce the issue

import tensorflow as tf, cupy as cp

# Initialize the TF eager context before using the DLPack API
x = tf.random.uniform((1,))

# Importing a column-major (Fortran-order) capsule raises InvalidArgumentError
tf.experimental.dlpack.from_dlpack(cp.asfortranarray(cp.ones((5, 2))).toDlpack())

The output is:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-4-cb770ae77ab3> in <module>()
      1 import tensorflow as tf, cupy as cp
      2 x = tf.random.uniform((5,))
----> 3 tf.experimental.dlpack.from_dlpack(cp.asfortranarray(cp.ones((5, 2))).toDlpack())

/usr/local/lib/python3.6/dist-packages/tensorflow/python/dlpack/dlpack.py in from_dlpack(dlcapsule)
     64     A Tensorflow eager tensor
     65   """
---> 66   return pywrap_tfe.TFE_FromDlpackCapsule(dlcapsule, context.context()._handle)

InvalidArgumentError: Invalid strides array from DLPack

See notebook here
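The transpose workaround relies on the fact that the transpose of a Fortran-order array is a C-contiguous view, so the producer-side transpose itself costs nothing. A NumPy illustration of that property (the actual workaround would use cp / tf in place of np):

```python
import numpy as np

f = np.asfortranarray(np.ones((5, 2)))
assert not f.flags["C_CONTIGUOUS"]

# Transposing only swaps shape and strides: the result is a
# C-contiguous (2, 5) view over the same memory, zero-copy.
t = f.T
assert t.flags["C_CONTIGUOUS"]

# In the real workaround, the (2, 5) capsule is imported with
# from_dlpack and then transposed back to (5, 2) on the TF side,
# which is where the actual data movement happens.
```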


@miguelusque for viz

About this issue

  • Original URL
  • State: open
  • Created 4 years ago
  • Comments: 24 (8 by maintainers)

Most upvoted comments

Hey, cuDF maintainer here. We produce a column-major DLPack object since that layout makes the most sense for a columnar DataFrame.

I’m slightly worried that this might break the (implicit) promise that transferring tensors via dlpack is “free”, but I’m happy to be corrected.

Regardless of whether the transpose happens at the source or the destination, this promise is broken. I guess you could argue that doing it at the source is the user opting in to breaking that promise more explicitly, but most other DLPack adopters (CuPy, PyTorch, MXNet, etc.) either support a column-major memory layout or implicitly handle the transpose.

I don’t think it’s a good idea to make it happen in the from_dlpack function. I would prefer it to be an option on to_dlpack in CuPy or other libraries, such as force_canonical_layouts. Since a transpose always seems to be needed, I would prefer it to happen on the source framework’s side.

I’m against this, because not every producer has a transpose implementation or wants to add one. For example, cuDF produces a column-major DLPack object via a copy, since our columns are normally individual allocations, and we don’t want to transpose the DataFrame: that is extremely expensive, and implementing an array transpose is out of scope for a dataframe library. This would force users to go through something like CuPy to do the array transpose, which is non-obvious for non-power users.
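A consumer-side fix along these lines would mean from_dlpack recognizing column-major strides instead of rejecting anything that is not C-contiguous. A minimal sketch of that stride check in plain Python (a hypothetical helper, not TensorFlow code), with strides expressed in elements as DLPack does:

```python
def classify_strides(shape, strides):
    """Classify a DLPack-style (shape, strides-in-elements) pair.

    Returns 'C', 'F', or 'other'. A consumer willing to accept
    column-major input could import 'F' buffers and transpose on
    its own side, rather than raising InvalidArgumentError.
    """
    ndim = len(shape)
    c = [0] * ndim
    f = [0] * ndim
    acc = 1
    for i in reversed(range(ndim)):  # C order: last axis varies fastest
        c[i] = acc
        acc *= shape[i]
    acc = 1
    for i in range(ndim):            # F order: first axis varies fastest
        f[i] = acc
        acc *= shape[i]
    if list(strides) == c:
        return "C"
    if list(strides) == f:
        return "F"
    return "other"
```

For the failing repro, shape (5, 2) with element strides (1, 5) would classify as "F", which the importer could handle with a copy or a lazy transpose instead of an error.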