tensorflow: Creating ragged tensors is incredibly slow

I’m running the following code on a Google Colab instance with GPU support enabled:

import numpy as np
import tensorflow as tf

tf.ragged.constant(np.random.randint(10, size=10_000_000))

The code takes 15 seconds to finish. By comparison, tf.constant(np.random.randint(10, size=10_000_000)) takes only 50 milliseconds.

About this issue

  • State: open
  • Created 3 years ago
  • Reactions: 3
  • Comments: 15 (6 by maintainers)

Most upvoted comments

@deepakagrawal: +1 to using the RaggedTensor factory methods. In your example, you could convert the numpy arrays into a RaggedTensor using RaggedTensor.from_row_lengths as follows:

import numpy as np
import tensorflow as tf

np_arrays = [np.zeros([i, 1000]) for i in range(1000)]  # example list of arrays

ragged_tensor = tf.RaggedTensor.from_row_lengths(
    values=tf.concat(np_arrays, axis=0),
    row_lengths=[a.shape[0] for a in np_arrays])

It looks like most of the time is being spent in _find_scalar_and_max_depth. Two options to speed up that function:

  1. Short-circuit and return early once we find any scalar value. (But this may prevent us from giving good error messages if the user supplies an input with scalar values at different nesting depths.)
  2. Change np.ndim(pylist) != 0 to (isinstance(pylist, np.ndarray) and np.ndim(pylist) != 0) – calling ndim seems to take up a fair amount of time; a micro-benchmark sketch follows this list. Not sure whether this would break any existing use cases.
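
As a rough illustration of the second option, here is a minimal micro-benchmark sketch (not TensorFlow's actual internals) comparing the two checks on a plain Python scalar:

import timeit
import numpy as np

scalar = 5  # a plain Python int, as found in nested pylists

# Current check: for non-arrays, np.ndim falls back to np.asarray(),
# which is comparatively expensive when called millions of times.
t_ndim = timeit.timeit(lambda: np.ndim(scalar) != 0, number=1_000_000)

# Proposed check: the isinstance test short-circuits for non-arrays,
# so np.ndim is never called on them.
t_fast = timeit.timeit(
    lambda: isinstance(scalar, np.ndarray) and np.ndim(scalar) != 0,
    number=1_000_000)

print(f"np.ndim check:    {t_ndim:.3f}s")
print(f"isinstance check: {t_fast:.3f}s")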

But even with these fixes, tf.ragged.constant will probably still be significantly slower than tf.constant (especially if the input is a numpy array), since tf.constant is implemented in C and tf.ragged.constant is implemented in Python.

If you want to ingest large ragged tensors quickly, and it’s possible to get them into a format other than nested Python lists, then you might consider one of the following:

  • The RaggedTensor factory methods, such as tf.RaggedTensor.from_row_splits or tf.RaggedTensor.from_row_lengths. (Use validate=False to prevent validation ops from being added to the graph, which can slow things down; see the sketch below.)
  • The RaggedTensor.from_tensor method, which has optional padding and lengths arguments you can use to specify which values are padding.
  • If you have a list of numpy arrays (for each row), then you could use tf.ragged.stack(list_of_numpy_arrays).

All of these options will be substantially faster than tf.ragged.constant.
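
For example, here is a minimal sketch of building a ragged tensor directly from a flat values buffer with tf.RaggedTensor.from_row_splits (the row lengths are made up for illustration):

import numpy as np
import tensorflow as tf

lengths = np.random.randint(1, 20, size=1000)   # variable row lengths
values = np.random.rand(lengths.sum())          # flat buffer of all row values
row_splits = np.concatenate([[0], np.cumsum(lengths)])

# validate=False skips the consistency checks on row_splits, so
# construction amounts to little more than wrapping the two buffers.
rt = tf.RaggedTensor.from_row_splits(
    values=values, row_splits=row_splits, validate=False)
print(rt.shape)  # (1000, None)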

What is the actual shape of the ragged tensors you want to ingest? This affects which approach will work best. Examples:

  • a 2d tensor [dense, ragged] where dense and ragged are both very large.
  • a 4d tensor [dense_1, ragged_1, ragged_2, dense_2], where dense_1 and ragged_1 are large, but ragged_2 and dense_2 are small.

(In the example you used for timing, you pass in a 1D list, which can’t even be ragged – RaggedTensors always have rank > 1.)
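
As a quick illustration, the smallest shape a RaggedTensor can have is rank 2 – one uniform outer dimension plus at least one ragged dimension:

import tensorflow as tf

rt = tf.ragged.constant([[1, 2, 3], [4], [5, 6]])
print(rt.shape)        # (3, None)
print(rt.ragged_rank)  # 1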

I was able to reproduce the issue on tf-nightly 2.10.0-dev20220719. Kindly find the gist of it here. Thank you!

I have a list of 2D numpy arrays of shape (m, n), where m is variable (on the order of 1000) and n is fixed (say 1000). I have around 10,000 such arrays, so the final ragged tensor should have shape (10000, None, 1000). tf.ragged.stack is fast; however, it produces a ragged tensor with ragged_rank=2 and shape (10000, None, None). How can I convert this into a tensor with a ragged rank of 1? tf.ragged.constant gives correct results, but it is incredibly slow.

Hi @deepakagrawal, try using the factory class methods, such as tf.RaggedTensor.from_row_splits or tf.RaggedTensor.from_row_lengths.

See the ragged-tensor guide for further reference: https://www.tensorflow.org/guide/ragged_tensor#constructing_a_ragged_tensor

Below is an example of using tf.RaggedTensor.from_row_splits to convert a (100000, 1000) numpy array into a (1000, None, 1000) ragged tensor: https://colab.research.google.com/drive/1WnyvzoB8oE5vGXq5iYHJAAv7RQDADVU8?usp=sharing

CPU times: user 16.5 ms, sys: 18.6 ms, total: 35.1 ms
Wall time: 39.7 ms
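
In the same spirit, here is a minimal sketch (with made-up array counts and sizes) of going straight from a list of 2D numpy arrays to a ragged_rank=1 tensor, as asked above:

import numpy as np
import tensorflow as tf

# A list of (m, 1000) arrays with variable m, standing in for the
# 10,000 arrays described in the question.
np_arrays = [np.random.rand(np.random.randint(1, 10), 1000)
             for _ in range(100)]

rt = tf.RaggedTensor.from_row_lengths(
    values=np.concatenate(np_arrays, axis=0),
    row_lengths=[a.shape[0] for a in np_arrays],
    validate=False)
print(rt.shape)        # (100, None, 1000)
print(rt.ragged_rank)  # 1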

tf.ragged.stack works fast for me.

import numpy as np
import tensorflow as tf

tf.ragged.stack([np.random.rand(ii + 9999) for ii in range(9999)])

Time:

CPU times: user 2.3 s, sys: 3.49 s, total: 5.79 s
Wall time: 3.31 s