tensorflow: Creating ragged tensors is incredibly slow
I’m running the following code on a Google Colab instance with GPU support enabled:

`tf.ragged.constant(np.random.randint(10, size=10_000_000))`

The code takes 15 seconds to finish. In comparison, `tf.constant(np.random.randint(10, size=10_000_000))` takes only 50 milliseconds.
About this issue
- Original URL
- State: open
- Created 3 years ago
- Reactions: 3
- Comments: 15 (6 by maintainers)
@deepakagrawal: +1 to using the `RaggedTensor` factory methods. In your example, you could convert the numpy arrays into a `RaggedTensor` using `RaggedTensor.from_row_lengths`.

It looks like most of the time is being spent in `_find_scalar_and_max_depth`. Two options to speed up that function: change `np.ndim(pylist) != 0` to `(isinstance(pylist, np.ndarray) and np.ndim(pylist) != 0)` – calling `ndim` seems to take up a fair amount of time. Not sure if this will break any existing use-cases.

But even with these fixes, `tf.ragged.constant` will probably still be significantly slower than `tf.constant` (especially if the input is a numpy array), since `tf.constant` is implemented in C and `tf.ragged.constant` is implemented in Python.

If you want to ingest large ragged tensors quickly, and it’s possible to get them into a format other than nested Python lists, then you might consider using one of the following:
- The `RaggedTensor` factory methods, such as `tf.RaggedTensor.from_row_splits` or `tf.RaggedTensor.from_row_lengths`. (Use `validate=False` to prevent validation ops from being added to the graph, which can slow things down.)
- The `RaggedTensor.from_tensor` method, which has optional `padding` and `lengths` arguments you can use to specify which values are padding.
- `tf.ragged.stack(list_of_numpy_arrays)`.

All of these options will be substantially faster than `tf.ragged.constant`.
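As a rough sketch of the factory-method approach: the row lengths and values below are random stand-ins, not data from the original report.

```python
import numpy as np
import tensorflow as tf

# Hypothetical ragged data: 1,000 rows with random lengths.
row_lengths = np.random.randint(0, 20, size=1_000)
values = np.random.randint(10, size=int(row_lengths.sum()))

# Build the RaggedTensor directly from flat values plus row lengths,
# skipping tf.ragged.constant's Python-level scan of nested lists.
# validate=False also skips adding validation ops to the graph.
rt = tf.RaggedTensor.from_row_lengths(values, row_lengths, validate=False)
print(rt.shape)  # (1000, None)
```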
What is the actual shape of the ragged tensors you want to ingest? This could impact what approach will be best for ingesting them. Examples:

- `dense` and `ragged` are both very large.
- `dense_1` and `ragged_1` are large, but `ragged_2` and `dense_2` are small.

(In the example you used for timing, you pass in a 1D list, which can’t even be ragged – RaggedTensors always have rank > 1.)
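For the `RaggedTensor.from_tensor` route mentioned above, here is a minimal sketch; the padded array and the `-1` padding marker are invented for illustration, not taken from the discussion.

```python
import numpy as np
import tensorflow as tf

# Hypothetical padded batch: -1 marks trailing padding in each row.
padded = np.array([[3, 1, 4, -1, -1],
                   [1, 5, -1, -1, -1],
                   [9, 2, 6, 5, 3]])

# from_tensor strips the trailing padding values from each row,
# producing rows of length 3, 2, and 5.
rt = tf.RaggedTensor.from_tensor(padded, padding=-1)
print(rt.row_lengths().numpy())  # [3 2 5]
```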
I was able to reproduce the issue on tf-nightly 2.10.0-dev20220719. Kindly find the gist of it here. Thank you!
Hi @deepakagrawal, try using factory class-methods such as:

See here for further reference: https://www.tensorflow.org/guide/ragged_tensor#constructing_a_ragged_tensor

Below is an example of using `tf.RaggedTensor.from_row_splits` to convert a `(100000, 1000)` numpy array into a `(1000, None, 1000)` ragged tensor: https://colab.research.google.com/drive/1WnyvzoB8oE5vGXq5iYHJAAv7RQDADVU8?usp=sharing

`tf.ragged.stack` works fast for me. Time:
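A scaled-down sketch of that conversion (smaller shapes than the linked notebook so it runs quickly; the row boundaries are invented for illustration):

```python
import numpy as np
import tensorflow as tf

# Scaled-down analogue of (100000, 1000) -> (1000, None, 1000):
# a (100, 10) array of flat values, split into 10 ragged rows.
values = np.random.rand(100, 10)

# row_splits needs num_rows + 1 monotonically non-decreasing boundaries.
row_splits = np.array([0, 5, 12, 20, 33, 40, 58, 60, 77, 90, 100])

rt = tf.RaggedTensor.from_row_splits(values, row_splits, validate=False)
print(rt.shape)  # (10, None, 10)
```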