io: tf.data doesn't work with AudioIOTensor
AFAIK tf.data always applies @tf.function to everything for performance reasons (e.g. distributed compute). This means that tfio.IOTensor.from_audio cannot be used in a tf.data pipeline as the constructor calls .numpy(). Am I misunderstanding how tfio.IOTensor is intended to be used or is this a bug?
MCVE
import tensorflow as tf
import tensorflow_io as tfio

def load(path):
    wav = tfio.IOTensor.from_audio(path)
    offset = int(1.0 * wav.rate)
    duration = int(3.0 * wav.rate)
    return wav[offset:offset + duration].to_tensor()

tf.data.Dataset.from_tensor_slices(["my_audio.wav"]).map(load)
AttributeError: in converted code:
<ipython-input-87-7e40b48025cf>:5 load *
wav = tfio.IOTensor.from_audio(path)
/usr/local/lib/python3.6/dist-packages/tensorflow_io/core/python/ops/io_tensor.py:243 from_audio *
return audio_io_tensor_ops.AudioIOTensor(filename, internal=True) # pylint: disable=protected-access
/usr/local/lib/python3.6/dist-packages/tensorflow_io/core/python/ops/audio_io_tensor_ops.py:68 __init__
shape = tf.TensorShape(shape.numpy())
AttributeError: 'Tensor' object has no attribute 'numpy'
About this issue
- State: closed
- Created 5 years ago
- Reactions: 1
- Comments: 15 (9 by maintainers)
To explain a little about what was happening and why it could be challenging:
for d in dataset style iteration was added, but that is more of a complement.
To give an example of why item 2) mentioned above could be an issue: as many in this issue thread have noticed, if you have a big audio file with GBs of data, you have to read all the way from beginning to end even if you only need, say, the last 5 minutes of the data.
To give an example of why item 3) mentioned above could be an issue: if you have a WAV audio file and you don't know what is inside, you will not be able to find out the true data type. Is the WAV file 16-bit or 24-bit? Unless you open another program and check the metadata, you have no way to know. And without the dtype, you have to either
- try and see, or
- always promote to 32-bit, etc.

Some of the goals of tensorflow/io were to address 2) and 3).
In order to address 2), we introduced C++ kernel ops that allow you to read a [start:stop] chunk of the data; that formed the initial idea behind IOTensor.
In order to address 3), we utilize eager mode to obtain the metadata in one run, then run the graph in another run. This essentially eliminates the need for the user to provide any information, even the data type. This is also part of IOTensor.
However, since we introduced this use of eager mode (a first run to get the dtype, then a second run to construct the graph), it is no longer possible to run IOTensor in graph mode.
Now with PR #615, we have separated two scenarios. Previously, AudioIOTensor was not able to run in graph mode (and thus could not be embedded in a tf.data.Dataset pipeline); IODataset/GraphIODataset was in a similar situation.
If the user provides the data type, then we can construct the graph without actually running it, so GraphIODataset can be fed into tf.data.Dataset (note this is like a Dataset inside an outer Dataset).
If the user does not have any information about the data type, then the data type has to be probed in one run first to construct the IODataset. This IODataset will not be able to run inside another tf.data.Dataset.
Here are usage examples of enabling "graph mode for IOTensor" and "graph mode for IODataset":
Below is the example of running GraphIOTensor inside tf.data.Dataset’s map:
Below is the example of running GraphIODataset inside tf.data.Dataset’s map, essentially a “dataset inside another dataset graph”:
@yongtang reading your suggestions I feel that most users would probably like a decode_audio op that they can integrate into their tf.data pipeline. Is that possible?
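(For what it's worth, for 16-bit PCM WAV files core TensorFlow already ships a graph-friendly op, tf.audio.decode_wav, which composes with tf.data directly; note it returns float32 samples in [-1.0, 1.0] rather than the raw integer dtype. A minimal sketch, with "my_audio.wav" as a placeholder path:)

```python
import tensorflow as tf

def decode(path):
    # tf.audio.decode_wav is an ordinary graph op, so it works inside
    # Dataset.map without any eager probing (16-bit PCM WAV only).
    audio, sample_rate = tf.audio.decode_wav(tf.io.read_file(path))
    return audio  # float32, shape [samples, channels]

dataset = tf.data.Dataset.from_tensor_slices(["my_audio.wav"]).map(decode)
```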