io: tf.data doesn't work with AudioIOTensor

AFAIK tf.data always applies @tf.function to everything for performance reasons (e.g. distributed compute). This means that tfio.IOTensor.from_audio cannot be used in a tf.data pipeline, as the constructor calls .numpy(). Am I misunderstanding how tfio.IOTensor is intended to be used, or is this a bug?

MCVE

import tensorflow as tf
import tensorflow_io as tfio


def load(path):
    wav = tfio.IOTensor.from_audio(path)
    offset = int(1.0 * wav.rate)
    duration = int(3.0 * wav.rate)
    return wav[offset:offset + duration]


tf.data.Dataset.from_tensor_slices(["my_audio.wav"]).map(load)

This raises:

AttributeError: in converted code:

    <ipython-input-87-7e40b48025cf>:5 load  *
        wav = tfio.IOTensor.from_audio(path)
    /usr/local/lib/python3.6/dist-packages/tensorflow_io/core/python/ops/io_tensor.py:243 from_audio  *
        return audio_io_tensor_ops.AudioIOTensor(filename, internal=True) # pylint: disable=protected-access
    /usr/local/lib/python3.6/dist-packages/tensorflow_io/core/python/ops/audio_io_tensor_ops.py:68 __init__
        shape = tf.TensorShape(shape.numpy())

AttributeError: 'Tensor' object has no attribute 'numpy'

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 15 (9 by maintainers)

Most upvoted comments

To explain a little about what was happening and why it could be challenging:

  1. Initially, tf.data was designed as a graph-mode pipeline with efficiency as the goal; using tf.data.Dataset in eager mode was outside the initial scope. Later on, `for d in dataset`-style iteration was added, but that is more of a complement.
  2. Random access was not possible with tf.data, as tf.data always iterates from beginning to end.
  3. It was not possible to get metadata, as graph mode is essentially a one-shot graph operation and all of the shape and dtype information has to be defined exactly beforehand.

To give an example of why item 2) above can be an issue: as many in this issue thread have noticed, if you have a big audio file with GBs of data, you have to read all the way from beginning to end even if you only need, say, the very last 5 minutes of the data.

To give an example of why item 3) above can be an issue: if you have a WAV file and you don’t know what is inside, you will not be able to find out the true data type. Is the WAV file 16-bit or 24-bit? Unless you open another program and check the metadata, there is no way to know. And without the dtype, you have to either try and see, or always promote to 32-bit, etc.
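
For illustration, the eager-mode IOTensor makes this metadata available right after construction (a minimal sketch; the filename is a placeholder, and the rate/dtype/shape accessors follow the usage shown later in this thread):

import tensorflow_io as tfio

# Eager mode: the file is probed at construction time, so the
# metadata is available without the user specifying anything.
audio = tfio.IOTensor.from_audio("my_audio.wav")
print(audio.dtype)  # e.g. tf.int16 for a 16-bit WAV file
print(audio.shape)  # samples x channels
print(audio.rate)   # sample rate, e.g. 44100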

Some of the goals of tensorflow/io were to address 2) and 3).

In order to address 2), we introduced C++ kernel ops that allow you to read a [start:stop] chunk of the data; that forms the initial idea behind IOTensor.
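
For example, slicing reads only the requested chunk rather than the whole file (a minimal sketch; the filename and offsets are placeholders):

import tensorflow_io as tfio

audio = tfio.IOTensor.from_audio("my_audio.wav")
# Only the samples in [start, stop) are read by the underlying
# kernel op; the rest of the file is never loaded.
start, stop = 44100, 2 * 44100
chunk = audio[start:stop]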

In order to address 3), we utilize eager mode to obtain the metadata in one run, then run the graph in another run. This essentially eliminates the need for the user to provide any information other than the data type. This is also part of IOTensor.

However, since we introduced eager mode (a first run to get the dtype, then a second run to construct the graph), it is no longer possible to run IOTensor in graph mode.

Now with PR #615, we separated two scenarios:

  1. The user provides the data type beforehand. In that situation, we can infer everything else and construct the graph in one pass, so it is possible to run “AudioGraphIOTensor” in graph mode.
  2. The user does not know anything about what is inside the file other than “it is a WAV file of some bit depth”. In that case, the user can still use “AudioIOTensor” in eager mode to get everything. The downside is that AudioIOTensor cannot run in graph mode (and thus cannot be embedded in a tf.data.Dataset pipeline).

IODataset/GraphIODataset are in a similar situation.

If the user provides the data type, then we can construct the graph without actually running it, so GraphIODataset can be fed into tf.data.Dataset (note this is like a Dataset inside an outer Dataset).

If the user does not have any information about the data type, then the data type has to be probed in a first run in order to construct the IODataset. Such an IODataset cannot run inside another tf.data.Dataset.
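
A minimal sketch of this eager path (placeholder filename; the dataset is iterated directly rather than inside another pipeline):

import tensorflow_io as tfio

# Eager mode: the dtype is probed from the file itself, but the
# resulting dataset can only be iterated directly.
audio_dataset = tfio.IODataset.from_audio("my_audio.wav")
for value in audio_dataset.take(5):
    print(value)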

Here is a usage example of enabling “graph mode for IOTensor” and “graph mode for IODataset”:

import tensorflow as tf
import tensorflow_io as tfio

# create graph mode IOTensor, which can run inside tf.data,
# but tf.int32 is necessary and cannot be avoided.
tfio.IOTensor.graph(tf.int32).from_audio(filename)

# create eager mode IOTensor, which cannot run inside tf.data
tfio.IOTensor.from_audio(filename)

# create graph mode IODataset, which can run inside tf.data,
# but the user has to indicate the output data type of tf.int32
tfio.IODataset.graph(tf.int32).from_audio(filename)

# create eager mode IODataset, no need for the tf.int32 input,
# but it cannot run inside tf.data
tfio.IODataset.from_audio(filename)

Below is an example of running GraphIOTensor inside tf.data.Dataset’s map:

import tensorflow as tf
import tensorflow_io as tfio

filename_dataset = tf.data.Dataset.from_tensor_slices(
    [audio_path, audio_path])
position_dataset = tf.data.Dataset.from_tensor_slices(
    [tf.constant(1000, tf.int64), tf.constant(2000, tf.int64)])

dataset = tf.data.Dataset.zip((filename_dataset, position_dataset))

# Note: @tf.function is actually not needed, as tf.data.Dataset
# will automatically wrap the `func` into a graph anyway.
# The following is purely for explanation purposes.
# Return: audio chunk from position:position+100, and the rate.
@tf.function
def func(filename, position):
  audio = tfio.IOTensor.graph(tf.int16).from_audio(filename)
  return audio[position:position+100], audio.rate

dataset = dataset.map(func)

for (data, rate) in dataset:
  print(data, rate)

Below is an example of running GraphIODataset inside tf.data.Dataset’s map, essentially a “dataset inside another dataset graph”:

import tensorflow as tf
import tensorflow_io as tfio

filename_dataset = tf.data.Dataset.from_tensor_slices(
    [audio_path, audio_path])
position_dataset = tf.data.Dataset.from_tensor_slices(
    [tf.constant(1000, tf.int64), tf.constant(2000, tf.int64)])

dataset = tf.data.Dataset.zip((filename_dataset, position_dataset))

# Note: @tf.function is actually not needed, as tf.data.Dataset
# will automatically wrap the `func` into a graph anyway.
# The following is purely for explanation purposes.
# Return: an embedded dataset (in an outer dataset) for position:position+100
@tf.function
def func(filename, position):
  audio_dataset = tfio.IODataset.graph(tf.int16).from_audio(filename)
  return audio_dataset.skip(position).take(100)

dataset = dataset.map(func)

# Notice audio_dataset in dataset:
for audio_dataset in dataset:
  for value in audio_dataset:
    print(value)

@yongtang reading your suggestions, I feel that most users would probably like a decode_audio op that they can integrate into their tf.data pipeline. Is that possible?
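
For context, core TensorFlow already ships a decode-style op for the common case of 16-bit PCM WAV, tf.audio.decode_wav, which does work inside a tf.data pipeline (a minimal sketch, not a tensorflow-io API; it always decodes to float32 and does not cover other bit depths or formats):

import tensorflow as tf

def load(path):
    # tf.audio.decode_wav handles 16-bit PCM WAV and returns
    # float32 samples in [-1.0, 1.0] plus the sample rate.
    audio, rate = tf.audio.decode_wav(tf.io.read_file(path))
    return audio, rate

dataset = tf.data.Dataset.from_tensor_slices(["my_audio.wav"]).map(load)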