deeplake: [BUG] Having issues with training a dataset using tensorflow

πŸ›πŸ› Bug Report

βš—οΈ Current Behavior

I am trying to train an autoencoder model, but I am running into issues getting it working with TensorFlow.

Input Code

  • REPL or Repo link if applicable:
foo = view.tensorflow(tensors=['input', 'output'])

history = autoencoder.fit(
    x=foo,
    epochs=epochs,
    batch_size=128,
    verbose=0,
    callbacks=callbacks
)

[error screenshot attached in the original issue]

βš™οΈ Environment

  • Python version(s):
    • 3.9.10
  • OS: Windows 10
  • IDE: Jupyter Notebook
  • Packages: [ Tensorflow==2.6.0 - latest]

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 16 (7 by maintainers)

Most upvoted comments

@v2thegreat It isn't the default behavior because we don't know which tensors are inputs and which are outputs; some models have multiple inputs and outputs, and some have only inputs, so the user has to map them explicitly.

As for performance, currently the tensorflow integration is a wrapper around a python for loop and is not optimized (unlike the pytorch integration). How did you implement pre-fetching?

Hey, sorry for the late response. I was referring to tf.data.Dataset.prefetch(). I can see why the implementation would be so slow if that's the underlying implementation.
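(For anyone landing here later: a minimal sketch of the prefetching being discussed, using a toy in-memory dataset in place of the one returned by `ds.tensorflow()` — the deeplake call itself is not reproduced here.)

```python
import tensorflow as tf

# Toy stand-in for the dataset returned by ds.tensorflow():
# 256 MNIST-shaped images, used as both input and target.
images = tf.zeros((256, 28, 28, 1))
dataset = tf.data.Dataset.from_tensor_slices((images, images))

# prefetch() overlaps data loading with training; AUTOTUNE picks the
# buffer size automatically. Note that batching happens on the dataset
# itself, not via the batch_size argument of fit().
dataset = dataset.batch(128).prefetch(tf.data.AUTOTUNE)

first_x, first_y = next(iter(dataset))
```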

Thanks for all the help! I love your stuff and am excited to see if I can get better performance from PyTorch.

@v2thegreat Thanks for the minimal repro script, will look into it.

Alright! I got the minimal example here:

import deeplake
import keras
from keras import layers

@deeplake.compute
def foo(sample_in, sample_out):
    sample_out.append({'input': sample_in.images, 'output': sample_in.images})
    return sample_out

dataset_mnist = deeplake.load('hub://activeloop/mnist-train') # get mnist
ds = deeplake.empty('./mnist_sample', overwrite=True)

# Fill the empty dataset
with ds:
    ds.create_tensor_like('input', dataset_mnist.images)
    ds.create_tensor_like('output', dataset_mnist.images)
    foo().eval(dataset_mnist, ds, num_workers=16, scheduler='processed')

# Define Autoencoder

input_img = keras.Input(shape=(28, 28, 1))

x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D((2, 2), padding='same')(x)

x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(16, (3, 3), activation='relu')(x)
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = keras.Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

# Load the data into TensorFlow

training_ds = ds.tensorflow(tensors=['input', 'output'])

# train the model, this is where things break

autoencoder.fit(
    x=training_ds,
    epochs=50,
    batch_size=128,
    shuffle=True
)
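One thing worth checking (my assumption, not confirmed in this thread): when `x` is a `tf.data.Dataset`, Keras rejects the `batch_size` argument and ignores `shuffle`, so both have to be applied to the dataset itself. A sketch with a toy dataset standing in for `training_ds`:

```python
import tensorflow as tf

# Toy stand-in for the dataset returned by ds.tensorflow().
images = tf.zeros((256, 28, 28, 1))
training_ds = tf.data.Dataset.from_tensor_slices((images, images))

# Shuffle and batch on the dataset, then drop the batch_size and
# shuffle keyword arguments from fit(), e.g.:
#   autoencoder.fit(x=batched_ds, epochs=50)
batched_ds = training_ds.shuffle(buffer_size=256).batch(128)

first_x, _ = next(iter(batched_ds))
```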

Please let me know if you have any questions!

I don’t think it’s an issue with the name. I called my images `input` while I was testing deeplake, and the model calls its first layer `input` because I’m using an Input layer in TensorFlow.

I tried what you provided and got the same issue. I’ll write up the minimal example in a few.