Keras: LSTM many-to-many mapping problem

I want to implement the fourth architecture from the left (the first many-to-many): [image 1]

In my case the input and output lengths aren't equal (i.e. the numbers of blue and red units differ). I have n samples to train the network, so the input has shape n × n_prev × 1 and the output has shape n × n_nxt × 1. My model is as follows:

# imports for the model below (Keras 0.x module layout, matching the traceback)
from keras.models import Sequential
from keras.layers.core import Dense, Dropout
from keras.layers.recurrent import LSTM

batch_size = 50
n_prev = 100        # length of each input sequence
n_nxt = 5           # length of each output sequence
print("creating model...")
unit_number = n_nxt
model = Sequential()
model.add(LSTM(unit_number,
               batch_input_shape=(batch_size, n_prev, 1),
               forget_bias_init='one',
               return_sequences=True,
               stateful=True))
model.add(Dropout(0.2))
model.add(LSTM(unit_number,
               batch_input_shape=(batch_size, n_prev, 1),
               forget_bias_init='one',
               return_sequences=False,   # only the last timestep is passed on
               stateful=True))
model.add(Dropout(0.2))
model.add(Dense(n_nxt))                  # expands that last output to n_nxt values
model.compile(loss='mse', optimizer='rmsprop')


print('Training')
numIteration = len(X_train) // batch_size   # number of full batches per epoch
for i in range(epochs):
    print('Epoch', i, '/', epochs)
    for j in range(numIteration):
        print('Batch', j, '/', numIteration, 'Epoch', i)
        model.train_on_batch(X_train[j * batch_size:(j + 1) * batch_size],
                             y_train[j * batch_size:(j + 1) * batch_size])
    model.reset_states()   # reset the stateful LSTM states between epochs


print('Predicting')
predicted_output = model.predict(X_test, batch_size=batch_size)

But I think I have actually implemented the third one, because return_sequences=False in the second LSTM layer (according to this article: https://github.com/Vict0rSch/deep_learning/tree/master/keras/recurrent). The second LSTM layer then produces a single output, as in the third architecture (from the left), which the Dense layer expands to five outputs. When I set return_sequences=True in the second LSTM layer, I get the following error:

Traceback (most recent call last):
  File "/home/mina/Documents/research/one cores (LSTM) seq2seq  one core vector2vector  train on batch statefull 15/KerasLSTM.py", line 259, in <module>
    main()
  File "/home/mina/Documents/research/one cores (LSTM) seq2seq  one core vector2vector  train on batch statefull 15/KerasLSTM.py", line 200, in main
    model.add(Dense(n_nxt))
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/containers.py", line 68, in add
    self.layers[-1].set_previous(self.layers[-2])
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/core.py", line 85, in set_previous
    str(layer.output_shape))
AssertionError: Incompatible shapes: layer expected input with ndim=2 but previous layer has output_shape (1, 100, 5)

Can I omit the Dense layer? As I understand it, if I don't set return_sequences=True I get the third architecture, is that right? And is it a problem that the input length differs from the output length?

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 1
  • Comments: 24 (4 by maintainers)

Most upvoted comments

So your current model ends with model.add(TimeDistributed(Dense(n_nxt))), meaning that you apply a dense operation to every timestep and return n_nxt outputs per input timestep. That's not what your target data looks like, right? Read up on encoder-decoder networks. Here's an example stolen from @EderSantana in https://github.com/fchollet/keras/issues/562.

model = Sequential()
model.add(LSTM(inp_dim, out_dim, return_sequences=False))  # Encoder
model.add(RepeatVector(sequence_length))                  # feed the encoding to every output timestep
model.add(LSTM(out_dim, inp_dim, return_sequences=True))  # Decoder
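
For reference, here is a minimal sketch of the same encoder-decoder idea in the newer wrapper-style Keras API, adapted to the shapes from the question (n_prev = 100 input steps, n_nxt = 5 output steps, one feature per step); the 64-unit hidden size is an arbitrary placeholder:

from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

n_prev, n_nxt = 100, 5                         # lengths from the question

model = Sequential()
model.add(LSTM(64, input_shape=(n_prev, 1)))   # encoder: whole input -> one vector
model.add(RepeatVector(n_nxt))                 # copy that vector once per output step
model.add(LSTM(64, return_sequences=True))     # decoder: one vector per output step
model.add(TimeDistributed(Dense(1)))           # one predicted value per output step
model.compile(loss='mse', optimizer='rmsprop')

The targets then need shape (n, n_nxt, 1), and the output length is decoupled from the input length, which is the fourth architecture the question asks about.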

@fluency03 KTH represent! 😄

You're on the right track. Set return_sequences=True, then make sure to wrap Dense in a TimeDistributed "wrapper" layer. It's Keras' way of applying a layer to several time steps.
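
A minimal sketch of that wrapper usage (Keras 1.x-style API; the 32-unit size and the input shape are placeholders, not from the thread):

from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

model = Sequential()
model.add(LSTM(32, input_shape=(100, 1), return_sequences=True))  # one vector per input timestep
model.add(TimeDistributed(Dense(1)))                              # same Dense applied at every timestep
# output shape: (batch, 100, 1), i.e. one prediction per input timestep

Note that this gives one output per input timestep, so on its own it only fits targets with the same length as the input; for a shorter output sequence you still need something like the RepeatVector encoder-decoder shown above.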

Different lengths of input and output are not a problem, as long as all your input samples are of the same length and all your output samples are of the same length. If not, you should pad the sequences with a dummy value so they're all of equal length (you can still treat inputs and targets separately) and add Masking layers.
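
A minimal sketch of that padding-plus-masking idea (the lengths, the -1.0 padding value, and the layer sizes are placeholders):

import numpy as np
from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense

# toy inputs of varying length, one feature per step, padded to a common length
max_len, pad_value = 15, -1.0
raw = [np.random.rand(np.random.randint(5, max_len + 1), 1) for _ in range(50)]
X = np.full((len(raw), max_len, 1), pad_value)
for i, seq in enumerate(raw):
    X[i, :len(seq)] = seq

model = Sequential()
model.add(Masking(mask_value=pad_value, input_shape=(max_len, 1)))  # padded steps are skipped downstream
model.add(LSTM(32))   # returns the state after the last real (unmasked) timestep
model.add(Dense(5))   # fixed-length output, as in the question
model.compile(loss='mse', optimizer='rmsprop')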

@mininaNik this addition_rnn example is indeed a good one and I think it can solve your problem.

Thanks @carlthome ! Cheers, KTH ! 😃

Or, have you considered a custom layer here for reshaping the data? Could that solve this problem, @fchollet?
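
For what it's worth, one way to read that reshaping suggestion, sketched with the built-in Reshape layer rather than a custom one (the 32-unit size is a placeholder):

from keras.models import Sequential
from keras.layers import LSTM, Dense, Reshape

model = Sequential()
model.add(LSTM(32, input_shape=(100, 1)))   # return_sequences=False: last state only
model.add(Dense(5))                         # predict all 5 future values at once
model.add(Reshape((5, 1)))                  # view them as a length-5 output sequence
model.compile(loss='mse', optimizer='rmsprop')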