keras: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=4

Here’s the code I’ve written:


model.add(LSTM(150,
               input_shape=(64, 7, 339),
               return_sequences=False))
model.add(Dropout(0.2))

model.add(LSTM(
    200,
    return_sequences=True))
model.add(Dropout(0.2))

model.add(LSTM(
    150,
    return_sequences=True))
model.add(Dropout(0.2))

model.add(Dense(
    output_dim=1))
model.add(Activation('sigmoid'))

start = time.time()
model.compile(loss='mse', optimizer='rmsprop')
print('compilation time : ', time.time() - start)

model.fit(
    trainX,
    trainY_Buy,
    batch_size=64,
    nb_epoch=10,
    verbose=1,
    validation_split=0.05)

The error I'm getting is this: ValueError: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=4, on this line: model.add(LSTM(150, input_shape=(64, 7, 339), return_sequences=False))

My X shape is (492, 7, 339) and my Y shape is (492,).

Does anyone have any ideas about what I'm doing wrong?

Most upvoted comments

@ajanaliz. You may need to set return_sequences=True in the first layer. Maybe that will solve it. I hope that works. Thanks.

@ajanaliz. I took a quick look, and I believe that you need to remove the leading "64" from the input shape of the LSTM layer: input_shape=(64, 7, 339) --> input_shape=(7, 339). Keras' convention is that the batch dimension (the number of examples, which is not the same as the number of timesteps) is omitted from the input_shape argument. The batching (number of examples per batch) is handled in the fit call. I hope that helps. Thanks.
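
For reference, here is a minimal sketch of how those two suggestions might be combined for the original poster's data (X of shape (492, 7, 339), i.e. 7 timesteps of 339 features per sample); the layer sizes are simply carried over from the question, and return_sequences=True is used on every LSTM except the last, since each of those feeds another LSTM:

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense, Activation

model = Sequential()
model.add(LSTM(150,
               input_shape=(7, 339),          # (timesteps, features); the batch dimension is omitted
               return_sequences=True))        # needed because another LSTM layer follows
model.add(Dropout(0.2))
model.add(LSTM(200, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(150, return_sequences=False))  # last LSTM returns only the final timestep's output
model.add(Dropout(0.2))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='mse', optimizer='rmsprop')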

@hadisaadat. Some suggestions that might help provide some direction for you. I would suggest breaking the problem into several pieces to make sure you understand the input/output dimensions of each layer (reviewing this section will help: https://keras.io/layers/recurrent/). Also, you have "batch_size" as the first argument of the LSTM call:

model.add(LSTM(batch_size, input_shape=(Max_word_len, Char_embedding_Size), return_sequences=False))

The first argument (which is called "units" in the documentation) is the output dimensionality of that layer. In other words, you will have "units" number of LSTM cells at that layer. In Keras, the batch dimension is not typically specified in the model architecture.

It might be useful to just have a single LSTM layer that feeds into a single Dense layer (with number of units=1, so Dense(1), not Dense(62)) and look at the architecture you get from model.summary().
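
As a rough sketch of that idea (the dimensions below are placeholders, not values from your data; substitute your own Max_word_len and Char_embedding_Size):

from keras.models import Sequential
from keras.layers import LSTM, Dense

Max_word_len = 10          # timesteps per sample (placeholder value)
Char_embedding_Size = 50   # features per timestep (placeholder value)

model = Sequential()
# The first argument is "units": the number of LSTM cells, i.e. the layer's output dimensionality.
model.add(LSTM(64, input_shape=(Max_word_len, Char_embedding_Size)))
model.add(Dense(1))
model.summary()            # inspect the output shape of each layer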

Also, with return_sequences=True, each sample will generate an output from each LSTM cell for every timestep. This is typically fed into a second RNN layer, not into a regular Dense layer. Also, Keras should automatically infer the input data shape for every layer except the first.

If you are trying to process characters --> words, and then words --> sentences, it might be easier to create two separate models (each with a single LSTM layer) instead of trying to stack them directly, because your batch definition changes between layers.

Also, as far as the final layer goes, that will depend on what your target is. In other words, what do you want your model to output? Are you trying to classify the type of sentence? From your architecture above, it seems you have 62 different outputs/classes you are trying to model. Depending on what your target is, this should help you decide on the final output layer.
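
As a rough illustration of how the target shapes the final layer (the dimensions below are assumptions for the sake of example, not values confirmed from your description):

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(64, input_shape=(10, 50)))    # placeholder timesteps/features

# If the target is one of 62 classes (one-hot encoded), a softmax head is typical:
model.add(Dense(62, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

# If instead the target were a single continuous value per sample, something like
# Dense(1) with loss='mse' would be the usual choice.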

I hope this gives you a few ideas to help. Thanks.

I have received multiple different ValueErrors trying to solve this and have changed many parameters. It is a time series problem: I have data from 60 shops and 215 items over 1034 days. I have split off 973 days for training and 61 for testing:

train_x = train_x.reshape((60, 973, 215))
test_x = test_x.reshape((60, 61, 215))
train_y = train_y.reshape((60, 973, 215))
test_y = test_y.reshape((60, 61, 215))

My model:

model = Sequential()
model.add(LSTM(100, input_shape=(train_x.shape[1], train_x.shape[2]), return_sequences='true'))
model.add(Dense(215))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
history = model.fit(train_x, train_y, epochs=10,
                    validation_data=(test_x, test_y), verbose=2, shuffle=False)

ValueError: Error when checking input: expected lstm_1_input to have shape (973, 215) but got array with shape (61, 215)

Before and after an LSTM layer you need an input and an output layer:

model = Sequential()
model.add(Dense(215, input_shape=(train_x.shape[1], train_x.shape[2])))
model.add(LSTM(100, return_sequences='true'))
model.add(Dense(1, activation='softmax'))

I tried:

model.add(Dense(215, input_shape=(train_x.shape[1], train_x.shape[2])))
model.add(LSTM(100, return_sequences='true'))
model.add(Dense(1, activation='softmax'))

and got: ValueError: Error when checking target: expected dense_2 to have shape (973, 1) but got array with shape (973, 215)

Then I tried:

model.add(Dense(215, input_shape=(train_x.shape[1], train_x.shape[2])))
model.add(LSTM(100, return_sequences='true'))
model.add(Dense(215, activation='softmax'))

and got ValueError: Error when checking input: expected dense_1_input to have shape (973, 215) but got array with shape (61, 215)

My goal is to output predictions over 215 items for 60 shops across 61 days, something like 3660 x 215.

I would try increasing the size of the network:

model.add(Dense(1024, input_shape=(train_x.shape[1], train_x.shape[2])))
model.add(LSTM(256, return_sequences='true'))
model.add(Dense(215, activation='softmax'))

@hadisaadat. Here is a bit more information which might help, based on your info above:

In the first LSTM layer, at the character level, I want the layer to output only at the end of the sequence, which is the end of the word (let's assume the input is padded), so I set return_sequences=False to force it to output only at the end rather than for each input character. On the other hand, I want the second layer to receive input for each word and output for each word as well, so return_sequences=True is for that layer. Some questions arise here:

What is really the relation between the batch_size here and return_sequences=False/True?

The basic layout is the following:

Batch_sample_1: timestep1, timestep2, …, timestepN
Batch_sample_2: timestep1, timestep2, …, timestepN
…
Batch_sample_BatchSize: timestep1, timestep2, …, timestepN

When you set return_sequences=True, then for each batch_sample, you get outputs for each time step (And this will be one output for each LSTM cell in that layer).
When you set return_sequences=False, then for each batch_sample, you get the output for timestepN only.
In the second layer, since you are doing input and output for each word (which comes from layer 1), return_sequences=False is what you want, I think (you only want the output for each input word, not for the characters that form the word: for each sequence of timestep1…timestepN, you predict just one word).
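
A small sketch that might make the shape difference concrete (the layer sizes and input dimensions here are arbitrary):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

x = np.random.random((2, 5, 3))        # 2 batch samples, 5 timesteps, 3 features each

seq_model = Sequential()
seq_model.add(LSTM(4, input_shape=(5, 3), return_sequences=True))
print(seq_model.predict(x).shape)      # (2, 5, 4): one output per timestep from each of the 4 cells

last_model = Sequential()
last_model.add(LSTM(4, input_shape=(5, 3), return_sequences=False))
print(last_model.predict(x).shape)     # (2, 4): only the output at the final timestep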

Some more ideas based on using Masking and Merge Layers that might suggest some direction:

import numpy as np
from keras.layers import LSTM, Input, Masking, multiply
from keras.models import Model

#
# Create input sequences
#
numTimesteps=20
slopeArray1=np.linspace(0, 10, num=numTimesteps)
slopeArray1 = np.expand_dims(slopeArray1, axis=0)
slopeArray1 = np.expand_dims(slopeArray1, axis=2)

slopeArray2=np.linspace(0, 15, num=numTimesteps)
slopeArray2 = np.expand_dims(slopeArray2, axis=0)
slopeArray2 = np.expand_dims(slopeArray2, axis=2)
maskArray=np.zeros((1,numTimesteps,1))   # mask is zero everywhere...
maskArray[0,numTimesteps-1]=1            # ...except at the final timestep

X_train = np.concatenate((slopeArray1, slopeArray2))
X_mask = np.concatenate((maskArray, maskArray))

# preparing y_train
y_train = []
y_train = np.array([2*slopeArray1[0,19]-slopeArray1[0,18],
                    2*slopeArray2[0,19]-slopeArray2[0,18]]) # make target one delta higher

#
# Create model
#

inputs = Input(name='Input1', batch_shape=(1,numTimesteps,1))
X_mask_input = Input(name='Input2', batch_shape=(1,numTimesteps,1))
x = LSTM(units=1, name='LSTM1', return_sequences=True)(inputs)   # outputs at every timestep
x = multiply([x, X_mask_input])                                  # zero out everything but the final timestep
x = Masking(mask_value=0.0)(x)                                   # mark the zeroed timesteps as "ignore"
pred = LSTM(units=1, name='LSTM2', return_sequences=False, stateful=True)(x)
model = Model(inputs=[inputs, X_mask_input], outputs=pred)
model.compile(loss='mse', optimizer='sgd', metrics=['mse'])
print(model.summary())

#
# Train
# 
model.fit([X_train, X_mask], y_train, epochs=200, batch_size=1)

The idea is that if you can set a value (say 0.0) to be an "ignore" value using the mask, then the second LSTM will only process the final output, I think. More details here: https://keras.io/layers/core/

I hope this helps. Thanks.

Hi, I have the same bug as you. My X shape is (24443, 124, 30), y shape is (24443, 124). Maybe it's the shape of y that causes the error for me. May I know the type of your Y?

@td2014 nope, that way my error is: Input 0 is incompatible with layer lstm_2: expected ndim=3, found ndim=2

The same thing happens when I write the following for the first LSTM layer:

model.add(LSTM(150,
               input_shape=trainX.shape,
               return_sequences=False))